--- title: "Statistical estimation strategies" author: "Hans Ttito" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Statistical estimation strategies} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, message = FALSE, warning = FALSE ) library(fb4package) ``` ## Overview `run_fb4()` supports four strategies for estimating the proportion of maximum consumption (*p*): | Strategy | Input required | Output | |---|---|---| | `"binary_search"` | target final weight | point estimate of *p* | | `"mle"` | observed final weights | *p* + SE + 95 % CI | | `"bootstrap"` | observed final weights | *p* + SD + bootstrap CI | | `"hierarchical"` | individual mark-recapture data | population mean and SD of *p* | Every strategy takes a `Bioenergetic` object as input and returns an `fb4_result` object with the same structure. --- ## Shared setup The Bioenergetic object below is used by all four strategies. It represents an adult Chinook salmon cohort (*Oncorhynchus tshawytscha*) over a 100-day growing season. ```{r bio-setup} data(fish4_parameters) chinook_db <- fish4_parameters[["Oncorhynchus tshawytscha"]] stage <- names(chinook_db$life_stages)[1] sp_params <- chinook_db$life_stages[[stage]] sp_info <- chinook_db$species_info days <- 1:100 temp_data <- data.frame( Day = days, Temperature = round(10 + 5 * sin(2 * pi * (days - 60) / 365), 2) ) diet_data <- list( proportions = data.frame(Day = days, Alewife = 0.7, Shrimp = 0.3), prey_names = c("Alewife", "Shrimp"), energies = data.frame(Day = days, Alewife = 4900, Shrimp = 3200) ) bio <- Bioenergetic( species_params = sp_params, species_info = sp_info, environmental_data = list(temperature = temp_data), diet_data = diet_data, simulation_settings = list(initial_weight = 1800, duration = 100) ) bio$species_params$predator$ED_ini <- 4500 bio$species_params$predator$ED_end <- 5500 ``` Observed final weights simulated around a target of 3 000 g are used by the likelihood and bootstrap strategies: ```{r obs-weights} set.seed(42) obs_weights <- rnorm(30, mean = 3000, sd = 80) ``` --- ## Strategy 1 — Binary search Binary search finds the *p* that produces a specified target weight. Use it when a point estimate is enough and you have a known growth target. ```{r binary-search} res_bs <- run_fb4(bio, strategy = "binary_search", fit_to = "Weight", fit_value = 3000) cat("p estimate :", round(res_bs$summary$p_value, 4), "\n") cat("Final weight:", round(res_bs$summary$final_weight, 1), "g\n") ``` --- ## Strategy 2 — Maximum Likelihood Estimation MLE finds the *p* that maximises the likelihood of observing the supplied weights, assuming a log-normal distribution around the predicted weight. It returns a standard error and a 95 % confidence interval via the profile likelihood. ```{r mle, cache=TRUE} res_mle <- run_fb4(bio, strategy = "mle", fit_to = "Weight", observed_weights = obs_weights) cat("p estimate :", round(res_mle$summary$p_value, 4), "\n") cat("SE :", round(res_mle$summary$p_se, 4), "\n") cat("Converged :", res_mle$summary$converged, "\n") ``` --- ## Strategy 3 — Bootstrap Bootstrap resamples the observed weights and re-fits the model at each replicate, producing a distribution of *p* estimates without assuming a particular form for the weight distribution. ```{r bootstrap, cache=TRUE} res_boot <- run_fb4(bio, strategy = "bootstrap", fit_to = "Weight", observed_weights = obs_weights, n_bootstrap = 100, upper = 1, parallel = FALSE) cat("p mean :", round(res_boot$summary$p_mean, 4), "\n") cat("p SD :", round(res_boot$summary$p_sd, 4), "\n") ``` --- ## Strategy 4 — Hierarchical model (mark-recapture) ### Biological context The hierarchical strategy is designed for **mark-recapture bioenergetics**: individual fish are tagged, weighed at release (initial weight), and re-weighed at recapture (final weight) after a fixed monitoring period. The model estimates a *p* value for each individual, then pools those estimates into a **population-level distribution** of feeding levels. This answers two biological questions: - What is the *average* feeding level (*p*) of the population under the observed environmental conditions? - How *variable* is individual feeding behaviour within that population? ### Simulating a mark-recapture data set First we use binary search to find the *p* that is consistent with this `bio` object (initial weight 1 800 g, 100 days). That value becomes the centre of the simulated population distribution. ```{r mr-ref-p} res_ref <- run_fb4(bio, strategy = "binary_search", fit_to = "Weight", fit_value = 2500) ref_p <- res_ref$summary$p_value cat("Reference p (target 2 500 g):", round(ref_p, 4), "\n") ``` ```{r mr-simulate} set.seed(123) n_fish <- 20 # Cohort: individual initial weights varying ± 15 % around 1 800 g initial_weights <- round(runif(n_fish, min = 1800 * 0.85, max = 1800 * 1.15)) # Each fish has its own feeding level drawn from a population distribution # centred on the reference p estimated above (sigma_p = 0.06) true_p <- pmax(0.3, pmin(1.5, rnorm(n_fish, mean = ref_p, sd = 0.06))) # Simulate final weights: run individual direct simulations + measurement noise final_weights <- vapply(seq_len(n_fish), function(i) { bio_i <- bio bio_i$simulation_settings$initial_weight <- initial_weights[i] res_i <- run_fb4(bio_i, strategy = "direct", p_value = true_p[i]) round(res_i$summary$final_weight * rnorm(1, mean = 1, sd = 0.02), 1) }, numeric(1)) mr_data <- data.frame( individual_id = sprintf("fish_%02d", seq_len(n_fish)), initial_weight = initial_weights, final_weight = final_weights ) head(mr_data, 6) ``` ### Running the hierarchical model ```{r hierarchical, cache=TRUE, dependson=c("mr-ref-p", "mr-simulate")} res_hier <- run_fb4( x = bio, strategy = "hierarchical", backend = "tmb", fit_to = "Weight", observed_weights = mr_data ) cat(sprintf("Population mean p (mu_p) : %.4f\n", res_hier$summary$mu_p_estimate)) cat(sprintf("Population SD p (sigma_p) : %.4f\n", res_hier$summary$sigma_p_estimate)) cat(sprintf("Number of individuals : %d\n", res_hier$summary$n_individuals)) cat(sprintf("Converged : %s\n", res_hier$summary$converged)) ``` The data were generated with a theoretical population mean of *p* = `r round(ref_p, 3)` (the binary-search estimate for a 2 500 g target) and SD = 0.06. Because only `r n_fish` fish were drawn, the realised sample mean is `r round(mean(true_p), 3)` — this is the value `mu_p` should recover. `sigma_p` reflects both the true individual variation (SD = 0.06) and the 2 % measurement noise added to the final weights. ### Assumptions and limitations For `mu_p` and `sigma_p` to be biologically meaningful, the study design must satisfy a few conditions. All fish in the data set must belong to the same species and life stage and must have experienced the same environmental conditions (temperature, diet). If you mix life stages or fish from different habitats, `mu_p` becomes a statistical average with no clear ecological interpretation. The model also assumes a fixed monitoring period: every individual is tracked for the same `n_days`. Variable recapture intervals are not currently supported. On the statistical side, individual *p* values are modelled on the log scale, so the implied population distribution is approximately log-normal. This guarantees *p* > 0 and works well when individual variation is moderate. The model accepts a covariate matrix (`covariates_matrix`) if you want to explain part of the variation in *p* by body size, sex, or tagging location; without covariates, all unexplained variation accumulates in `sigma_p`. Finally, the hierarchical model requires `backend = "tmb"` — automatic differentiation via TMB is needed for the mixed-effects likelihood. The pure-R backend does not support this strategy. --- ## Comparing strategies ### Strategies 1–3: single-population estimates Binary search, MLE, and bootstrap all answer the same question — *what p produces the observed growth of a representative fish?* — and can therefore be compared numerically. All three use the same `bio` object (`initial_weight = 1800 g`, target or observed weights ≈ 3 000 g). ```{r compare-123, echo=FALSE} comp123 <- data.frame( Strategy = c("binary_search", "mle", "bootstrap"), p_estimate = round(c(res_bs$summary$p_value, res_mle$summary$p_value, res_boot$summary$p_mean), 4), Uncertainty = round(c(NA, res_mle$summary$p_se, res_boot$summary$p_sd), 4), Type = c("none (point estimate)", "SE (asymptotic)", "SD (non-parametric)"), stringsAsFactors = FALSE ) knitr::kable(comp123, col.names = c("Strategy", "p estimate", "SE / SD", "Uncertainty type"), align = c("l", "r", "r", "l"), caption = "Strategies 1–3 fitted to the same data (Chinook salmon, 100-day simulation, target weight ≈ 3 000 g).") ``` ### Strategy 4: population-level distribution The hierarchical model answers a **different question**: not the *p* of a representative individual, but the *distribution* of *p* across a population of tagged fish. Its output (`mu_p`, `sigma_p`) is not numerically comparable to the single estimates above because the input data differ (each fish has its own initial and final weight) and the parameter being estimated is a population mean, not an individual feeding level. ```{r compare-hier, echo=FALSE} comp_hier <- data.frame( Parameter = c("mu_p (population mean)", "sigma_p (population SD)", "n_individuals", "converged"), Value = c(round(res_hier$summary$mu_p_estimate, 4), round(res_hier$summary$sigma_p_estimate, 4), res_hier$summary$n_individuals, as.character(res_hier$summary$converged)), stringsAsFactors = FALSE ) knitr::kable(comp_hier, col.names = c("Parameter", "Estimated value"), align = c("l", "r"), caption = paste("Hierarchical model results for the simulated", "mark-recapture cohort (n =", res_hier$summary$n_individuals, "fish, true mu_p =", round(ref_p, 3), ").")) ``` --- ## References Deslauriers D, Chipps SR, Breck JE, Rice JA, Madenjian CP (2017). Fish Bioenergetics 4.0: An R-Based Modeling Application. *Fisheries* 42(11):586–596. Chipps SR, Wahl DH (2008). Bioenergetics modeling in the 21st century: reviewing new insights and revisiting old constraints. *Transactions of the American Fisheries Society* 137(1):298–313.