--- title: "Getting Started with smoothbp" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with smoothbp} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = NOT_CRAN) ``` ## Overview `smoothbp` fits hierarchical piecewise regression models with **multiple change-points**, each smoothed by a logistic sigmoid. The general mean function for $K$ change-points is: $$ \mu_i = b0_i + b1_i(\tau_i - \omega_{1i}) + \sum_{k=1}^{K} \delta_{ki}(\tau_i - \omega_{ki})\, \sigma\!\left[(\tau_i - \omega_{ki})\rho_{ki}\right] $$ where | Symbol | Role | |--------|------| | $\tau$ | time / covariate | | $\omega_k$ | location of the $k$-th change-point | | $\rho_k$ | transition sharpness for change-point $k$ ($\rho \to \infty$ → hard kink) | | $b0$ | level at the first change-point | | $b1$ | pre-change slope (anchored at $\omega_1$) | | $\delta_k$ | slope change attributable to change-point $k$ | When $K = 0$ the model reduces to a standard hierarchical linear regression. When $K = 1$ it matches the classic two-phase piecewise model. Each parameter has its own linear predictor specified via a formula. `b0` may also include a random intercept `(1 | group)` for clustered / repeated-measures data. Posterior inference uses a **Metropolis-within-Gibbs sampler written in Rust**: - $b0$, $b1$, $\delta_k$, random intercepts — exact Gibbs (conjugate normal). - $\omega_k$, $\rho_k$ — Hamiltonian Monte Carlo with adaptive mass matrix during warmup. - $\sigma$, $\sigma_u$ — exact Gibbs (inverse-gamma conjugate). - $\gamma_k$ (spike-and-slab) — Gibbs with Bernoulli likelihood. --- ## Real-World Applications While `smoothbp` is a general-purpose statistical tool, it is uniquely powerful for modeling structural changes across various scientific domains: - **Pharmacokinetics / Medicine**: Modeling the delayed onset of a drug's effect and identifying the precise time threshold where clearance rates exponentially shift. - **Ecology / Climate Science**: Identifying "tipping points" where species population growth trajectories abruptly change in response to accumulated temperature variations. - **Epidemiology**: Tracking structural shifts in infection rates ($R_0$) to pinpoint exactly when a public health intervention successfully flattened the curve. - **Finance**: Discovering the exact timing of market regime shifts and measuring how individual assets respond differently to global economic shocks. --- ## Simulating data `simulate_smoothbp()` generates synthetic data from the model. It is useful for testing and for understanding how the parameters relate to the observable trajectory shape. ```{r simulate} library(smoothbp) dat <- simulate_smoothbp( n_subj = 20, n_obs = 8, b0 = 5.0, # level at change-point b1 = -0.3, # pre-change slope b2 = 1.2, # slope change (delta_1) omega = 3.0, # change-point location rho = 4.0, # transition sharpness sigma = 0.4, # residual SD sigma_u = 0.5, # between-subject SD tau_range = c(0, 6), seed = 42 ) head(dat) true_params(dat) ``` --- ## Fitting the model ### Zero change-points (linear fallback) When `deltas`, `omega`, and `rho` are empty lists, `smoothbp` fits a standard hierarchical linear regression: ```{r fit-zero-bp} fit0 <- smoothbp( formula = y ~ tau, b0 = ~ 1 + (1 | subject), b1 = ~ 1, deltas = list(), # no change-points omega = list(), rho = list(), data = dat, chains = 2L, iter = 1000L, warmup = 500L, seed = 42L, .verbose = FALSE ) summary(fit0) ``` ### One change-point ```{r fit-one-bp} fit1 <- smoothbp( formula = y ~ tau, b0 = ~ 1 + (1 | subject), b1 = ~ 1, deltas = list(~ 1), # one slope change omega = list(~ 1), # one change-point location rho = list(~ 1), # one sharpness data = dat, priors = smoothbp_priors( omega = list(prior_normal(3, 2, lb = 0)) ), chains = 4L, iter = 2000L, warmup = 1000L, seed = 42L ) summary(fit1) ``` ### Multiple change-points (The List-of-Formulas API) For models with multiple structural breaks, `smoothbp` uses a "List-of-Formulas" API. You pass lists of equal length to `deltas`, `omega`, and `rho`. Each element in the list corresponds to the sub-model for that specific breakpoint. ```{r fit-multi-bp} # Fit a model with 3 candidate breakpoints fit3 <- smoothbp( formula = y ~ tau, b0 = ~ 1 + (1 | subject), b1 = ~ 1, # Each segment gets its own formula (here, just an intercept) deltas = list(~ 1, ~ 1, ~ 1), omega = list(~ 1, ~ 1, ~ 1), rho = list(~ 1, ~ 1, ~ 1), data = dat, priors = smoothbp_priors( # Use the space_omega_priors helper to initialize the search omega = space_omega_priors(K = 3, tau_min = 0, tau_max = 6) ) ) ``` ### Known Change-Points with `fixed()` If you are performing an intervention analysis (e.g., a Regression Discontinuity Design or a Stepped-Wedge trial), you may already know exactly where the change points should be. By using the `fixed()` helper, you tell the model **not** to estimate the location or sharpness, but only the magnitude of the change ($\delta$). This is much faster and allows for direct hypothesis testing of the intervention effect. ```{r fit-fixed} # Test for a hard kink at exactly tau = 3.0 fit_fixed <- smoothbp_ss( formula = y ~ tau, omega = list(fixed(3.0)), # Location is known rho = list(fixed(100)), # Sharpness is fixed (hard kink) data = dat ) # The PIP tells us the probability that the intervention had an effect pip(fit_fixed) ``` > **Note**: `fixed()` can also accept a vector of the same length as your data, allowing for observation-specific change points (e.g., different intervention times per cluster). > **Tip**: If you are unsure how many change-points are present, use > `smoothbp_ss()` with spike-and-slab regularization — see > `vignette("spike-and-slab")`. ### Progress and parallel chains ```{r fit-parallel} fit1 <- smoothbp( formula = y ~ tau, b0 = ~ 1 + (1 | subject), data = dat, chains = 4L, iter = 2000L, warmup = 1000L, seed = 42L, cores = 4L # run all 4 chains concurrently ) ``` To make parallel the default for a project, add this to `.Rprofile`: ```{r rprofile, eval=FALSE} options(smoothbp.cores = parallel::detectCores()) ``` --- ## Posterior summary `print()` displays results in distinct, labelled sections; `summary()` returns a clean data frame. By default, `effects = c("fixed", "ran_pars")` is used, matching the conventions of other mixed-modeling packages like `brms`. You can filter which parameters are returned or printed by passing a vector to the `effects` argument. The parameters are classified into three strict categories: 1. `"fixed"`: Population-level structural parameters ($b0, b1, \delta, \omega, \rho, \sigma$) 2. `"ran_pars"`: Group-level variance parameters (e.g., `sigma_u`) 3. `"ran_vals"`: The actual group-level deviations (e.g., the individual `u[j]` random intercepts) ```{r summary} print(fit1) # fixed + ran_pars (default) print(fit1, effects = "fixed") # population-level only summary(fit1, effects = "all") # returns everything, including u[j] ``` Parameter names follow the convention: | Name | Meaning | |------|---------| | `b0_(Intercept)` | Intercept (level at first change-point) | | `b1_(Intercept)` | Pre-change slope | | `delta1_(Intercept)` | Slope change at change-point 1 | | `omega1_(Intercept)` | Change-point 1 location | | `rho1_(Intercept)` | Change-point 1 sharpness | | `sigma` | Residual SD | | `sigma_u` | Between-group SD (random intercept) | --- ## Diagnostics ### Trace plots with mixing indicators ```{r trace} plot(fit1) # alias for trace_plot(fit1) trace_plot(fit1, type = "both") # trace + density ``` Parameters with $\hat{R} > 1.05$ or bulk-ESS $< 100$ are flagged with a ⚠ symbol and a red-tinted background. Thresholds are adjustable: ```{r trace-strict} trace_plot(fit1, rhat_thresh = 1.01, ess_thresh = 400) ``` ### Posterior predictive check ```{r pp-check} pp_check(fit1) ``` --- ## Prior specification `smoothbp_priors()` accepts either a single prior applied to all coefficients of a component, or a named list. For multi-breakpoint models, `omega` and `rho` accept a **list of priors** (one per breakpoint): ```{r priors} smoothbp_priors( b0 = prior_normal(0, 10), b1 = prior_normal(0, 5), omega = list( prior_normal(2, 1), # change-point 1 prior_normal(5, 1) # change-point 2 ), rho = list( prior_normal(4, 2, lb = 0), prior_normal(4, 2, lb = 0) ), sigma = prior_invgamma(shape = 2, scale = 1) ) ``` The `lb` and `ub` arguments impose hard bounds. `rho` should typically have `lb = 0` (sharpness must be positive). For `omega` and `deltas`, it is often better to **avoid hard bounds** unless they are physically necessary for your domain. > **Performance Tip**: `smoothbp` is significantly faster when `b1`, `deltas`, and `omega` are unconstrained. Without bounds, the sampler can use exact conjugate Gibbs updates for `b1` and `deltas`, and standard HMC for `omega`. Truncated priors require additional checks and rejection sampling steps. If you do use bounds for `omega`, ensure they cover the actual range of your time variable (e.g. if your time starts at -10, do not use `lb = 0`). --- ## Breakpoint selection with spike-and-slab When the number of change-points is unknown, fit a model with $K$ potential breakpoints using `smoothbp_ss()`. Each slope change $\delta_k$ has a binary inclusion indicator $\gamma_k$; the posterior mean of $\gamma_k$ is the **posterior inclusion probability (PIP)** for that breakpoint. ```{r ss-overview} fit_ss <- smoothbp_ss( formula = y ~ tau, b0 = ~ 1 + (1 | subject), b1 = ~ 1, deltas = list(~ 1, ~ 1, ~ 1), # 3 candidate breakpoints omega = list(~ 1, ~ 1, ~ 1), rho = list(~ 1, ~ 1, ~ 1), data = dat, spike = prior_spike_slab(pi = 0.1, learn_pi = TRUE), chains = 4L, iter = 2000L, warmup = 1000L, seed = 42L ) # Posterior inclusion probabilities per breakpoint pip(fit_ss) #> gamma_delta1_(Intercept) gamma_delta2_(Intercept) gamma_delta3_(Intercept) #> 0.987 0.063 0.041 ``` A PIP near 1 indicates that breakpoint is supported by the data; a low PIP indicates the slope change is not needed. See `vignette("spike-and-slab")` for a full tutorial. --- ## Hypothesis tests and evidence ratios `hypothesis()` evaluates directional hypotheses against the posterior draws. ```{r hypothesis-directional} # Is the slope change at breakpoint 1 positive? smoothbp::hypothesis(fit1, "delta1_(Intercept) > 0") # Complex linear hypothesis: is the final slope (b1 + delta1) negative? smoothbp::hypothesis(fit1, "b1_(Intercept) + delta1_(Intercept) < 0") # Does the change-point fall before time 4? smoothbp::hypothesis(fit1, "omega1_(Intercept) < 4") ``` | Stars | ER | $P(H)$ | |-------|----|--------| | `***` | > 99 | > 0.99 | | `**` | > 19 | > 0.95 | | `*` | > 3 | > 0.75 | --- ## Model comparison ### Leave-one-out cross-validation (LOO-IC) ```{r loo} # Compare 0-BP (linear) vs 1-BP vs 3-BP models loo0 <- loo(fit0) loo1 <- loo(fit1) loo3 <- loo(fit3) loo::loo_compare(loo0, loo1, loo3) ``` ### Bayes factors via bridge sampling ```{r bridge-sampler} library(bridgesampling) bs0 <- bridge_sampler(fit0) bs1 <- bridge_sampler(fit1) bayes_factor(bs1, bs0) ``` --- ## Extracting results ### Fitted values ```{r fitted-training} # Posterior mean + 95% CI at training observations fitted(fit1) # Full posterior draws matrix (n_draws × n_obs) draws_mat <- fitted(fit1, summary = FALSE) ``` ### Working with `posterior` draws The `$draws` component of a `smoothbp` fit is a standard `draws_array` object from the `posterior` package. You can use any of the `posterior` tools to manipulate and summarize the draws. ```{r draws-manip} library(posterior) # Convert to a data frame for use with ggplot2 draws_df <- as_draws_df(fit1$draws) # Extract a specific parameter b1_draws <- draws_df$`b1_(Intercept)` # Compute custom summaries summarise_draws(fit1$draws, "mean", "sd", ~quantile(.x, c(0.1, 0.9))) ``` ### Marginal predictions ```{r fitted-marginal} # Population-level predictions at new time points newdata_marginal <- data.frame(tau = seq(0, 6, by = 0.1)) fitted(fit1, newdata = newdata_marginal) ``` ### Conditional predictions ```{r fitted-conditional} # Subject-specific predictions fitted(fit1, newdata = data.frame(tau = seq(0, 6, by = 0.1), subject = "1")) ```