--- title: "Inference & Diagnostics" subtitle: "Exploratory screening, bootstrap confidence intervals, and sensitivity analysis" format: html: code-fold: false toc: true toc-depth: 3 number-sections: true vignette: > %\VignetteIndexEntry{Inference & Diagnostics} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8} --- ```{r} #| include: false has_pkg <- requireNamespace("TemporalHazard", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = has_pkg ) ``` ```{r} #| label: setup library(TemporalHazard) library(survival) library(ggplot2) ``` Fitting a hazard model is the middle of the workflow, not the end. Before you fit, you screen the covariates so you know what transformations the data actually supports. After you fit, you quantify uncertainty (bootstrap CIs, Wald intervals), check whether the model is honest to the data (goodness-of-fit overlays, decile-of-risk calibration), and probe its behavior under different covariate scenarios (sensitivity analysis). This vignette walks the diagnostic and inferential tools the package ships for each of those steps. These pieces correspond directly to steps in the classic SAS HAZARD workflow — the `lg.*` logistic-screening macros, the `bs.*` bootstrap macros, the `hs.*` hazard-sensitivity macros — re-implemented as first-class R functions (`hzr_calibrate()`, `hzr_kaplan()`, `hzr_nelson()`, `hzr_bootstrap()`, `hzr_gof()`, `hzr_deciles()`). If you've worked through a SAS HAZARD analysis you already know the shape of what follows; if not, treat this vignette as the standard diagnostic pass you'd run on any fitted hazard model before trusting its conclusions. ## Exploratory covariate screening Before fitting the parametric hazard model, screen each candidate covariate with a simple logistic regression against the event indicator. This is much cheaper than fitting the full hazard model and answers two distinct questions: which covariates carry signal at all (rough significance triage), and what *functional form* each covariate enters with — linear, log, polynomial, or something non-monotone that needs binning. The screening step is what stops you from blindly throwing every column at `hazard()` and then trying to debug a model that has shape mis-specification baked in. ```{r} #| label: explore-data data(avc) avc <- na.omit(avc) # Candidate covariates candidates <- c("age", "status", "mal", "com_iv", "inc_surg", "orifice") ``` Fit a univariable logistic regression for each candidate against the death indicator. The p-values that come out are not the final word on whether a covariate matters — they ignore the joint structure of the hazard model and they treat the event time as a binary outcome — but they do tell you which covariates have no chance of mattering, and which deserve closer inspection. ```{r} #| label: explore-logit logit_fits <- lapply(candidates, function(var) { fmla <- as.formula(paste("dead ~", var)) glm(fmla, data = avc, family = binomial()) }) names(logit_fits) <- candidates # Which covariates are significant univariably? p_values <- vapply(logit_fits, function(f) { coef(summary(f))[2, "Pr(>|z|)"] }, numeric(1)) data.frame(covariate = names(p_values), p_value = round(p_values, 4), row.names = NULL) ``` ```{r} #| label: fig-explore-age #| fig-cap: "Exploratory: age vs. mortality with LOESS smooth" #| fig-width: 7 #| fig-height: 4 ggplot(avc, aes(age, dead)) + geom_jitter(height = 0.05, width = 0, alpha = 0.3, size = 1) + geom_smooth(method = "loess", se = TRUE, colour = "#0072B2", linewidth = 0.8) + labs(x = "Age at operation (months)", y = "Death (0/1)") + theme_minimal() ``` The LOESS smooth reveals the *shape* of the covariate-mortality relationship in a non-parametric way. A roughly linear smooth means the covariate enters the hazard model on its natural scale; a U or inverted-U shape means a polynomial or a non-monotone transform; a plateau-then-rise means a threshold effect that may need binning. This is the call you make *before* writing the formula for `hazard()`, not after staring at a converged-but-misspecified model. ### Quantile calibration with `hzr_calibrate()` `hzr_calibrate()` bins a continuous covariate into quantile groups and applies the logit / Gompertz / Cox link transform, so you can eyeball whether the covariate-outcome relationship is linear on the chosen link scale. It corresponds to the SAS `logit.sas` / `logitgr.sas` macros. ```{r} #| label: calibrate-age age_cal <- hzr_calibrate( x = avc$age, event = avc$dead, groups = 10, link = "logit" ) print(age_cal) ``` A roughly linear trend across groups means the logit link is appropriate; strong curvature flags the need for a transform (for example, polynomial or log). ## Nonparametric baselines A parametric fit has to be compared against *something*. The natural something is the nonparametric estimate of the same quantity from the same data — Kaplan-Meier for survival, Nelson-Aalen for cumulative hazard. The package's `hzr_kaplan()` and `hzr_nelson()` are convenience wrappers that return richer output than `survival::survfit()`: `hzr_kaplan()` ships KM with logit-transformed confidence limits (more accurate in the tails than the standard Greenwood limits, which can stray outside $[0, 1]$), interval hazard rates, density estimates, and a restricted-mean-survival-time scalar. `hzr_nelson()` returns the Wayne Nelson cumulative hazard estimator with log-normal confidence limits — the right reference when you care about the integrated intensity rather than the survival probability. ```{r} #| label: km-baseline km <- hzr_kaplan(time = avc$int_dead, status = avc$dead, conf_level = 0.95) print(km) ``` ```{r} #| label: nelson-baseline nel <- hzr_nelson(time = avc$int_dead, event = avc$dead, conf_level = 0.95) print(nel) ``` ## Bootstrap confidence intervals The delta-method CIs `predict(se.fit = TRUE)` returns are fast and asymptotically correct, but they lean on the Hessian-derived parameter vcov — which can mislead when the likelihood surface is poorly behaved near the MLE (boundary fits, near-degenerate parameter combinations, small samples). Bootstrap resampling is the robust alternative: refit the model on each resampled dataset, collect the resulting parameter and prediction values, and read the empirical 2.5th / 97.5th percentiles as the 95% CI. It's computationally heavier but it doesn't assume the likelihood is locally quadratic, and it picks up the right tail behavior even when delta-method theory creaks. `hzr_bootstrap()` wraps the resample-and-refit loop and corresponds to the SAS `bootstrap.hazard.sas` and `bootstrap.summary.sas` macros. We fit a base model first; bootstrap replicates will resample from the same data and refit using these starting values. ```{r} #| label: fit-base # Fit the base model that bootstrapping will use fit <- hazard( Surv(int_dead, dead) ~ age + status + mal + com_iv, data = avc, dist = "weibull", theta = c(mu = 0.20, nu = 1.0, rep(0, 4)), fit = TRUE, control = list(maxit = 500) ) # Prediction grid at a median-risk profile t_grid <- seq(0.01, max(avc$int_dead) * 0.9, length.out = 100) base_nd <- data.frame(time = t_grid, age = median(avc$age), status = 2, mal = 0, com_iv = 0) surv_point <- predict(fit, newdata = base_nd, type = "survival") ``` `hzr_bootstrap()` returns a long data frame of per-replicate coefficient estimates (`$replicates`) and a parameter-level summary (mean, SD, 95% percentile CI) in `$summary`. We use `n_boot = 30` here purely so the vignette renders in reasonable time — for a real analysis you'd want 200 or more replicates. The `set.seed(42)` call before `hzr_bootstrap()` makes the resample sequence reproducible. ```{r} #| label: bootstrap set.seed(42) boot <- hzr_bootstrap(fit, n_boot = 30) # kept small for vignette build time print(boot) ``` The parameter-level CIs above come directly from the bootstrap distribution. Prediction-level CIs (on the survival curve or hazard profile) require pivoting `boot$replicates` to wide format and evaluating the parametric survival formula once per replicate. That workflow is out of scope for this section, but the building blocks are all in the returned object. ## Goodness-of-fit overlay The Kaplan-Meier overlay in the prediction-visualization vignette is the *visual* goodness-of-fit check. `hzr_gof()` is the *quantitative* complement: it tabulates parametric predictions side by side with the nonparametric KM estimate at each event time, and computes the Conservation-of-Events observed-vs-expected ratio across the whole follow-up window. The ratio compresses everything the model is doing into a single number, which makes it the right summary statistic to report alongside coefficient tables. (`hzr_gof()` corresponds to the SAS `hazplot.sas` macro.) ```{r} #| label: gof-summary gof <- hzr_gof(fit) print(gof) ``` The observed vs. expected ratio (printed at the bottom) is the Conservation-of-Events check: a well-specified model recovers the event count exactly, so a ratio close to 1 is what we want. Ratios well above or below 1 flag either a misspecified shape or a covariate that isn't really linear on the log-hazard scale. ## Decile-of-risk calibration The conservation-of-events ratio above tells you whether the model is calibrated *on average*. That's necessary but not sufficient — a model can hit the right total event count while systematically over-predicting risk for one part of the cohort and under-predicting for another, with the errors cancelling in aggregate. `hzr_deciles()` is the check that catches that failure mode: it partitions patients into deciles of predicted risk at a chosen time point and compares observed with expected event counts within each decile, returning a chi-square goodness-of-fit test. A well-calibrated model has observed-vs-expected close in every decile; a model that's globally calibrated but locally biased will show large residuals in the extreme deciles. (Corresponds to the SAS `deciles.hazard.sas` macro.) ```{r} #| label: deciles deciles <- hzr_deciles(fit, time = 120, groups = 10) print(deciles) ``` The chi-square test statistic and p-value summarise whether the observed decile-by-decile event counts are consistent with what the parametric model predicts. A good visual check is to plot the two columns side by side: ```{r} #| label: fig-deciles #| fig-cap: "Observed vs. expected mortality by decile of risk" #| fig-width: 7 #| fig-height: 4.5 decile_df <- as.data.frame(deciles) cal_long <- rbind( data.frame(group = decile_df$group, rate = decile_df$observed_rate, Series = "Observed"), data.frame(group = decile_df$group, rate = decile_df$expected_rate, Series = "Expected") ) ggplot(cal_long, aes(group, rate, fill = Series)) + geom_col(position = position_dodge(width = 0.7), alpha = 0.8) + scale_fill_manual(values = c("Observed" = "#56B4E9", "Expected" = "#E69F00")) + scale_x_continuous(breaks = seq_len(nrow(decile_df))) + scale_y_continuous(labels = scales::percent_format(accuracy = 1)) + labs(x = "Risk decile (1 = lowest)", y = "Mortality rate", fill = NULL) + theme_minimal() + theme(legend.position = "bottom") ``` ## Sensitivity analysis Coefficient estimates tell you the *direction and magnitude* of a covariate's effect on the hazard scale, but they don't directly answer the clinical question of *what survival difference this implies* for a real patient. Sensitivity analysis closes that gap. You define two or more covariate profiles — typically a reference profile and one or more high-risk profiles — score each through `predict()`, and plot the resulting survival curves on shared axes. The horizontal and vertical gaps between curves translate the model's coefficients into the clinically meaningful currency of survival probability differences and median-survival shifts. It's also a stress test: if your high-risk profile produces only a small survival gap, either the risk factors aren't doing much in this cohort, or the model is missing the mechanism that makes them matter. ```{r} #| label: sensitivity # Define reference and high-risk profiles ref_profile <- data.frame( age = median(avc$age), status = 2, mal = 0, com_iv = 0 ) high_risk <- data.frame( age = quantile(avc$age, 0.90), status = 4, mal = 1, com_iv = 1 ) ``` ```{r} #| label: fig-sensitivity #| fig-cap: "Sensitivity analysis: reference vs. high-risk profile" #| fig-width: 7 #| fig-height: 4.5 sens_curves <- do.call(rbind, lapply( list("Reference" = ref_profile, "High risk" = high_risk), function(p) { nd <- p[rep(1, length(t_grid)), ] nd$time <- t_grid data.frame(time = t_grid, survival = predict(fit, newdata = nd, type = "survival") * 100, Profile = deparse(substitute(p))) } )) # Fix profile labels sens_curves$Profile <- rep(c("Reference", "High risk"), each = length(t_grid)) sens_curves$Profile <- factor(sens_curves$Profile, levels = c("Reference", "High risk")) ggplot(sens_curves, aes(time, survival, colour = Profile)) + geom_line(linewidth = 0.9) + scale_colour_manual(values = c("Reference" = "#009E73", "High risk" = "#D55E00")) + scale_y_continuous(limits = c(0, 100)) + labs(x = "Months after AVC repair", y = "Freedom from death (%)", colour = NULL) + theme_minimal() + theme(legend.position = "bottom") ``` The gap between the curves quantifies the clinical impact of the risk factors. The sensitivity analysis is most informative when combined with bootstrap confidence intervals on each curve, showing whether the difference is statistically meaningful. ## Analysis workflow summary The pieces in this vignette aren't independent diagnostics — they chain together into a single analytical workflow that goes from raw covariates to a defensible, uncertainty-quantified, well-calibrated fitted model. The complete sequence, with the SAS HAZARD correspondences each piece descends from: 1. **Exploratory screening** (`glm`, `hzr_calibrate()`) --- identify covariate transformations and functional forms 2. **Nonparametric baselines** (`hzr_kaplan()`, `hzr_nelson()`) --- reference KM / Nelson cumulative hazard estimators 3. **Fit hazard model** (`hazard()`) --- parametric shape + covariates 4. **Predict & visualize** (`predict()`) --- survival, hazard, risk profiles 5. **Goodness-of-fit overlay** (`hzr_gof()`) --- parametric vs. KM + Conservation-of-Events check 6. **Bootstrap CIs** (`hzr_bootstrap()`) --- uncertainty quantification via resampling 7. **Sensitivity analysis** (`predict()` on covariate profiles) --- compare scenarios across risk factors 8. **Decile calibration** (`hzr_deciles()`) --- chi-square observed vs. expected by risk decile See `vignette("getting-started")` for the minimal workflow and `vignette("fitting-hazard-models")` for the model-fitting details.