| Title: | Exploratory Subgroup Identification in Clinical Trials with Survival Endpoints |
| Version: | 0.1.0 |
| Description: | Implements statistical methods for exploratory subgroup identification in clinical trials with survival endpoints. Provides tools for identifying patient subgroups with differential treatment effects using machine learning approaches including Generalized Random Forests (GRF), LASSO regularization, and exhaustive combinatorial search algorithms. Features bootstrap bias correction using infinitesimal jackknife methods to address selection bias in post-hoc analyses. Designed for clinical researchers conducting exploratory subgroup analyses in randomized controlled trials, particularly for multi-regional clinical trials (MRCT) requiring regional consistency evaluation. Supports both accelerated failure time (AFT) and Cox proportional hazards models with comprehensive diagnostic and visualization tools. Methods are described in León et al. (2024) <doi:10.1002/sim.10163>. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 4.1.0) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| Imports: | data.table, doFuture, dplyr, foreach, future, future.apply, future.callr, ggplot2, glmnet, grf, gt, patchwork, policytree, progressr, randomForest, rlang, stringr, survival, weightedsurv |
| Suggests: | DiagrammeR, doRNG, htmltools, tidyr, forestploter, cubature, svglite, knitr, rmarkdown, katex |
| URL: | https://github.com/larry-leon/forestsearch, https://larry-leon.github.io/forestsearch/ |
| BugReports: | https://github.com/larry-leon/forestsearch/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-03-19 04:27:29 UTC; larryleon |
| Author: | Larry Leon [aut, cre] |
| Maintainer: | Larry Leon <larry.leon.05@post.harvard.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-23 17:20:14 UTC |
forestsearch: Exploratory Subgroup Identification in Clinical Trials with Survival Endpoints
Description
Implements statistical methods for exploratory subgroup identification in clinical trials with survival endpoints. Provides tools for identifying patient subgroups with differential treatment effects using machine learning approaches including Generalized Random Forests (GRF), LASSO regularization, and exhaustive combinatorial search algorithms. Features bootstrap bias correction using infinitesimal jackknife methods to address selection bias in post-hoc analyses. Designed for clinical researchers conducting exploratory subgroup analyses in randomized controlled trials, particularly for multi-regional clinical trials (MRCT) requiring regional consistency evaluation. Supports both accelerated failure time (AFT) and Cox proportional hazards models with comprehensive diagnostic and visualization tools. Methods are described in León et al. (2024) doi:10.1002/sim.10163.
Author(s)
Maintainer: Larry Leon larry.leon.05@post.harvard.edu
See Also
Useful links:
Report bugs at https://github.com/larry-leon/forestsearch/issues
Cross-Validation Subgroup Match Summary
Description
Summarizes the match between cross-validation subgroups and analysis subgroups.
Usage
CV_sgs(sg1, sg2, confs, sg_analysis)
Arguments
sg1 |
Character vector. Subgroup 1 labels for each fold. |
sg2 |
Character vector. Subgroup 2 labels for each fold. |
confs |
Character vector. Confounder names. |
sg_analysis |
Character vector. Subgroup analysis labels. |
Value
List with indicators for any match, exact match, one match, and covariate-specific matches.
Convert Factor Code to Label
Description
Converts q-indexed codes to human-readable labels using the confs_labels mapping. Supports both full format ("q1.1", "q3.0") and short format ("q1", "q3"). Handles vector input via recursion.
Usage
FS_labels(Qsg, confs_labels)
Arguments
Qsg |
Character. Factor code in full format (e.g., "q1.1", "q3.0") or short format ("q1", "q3"). |
confs_labels |
Character vector. Labels for each factor, indexed by factor number. |
Value
Character. Human-readable label wrapped in braces, e.g.,
"{age <= 50}" or "!{age <= 50}" for complement. Returns the
original code if no match is found.
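The described behavior can be sketched in base R; `code_to_label` below is a hypothetical stand-in for `FS_labels` (the regex and the short-format default are assumptions, not the package's implementation):

```r
# Toy q-code -> label mapping. "q1.1" means "factor 1, level 1" and "q1.0"
# its complement; short codes like "q1" default to the positive level.
code_to_label <- function(code, labels) {
  parts <- regmatches(code, regexec("^q([0-9]+)(\\.([0-9]+))?$", code))[[1]]
  if (length(parts) == 0) return(code)             # no match: return code unchanged
  idx  <- as.integer(parts[2])
  side <- if (parts[4] == "" || parts[4] == "1") "" else "!"
  paste0(side, "{", labels[idx], "}")
}

labels <- c("age <= 50", "er <= 20")
code_to_label("q1.1", labels)  # "{age <= 50}"
code_to_label("q1.0", labels)  # "!{age <= 50}"
code_to_label("zzz", labels)   # unmatched code returned as-is
```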
Subgroup summary table estimates
Description
Returns a summary table of subgroup estimates (HR, RMST, medians, etc.).
Usage
SG_tab_estimates(
df,
SG_flag,
outcome.name = "tte",
event.name = "event",
treat.name = "treat",
strata.name = NULL,
hr_1a = NA,
hr_0a = NA,
potentialOutcome.name = NULL,
sg1_name = NULL,
sg0_name = NULL,
draws = 0,
details = FALSE,
return_medians = TRUE,
est.scale = "hr"
)
Arguments
df |
Data frame. |
SG_flag |
Character. Subgroup flag variable. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
strata.name |
Character. Name of strata variable (optional). |
hr_1a |
Character. Adjusted HR for subgroup 1 (optional). |
hr_0a |
Character. Adjusted HR for subgroup 0 (optional). |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
sg1_name |
Character. Name for subgroup 1. |
sg0_name |
Character. Name for subgroup 0. |
draws |
Integer. Number of draws for resampling (optional). |
details |
Logical. Print details. |
return_medians |
Logical. Use medians or RMST. |
est.scale |
Character. Effect scale ("hr" or "1/hr"). |
Value
Data frame of subgroup summary estimates.
Violin/Boxplot Visualization of HR Estimates
Description
Creates violin plots with embedded boxplots showing the distribution of hazard ratio estimates across simulations for different analysis populations. Supports symmetric trimming to handle extreme values that can distort the display when small subgroups produce very large HR estimates.
Usage
SGplot_estimates(
df,
label_training = "Training",
label_testing = "Testing",
label_itt = "ITT (stratified)",
label_sg = "Testing (subgroup)",
trim_fraction = NULL,
ylim = NULL,
show_summary = NULL,
title = "Distribution of HR Estimates Across Simulations",
subtitle = NULL
)
Arguments
df |
data.frame or data.table. Simulation results from
|
label_training |
Character. Label for training data estimates. Default: "Training" |
label_testing |
Character. Label for testing data estimates. Default: "Testing" |
label_itt |
Character. Label for ITT estimates. Default: "ITT (stratified)" |
label_sg |
Character. Label for subgroup estimates. Default: "Testing (subgroup)" |
trim_fraction |
Numeric or NULL. Fraction of observations to trim from each tail (e.g., 0.01 trims the lowest 1% and highest 1% of values). When non-NULL, trimmed means and SDs are computed for each group, extreme observations are flagged, and the y-axis is clipped to the trimmed data range. Set to NULL (default) for no trimming (backward compatible). |
ylim |
Numeric vector of length 2 or NULL. Explicit y-axis limits
as |
show_summary |
Logical. Annotate each violin with mean (SD) below
the x-axis labels. When trimming is active, displays trimmed
statistics. Default: TRUE when |
title |
Character. Plot title. Default: "Distribution of HR Estimates Across Simulations". |
subtitle |
Character or NULL. Plot subtitle. When trimming is active and subtitle is NULL, an auto-generated note indicating the trim fraction and number of flagged observations is shown. Default: NULL. |
Value
List with components:
- dfPlot_estimates: data.table formatted for plotting, with a trimmed logical column when trimming is active
- plot_estimates: ggplot2 object
- trim_info: List of per-group trimming diagnostics (NULL when no trimming). Each element contains: n_total, n_trimmed, n_flagged, raw_mean, raw_sd, trimmed_mean, trimmed_sd, lower_bound, upper_bound.
See Also
mrct_region_sims for generating simulation results,
summaryout_mrct for tabular summaries with trimming
Disjunctive (dummy) coding for factor columns
Description
Disjunctive (dummy) coding for factor columns
Usage
acm.disjctif(df)
Arguments
df |
Data frame with factor variables. |
Value
Data frame with dummy-coded columns.
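Disjunctive coding can be sketched in base R with `model.matrix`; `disjunctive` below is a hypothetical stand-in for `acm.disjctif` (full one-column-per-level indicator coding is an assumption):

```r
# Dummy-code every factor column into one 0/1 column per level,
# leaving non-factor columns untouched.
disjunctive <- function(df) {
  out <- lapply(names(df), function(v) {
    x <- df[[v]]
    if (!is.factor(x)) return(setNames(data.frame(x), v))
    m <- model.matrix(~ x - 1)                  # one column per level, no reference
    colnames(m) <- paste(v, levels(x), sep = ".")
    as.data.frame(m)
  })
  do.call(cbind, out)
}

df <- data.frame(grade = factor(c("1", "2", "2")), size = c(10, 25, 40))
disjunctive(df)  # columns grade.1, grade.2, size; grade dummies sum to 1 per row
```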
Add ID Column to Data Frame
Description
Ensures that a data frame has a unique ID column. If id.name is not
provided, a column named "id" is added. If id.name is provided
but does not exist in the data frame, it is created with unique integer
values.
Usage
add_id_column(df.analysis, id.name = NULL)
Arguments
df.analysis |
Data frame to which the ID column will be added. |
id.name |
Character. Name of the ID column to add (default is
|
Value
Data frame with the ID column added if necessary.
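A minimal sketch of the documented behavior (`ensure_id` is a hypothetical simplification of `add_id_column`; consecutive integer IDs are an assumption):

```r
# Add an ID column only when one is missing; existing columns are untouched.
ensure_id <- function(df, id.name = NULL) {
  nm <- if (is.null(id.name)) "id" else id.name
  if (!nm %in% names(df)) df[[nm]] <- seq_len(nrow(df))
  df
}

ensure_id(data.frame(y = c(5, 7)))$id          # 1 2
ensure_id(data.frame(id = c(9, 8)))$id         # kept as-is: 9 8
```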
Add Unprocessed Variables from Original Data
Description
Add Unprocessed Variables from Original Data
Usage
add_unprocessed_vars(
df_work,
data,
outcome_var,
event_var,
treatment_var,
continuous_vars,
factor_vars,
verbose
)
Analyze subgroup for summary table (OPTIMIZED)
Description
Analyzes a subgroup and returns formatted results for summary table. Uses optimized cox_summary() and reduces redundant calculations.
Usage
analyze_subgroup(
df_sub,
outcome.name,
event.name,
treat.name,
strata.name,
subgroup_name,
hr_a,
potentialOutcome.name,
return_medians,
N
)
Arguments
df_sub |
Data frame for subgroup. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
strata.name |
Character. Name of strata variable (optional). |
subgroup_name |
Character. Subgroup name. |
hr_a |
Character. Adjusted hazard ratio (optional). |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
return_medians |
Logical. Use medians or RMST. |
N |
Integer. Total sample size. |
Value
Character vector of results.
Apply Spline Constraint to Treatment Effect Coefficients
Description
Apply Spline Constraint to Treatment Effect Coefficients
Usage
apply_spline_constraint(b0, spline_var, knot, zeta, log_hrs, k_treat, verbose)
Assemble Final Results Object
Description
Assemble Final Results Object
Usage
assemble_results(
df_super,
mu,
tau,
gamma,
b0,
cens_model,
subgroup_vars,
subgroup_cuts,
subgroup_definitions,
hr_results,
continuous_vars,
factor_vars,
model,
n_super,
seed,
spline_info = NULL
)
Assign data to subgroups based on selected node
Description
Creates treatment recommendation flags based on identified subgroup
Usage
assign_subgroup_membership(data, best_subgroup, trees, X)
Arguments
data |
Data frame. Original data |
best_subgroup |
Data frame row. Selected subgroup information |
trees |
List. Policy trees |
X |
Matrix. Covariate matrix |
Value
Data frame with added predict.node and treat.recommend columns
Bootstrap Results for ForestSearch with Bias Correction
Description
Runs bootstrap analysis for ForestSearch, fitting Cox models and computing bias-corrected estimates and valid CIs (see vignette for references)
Usage
bootstrap_results(
fs.est,
df_boot_analysis,
cox.formula.boot,
nb_boots,
show_three,
H_obs,
Hc_obs,
seed = 8316951L
)
Arguments
fs.est |
List. ForestSearch results object from
|
df_boot_analysis |
Data frame. Bootstrap analysis data with same structure
as |
cox.formula.boot |
Formula. Cox model formula for bootstrap, typically
created by |
nb_boots |
Integer. Number of bootstrap samples to generate (e.g., 500-1000). More iterations provide better bias correction but increase computation time. |
show_three |
Logical. If |
H_obs |
Numeric. Observed log hazard ratio for subgroup H (harm/questionable group,
|
Hc_obs |
Numeric. Observed log hazard ratio for subgroup H^c (complement/recommend,
|
seed |
Integer. Random seed for reproducibility. Default 8316951L.
Must match the seed used in |
Value
Data.table with one row per bootstrap iteration and columns:
- boot_id: Integer. Bootstrap iteration number (1 to nb_boots)
- H_biasadj_1: Bias-corrected estimate for H using method 1: H_obs - (Hstar_star - Hstar_obs)
- H_biasadj_2: Bias-corrected estimate for H using method 2: 2*H_obs - (H_star + Hstar_star - Hstar_obs)
- Hc_biasadj_1: Bias-corrected estimate for H^c using method 1
- Hc_biasadj_2: Bias-corrected estimate for H^c using method 2
- max_sg_est: Numeric. Maximum subgroup hazard ratio found
- L: Integer. Number of candidate factors evaluated
- max_count: Integer. Maximum number of factor combinations
- events_H_0: Integer. Number of events in control arm of original subgroup H on bootstrap sample
- events_H_1: Integer. Number of events in treatment arm of original subgroup H on bootstrap sample
- events_Hc_0: Integer. Number of events in control arm of original subgroup H^c on bootstrap sample
- events_Hc_1: Integer. Number of events in treatment arm of original subgroup H^c on bootstrap sample
- events_Hstar_0: Integer. Number of events in control arm of new subgroup H* on original data
- events_Hstar_1: Integer. Number of events in treatment arm of new subgroup H* on original data
- events_Hcstar_0: Integer. Number of events in control arm of new subgroup H^c* on original data
- events_Hcstar_1: Integer. Number of events in treatment arm of new subgroup H^c* on original data
- tmins_search: Numeric. Minutes spent on subgroup search in this iteration
- tmins_iteration: Numeric. Total minutes for this bootstrap iteration
- Pcons: Numeric. Consistency p-value for top subgroup
- hr_sg: Numeric. Hazard ratio for top subgroup
- N_sg: Integer. Sample size of top subgroup
- E_sg: Integer. Number of events in top subgroup
- K_sg: Integer. Number of factors defining top subgroup
- g_sg: Numeric. Subgroup group ID
- m_sg: Numeric. Subgroup index
- M.1 through M.7: Character. Labels of the first through seventh defining factors
Rows where no valid subgroup was found will have NA for bias corrections.
The returned object has a "timing" attribute with summary statistics.
Bias Correction Methods
Two bias correction approaches are implemented:
- Method 1 (Simple Optimism): H_adj1 = H_obs - (Hstar_star - Hstar_obs), where Hstar_star is the new subgroup HR on bootstrap data and Hstar_obs is the new subgroup HR on original data.
- Method 2 (Double Bootstrap): H_adj2 = 2 * H_obs - (H_star + Hstar_star - Hstar_obs), where H_star is the original subgroup HR on bootstrap data.
Notation:
- H_obs: Original subgroup HR on original data
- H_star: Original subgroup HR on bootstrap data
- Hstar_obs: New subgroup (found in bootstrap) HR on original data
- Hstar_star: New subgroup (found in bootstrap) HR on bootstrap data
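Given the four quantities in this notation, both corrections are simple arithmetic on the log-HR scale; a minimal sketch with fabricated illustrative values (not results from any analysis):

```r
# Illustrative log-HR values, fabricated for demonstration only.
H_obs      <- log(1.8)  # original subgroup, original data
H_star     <- log(2.1)  # original subgroup, bootstrap data
Hstar_obs  <- log(1.6)  # bootstrap-found subgroup, original data
Hstar_star <- log(2.0)  # bootstrap-found subgroup, bootstrap data

# Method 1 (simple optimism): subtract the optimism of the re-run search
H_adj1 <- H_obs - (Hstar_star - Hstar_obs)
# Method 2 (double bootstrap)
H_adj2 <- 2 * H_obs - (H_star + Hstar_star - Hstar_obs)

exp(c(adj1 = H_adj1, adj2 = H_adj2))  # back-transformed to the HR scale
```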
Computational Details
- Uses the doFuture backend for parallel execution (configured externally)
- Sets reproducible seeds: 8316951 + boot * 100 for each iteration
- Each bootstrap iteration runs the full ForestSearch pipeline, including variable selection, subgroup search, and consistency evaluation
- Sequential execution within each bootstrap prevents nested parallelization
- Failed bootstrap iterations generate warnings but do not stop execution
- Confounders are removed from bootstrap data to force fresh variable selection
Bootstrap Configuration
Each bootstrap iteration modifies ForestSearch arguments to:
- Suppress output: details, showten_subgroups, plot.sg, and plot.grf all set to FALSE
- Force re-selection: grf_res and grf_cuts set to NULL
- Prevent nested parallelism: parallel_args$plan = "sequential", workers = 1
Performance Considerations
- Typical runtime: 1-5 seconds per bootstrap iteration
- For 1000 bootstraps with 6 workers: ~3-10 minutes total
- Memory usage scales with dataset size and number of workers
- Consider reducing nb_boots for initial testing (e.g., 100)
Error Handling
The function gracefully handles three failure modes:
- Bootstrap sample creation fails: returns a row with all NA
- ForestSearch fails to run: warns and returns a row with all NA
- ForestSearch runs but finds no subgroup: returns a row with all NA
All three cases ensure the foreach loop can still combine results via rbind.
Note
This function is designed to be called within a foreach loop with the %dofuture% operator. It requires:
- All functions in get_bootstrap_exports to be available in the parallel workers
- Packages listed in BOOTSTRAP_REQUIRED_PACKAGES to be installed
- A proper parallel backend set up via setup_parallel_SGcons
See Also
forestsearch_bootstrap_dofuture for the wrapper function that
sets up parallelization and calls this function
build_cox_formula for creating the Cox formula
fit_cox_models for initial Cox model fitting
get_Cox_sg for Cox model fitting on subgroups
get_dfRes for processing bootstrap results into confidence intervals
bootstrap_ystar for generating the Ystar matrix
Bootstrap Ystar Matrix
Description
Generates a bootstrap matrix for Ystar using parallel processing.
Usage
bootstrap_ystar(df, nb_boots, seed = 8316951L)
Arguments
df |
Data frame. |
nb_boots |
Integer. Number of bootstrap samples. |
seed |
Integer. Random seed for reproducibility. Default 8316951L.
Must match the seed used in |
Value
Matrix of bootstrap samples (nb_boots x nrow(df)).
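The documented shape can be illustrated with a serial base-R sketch; `boot_index_matrix` is hypothetical (the package version resamples in parallel, and storing row indices rather than resampled values is an assumption here):

```r
# Seeded bootstrap-index matrix: one row of resampled row indices per bootstrap,
# giving the documented nb_boots x nrow(df) shape.
boot_index_matrix <- function(n_obs, nb_boots, seed = 8316951L) {
  set.seed(seed)
  t(replicate(nb_boots, sample.int(n_obs, n_obs, replace = TRUE)))
}

idx <- boot_index_matrix(n_obs = 10, nb_boots = 4)
dim(idx)  # 4 10
```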
Build Classification Rate Table from Simulation Results
Description
Constructs a publication-quality gt table summarizing subgroup
identification and classification rates across one or more data generation
scenarios and analysis methods. The layout mirrors Table 4 of
Leon et al. (2024) with metrics grouped by model scenario (null / alt)
and columns for each analysis method.
Usage
build_classification_table(
scenario_results,
analyses = NULL,
digits = 2,
title = "Subgroup Identification and Classification Rates",
n_sims = NULL,
bold_threshold = 0.05,
font_size = 12
)
Arguments
scenario_results |
Named list. Each element is itself a list with:
|
analyses |
Character vector of analysis labels to include
(e.g., |
digits |
Integer. Decimal places for proportions. Default: 2. |
title |
Character. Table title. Default:
|
n_sims |
Integer. Number of simulations (for subtitle). Default:
|
bold_threshold |
Numeric. Type I error threshold above which the
|
font_size |
Numeric. Font size in pixels for table text. Default: 12. Increase to 14 or 16 for larger display. |
Details
For each scenario the function computes:
- any(H): Proportion of simulations identifying any subgroup.
- sens(H): Mean sensitivity (only under alternative).
- sens(Hc): Mean specificity.
- ppv(H): Mean positive predictive value (only under alternative).
- ppv(Hc): Mean negative predictive value.
- avg|H|: Mean size of identified subgroup (when found).
Under the null hypothesis the rows are reduced to any(H),
sens(Hc), ppv(Hc), and avg|H|.
Value
A gt table object.
See Also
format_oc_results,
summarize_simulation_results
Build Cox Model Formula
Description
Constructs a Cox model formula from variable names.
Usage
build_cox_formula(outcome.name, event.name, treat.name)
Arguments
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
Value
An R formula object for Cox regression.
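The construction can be sketched in one line with `as.formula` and survival's `Surv`; `make_cox_formula` is a hypothetical stand-in for `build_cox_formula` (the package version may add strata or other terms):

```r
library(survival)

# Paste the documented variable names into a Surv() formula.
make_cox_formula <- function(outcome.name, event.name, treat.name) {
  as.formula(sprintf("Surv(%s, %s) ~ %s", outcome.name, event.name, treat.name))
}

f <- make_cox_formula("tte", "event", "treat")
f  # Surv(tte, event) ~ treat
```

The resulting formula can be passed directly to `coxph()`.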
Build Estimation Properties Table from Simulation Results
Description
Constructs a publication-quality gt table summarizing estimation
properties for hazard ratios in the identified subgroup and its complement.
The layout mirrors Table 5 of Leon et al. (2024), showing average estimate,
empirical SD, min, max, and relative bias for each estimator.
Usage
build_estimation_table(
results,
dgm,
analysis_method = "FSlg",
n_boots = NULL,
digits = 2,
title = "Estimation Properties",
subtitle = NULL,
font_size = 12,
cde_H = NULL,
cde_Hc = NULL
)
Arguments
results |
|
dgm |
DGM object. Used for true parameter values ( |
analysis_method |
Character. Which analysis method to tabulate
(e.g., |
n_boots |
Integer or |
digits |
Integer. Decimal places. Default: 2. |
title |
Character. Table title. |
subtitle |
Character or |
font_size |
Numeric. Font size in pixels for table text. Default: 12. Increase to 14 or 16 for larger display. |
cde_H |
Numeric or |
cde_Hc |
Numeric or |
Details
Uses the paper's notation conventions:
- theta-dagger: Marginal (causal) HR truth
- theta-ddagger: Controlled direct effect (CDE) truth
- theta-hat(H-hat): Plugin Cox estimate in identified subgroup
- theta-hat*(H-hat): Bootstrap bias-corrected estimate
Includes both Cox-based HR and AHR (Average Hazard Ratio from loghr_po) estimators when AHR columns are present in the results.
For each subgroup (H and Hc) the function reports:
- Avg: Mean of the estimates across estimable simulations.
- SD: Empirical standard deviation.
- Min / Max: Range.
- b-dagger: Relative bias (percent) vs marginal truth, 100 * (Avg - theta_dagger) / theta_dagger.
- b-ddagger (conditional): Relative bias (percent) vs CDE truth, shown when CDE values are available.
When bootstrap-corrected columns (hr.H.bc, hr.Hc.bc) are
present in results, an additional bias-corrected row
(theta-hat*(H-hat)) is added per subgroup.
When AHR columns (ahr.H.hat, ahr.Hc.hat) are present, AHR
estimation rows are appended using the DGM's true AHR values for relative
bias calculation.
When CDE columns (cde.H.hat, cde.Hc.hat) are present and
CDE truth values are available, CDE estimation rows
(theta-ddagger(H-hat)) are appended. The b-dagger column for CDE rows
reports bias relative to the CDE truth rather than the marginal HR.
Value
A gt table object, or NULL if no estimable
realizations exist.
See Also
build_classification_table,
format_oc_results, get_dgm_hr
Calculate Covariance for Bootstrap Estimates
Description
Calculates the covariance between a vector and bootstrap estimates.
Usage
calc_cov(x, Est)
Arguments
x |
Numeric vector. |
Est |
Numeric vector of bootstrap estimates. |
Value
Numeric value of covariance.
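Covariances of this form are the building block of infinitesimal-jackknife variance estimates; a base-R sketch (the population-style 1/n scaling is an assumption, whereas `cov()` would use 1/(n-1)):

```r
# Covariance between a vector x and bootstrap estimates Est,
# centered at their means and scaled by 1/n.
calc_cov_sketch <- function(x, Est) {
  mean((x - mean(x)) * (Est - mean(Est)))
}

calc_cov_sketch(1:4, 2 * (1:4))  # 2.5
```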
Calculate counts for subgroup summary
Description
Calculates sample size, treated count, and event count for a subgroup.
Usage
calculate_counts(Y, E, Treat, N)
Arguments
Y |
Numeric vector of outcome. |
E |
Numeric vector of event indicators. |
Treat |
Numeric vector of treatment indicators. |
N |
Integer. Total sample size. |
Value
List with formatted counts.
Calculate Event Counts by Treatment Arm
Description
Calculate Event Counts by Treatment Arm
Usage
calculate_event_counts(dd, tt, id.x)
Calculate Hazard Ratios from Potential Outcomes
Description
Calculate Hazard Ratios from Potential Outcomes
Usage
calculate_hazard_ratios(df_super, n_super, mu, tau, model, verbose)
Arguments
df_super |
Data frame with super population |
n_super |
Size of super population |
mu |
Intercept parameter |
tau |
Scale parameter |
model |
Model type ("alt" or "null") |
verbose |
Logical for verbose output |
Value
List of hazard ratios
Calculate Linear Predictors for Potential Outcomes
Description
Calculate Linear Predictors for Potential Outcomes
Usage
calculate_linear_predictors(
df_super,
covariate_cols,
gamma,
b0,
spline_info = NULL
)
Calculate Maximum Combinations
Description
Calculate Maximum Combinations
Usage
calculate_max_combinations(L, maxk)
Calculate potential outcome hazard ratio
Description
Calculates the average hazard ratio from a potential outcome variable.
Usage
calculate_potential_hr(df, potentialOutcome.name)
Arguments
df |
Data frame. |
potentialOutcome.name |
Character. Name of potential outcome variable. |
Value
Numeric value of average hazard ratio.
Calculate Skewness
Description
Helper function to calculate sample skewness.
Usage
calculate_skewness(x)
Arguments
x |
Numeric vector |
Value
Numeric skewness value
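The plain moment-based estimator can be sketched as follows; whether the package applies a small-sample bias correction is not stated here, and this sketch uses the n-1 denominator inside `sd()`:

```r
# Sample skewness: mean cubed z-score of the non-missing values.
skewness_sketch <- function(x) {
  x <- x[!is.na(x)]
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skewness_sketch(c(1, 1, 1, 10))  # 0.75, right-skewed
skewness_sketch(c(-1, 0, 1))     # 0, symmetric
```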
Calibrate Censoring Adjustment to Match DGM Reference Distribution
Description
Uses root-finding to select a value of cens_adjust for
simulate_from_dgm such that a chosen censoring summary
statistic in the simulated data matches the corresponding statistic from
the DGM reference data (dgm$df_super).
Usage
calibrate_cens_adjust(
dgm,
target = c("rate", "km_median"),
n = 1000,
rand_ratio = 1,
analysis_time = 48,
max_entry = 24,
seed = 42,
interval = c(-3, 3),
tol = 1e-04,
n_eval = 2000,
verbose = TRUE,
...
)
Arguments
dgm |
An |
target |
Character. Calibration target: |
n |
Integer. Sample size passed to |
rand_ratio |
Numeric. Randomisation ratio passed to
|
analysis_time |
Numeric. Calendar analysis time passed to
|
max_entry |
Numeric. Maximum staggered entry time passed to
|
seed |
Integer. Base random seed. Each evaluation of the objective
function uses this seed for reproducibility. Default |
interval |
Numeric vector of length 2. Search interval for
|
tol |
Numeric. Root-finding tolerance. Default |
n_eval |
Integer. Sample size used inside the objective function
during root-finding. Smaller values are faster but noisier; increase
for precision. Default |
verbose |
Logical. Print search progress and final result.
Default |
... |
Additional arguments passed to |
Details
Two calibration targets are supported:
"rate"Overall censoring rate (proportion censored). Finds
cens_adjustsuch thatmean(event_sim == 0)in simulated data equalsmean(event == 0)indgm$df_super."km_median"KM-based median censoring time, estimated by reversing the event indicator so censored observations become the "event" of interest. Finds
cens_adjustsuch that the simulated KM median matches the reference KM median.
How the objective function works
At each candidate cens_adjust value, the objective function:
1. Calls simulate_from_dgm() with n = n_eval and the candidate cens_adjust.
2. Calls check_censoring_dgm() with verbose = FALSE to extract the target metric.
3. Returns sim_metric - ref_metric.
uniroot finds the zero crossing, i.e. the cens_adjust at
which simulated and reference metrics are equal.
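The same root-finding scheme, reduced to a toy monotone "simulator" standing in for simulate_from_dgm / check_censoring_dgm (the function names, the logistic form, and the noise level here are all illustrative assumptions):

```r
# Calibrate an adjustment knob so a simulated metric matches a reference value.
sim_metric <- function(adjust, seed = 42) {
  set.seed(seed)                            # fixed seed: same noise each evaluation
  plogis(-adjust) + rnorm(1, sd = 1e-3)     # censoring-rate-like, decreasing in adjust
}

ref  <- 0.30
root <- uniroot(function(a) sim_metric(a) - ref, interval = c(-3, 3), tol = 1e-4)

root$root                    # calibrated adjustment
sim_metric(root$root) - ref  # residual near zero
```

As in the package function, the fixed seed makes the objective deterministic, which is what lets `uniroot` converge despite the stochastic simulator.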
Monotonicity
The objective is monotone in cens_adjust for both targets:
- Larger cens_adjust → longer censoring times → lower censoring rate and higher KM median.
- Smaller cens_adjust → shorter censoring times → higher censoring rate and lower KM median.
If uniroot fails (the target lies outside the search interval),
the boundary values are printed and a wider interval should be
tried.
Stochastic noise
Because the objective function involves simulation, there is Monte Carlo
noise. Setting a fixed seed and a sufficiently large n_eval
(>= 2000) reduces noise enough for reliable root-finding. The
tol argument controls the root-finding tolerance on the
cens_adjust scale (not the metric scale).
Value
A named list with elements:
- cens_adjust: Calibrated cens_adjust value.
- target: Calibration target used.
- ref_value: Reference metric value from dgm$df_super.
- sim_value: Achieved metric value in simulated data at the calibrated cens_adjust.
- residual: Absolute difference between sim_value and ref_value.
- iterations: Number of uniroot iterations.
- diagnostic: Output of check_censoring_dgm at the calibrated value (invisibly).
See Also
simulate_from_dgm, check_censoring_dgm,
generate_aft_dgm_flex
Examples
library(survival)
# Build DGM on months scale
gbsg$time_months <- gbsg$rfstime / 30.4375
dgm <- generate_aft_dgm_flex(
data = gbsg,
continuous_vars = c("age", "size", "nodes", "pgr", "er"),
factor_vars = c("meno", "grade"),
outcome_var = "time_months",
event_var = "status",
treatment_var = "hormon",
subgroup_vars = c("er", "meno"),
subgroup_cuts = list(er = 20, meno = 0)
)
# Calibrate so simulated censoring rate matches reference
cal_rate <- calibrate_cens_adjust(
dgm = dgm,
target = "rate",
n = 1000,
analysis_time = 84,
max_entry = 24
)
cat("Calibrated cens_adjust (rate):", cal_rate$cens_adjust, "\n")
# Calibrate to KM median censoring time instead
cal_km <- calibrate_cens_adjust(
dgm = dgm,
target = "km_median",
n = 1000,
analysis_time = 84,
max_entry = 24
)
cat("Calibrated cens_adjust (km_median):", cal_km$cens_adjust, "\n")
# Use calibrated value in simulation
sim <- simulate_from_dgm(
dgm = dgm,
n = 1000,
analysis_time = 84,
max_entry = 24,
cens_adjust = cal_rate$cens_adjust,
seed = 123
)
mean(sim$event_sim) # event rate
mean(sim$event_sim == 0) # censoring rate — should match ref
Calibrate k_inter for Target Subgroup Hazard Ratio
Description
Finds the interaction effect multiplier (k_inter) that achieves a target hazard ratio in the harm subgroup.
Usage
calibrate_k_inter(
target_hr_harm,
model = "alt",
k_treat = 1,
cens_type = "weibull",
k_inter_range = c(-100, 100),
tol = 1e-06,
use_ahr = FALSE,
verbose = FALSE,
...
)
Arguments
target_hr_harm |
Numeric. Target hazard ratio for the harm subgroup |
model |
Character. Model type ("alt" only). Default: "alt" |
k_treat |
Numeric. Treatment effect multiplier. Default: 1 |
cens_type |
Character. Censoring type. Default: "weibull" |
k_inter_range |
Numeric vector of length 2. Search range for k_inter. Default: c(-100, 100) |
tol |
Numeric. Tolerance for root finding. Default: 1e-6 |
use_ahr |
Logical. If TRUE, calibrate to AHR instead of Cox-based HR. Default: FALSE |
verbose |
Logical. Print diagnostic information. Default: FALSE |
... |
Additional arguments passed to |
Details
This function uses uniroot to find the k_inter value such that
the empirical HR (or AHR) in the harm subgroup equals target_hr_harm.
Value
Numeric value of k_inter that achieves the target HR
Examples
# Find k_inter for HR = 1.5 in harm subgroup
k <- calibrate_k_inter(target_hr_harm = 1.5, verbose = TRUE)
# Verify
dgm <- setup_gbsg_dgm(model = "alt", k_inter = k, verbose = FALSE)
print(dgm)
# Calibrate to AHR instead
k_ahr <- calibrate_k_inter(target_hr_harm = 1.5, use_ahr = TRUE, verbose = TRUE)
dgm_ahr <- setup_gbsg_dgm(model = "alt", k_inter = k_ahr, verbose = FALSE)
print(dgm_ahr)
Diagnose Censoring Consistency Between DGM Source Data and Simulated Data
Description
Compares the censoring distribution observed in the data used to build the
DGM against the censoring generated by simulate_from_dgm.
Reports censoring rates, time quantiles, KM-based median censoring times,
and flags substantial discrepancies.
Usage
check_censoring_dgm(
sim_data,
dgm,
treat_var = "treat_sim",
rate_tol = 0.1,
median_tol = 0.25,
verbose = TRUE
)
Arguments
sim_data |
A |
dgm |
An |
treat_var |
Character. Name of the treatment column in
|
rate_tol |
Numeric. Absolute tolerance (proportion scale) for
flagging a censoring-rate discrepancy. Default |
median_tol |
Numeric. Relative tolerance for flagging a KM median
censoring-time discrepancy. Default |
verbose |
Logical. If |
Details
The reference censoring distribution is derived from dgm$df_super,
sampled with replacement from the data passed to
generate_aft_dgm_flex(). Columns y (observed time) and
event (event indicator) in df_super reflect the original
observed censoring process on the DGM time scale.
The KM median censoring time is estimated by reversing the event indicator
(1 - event), treating events as censored and censored observations
as the event of interest. This gives a non-parametric estimate of the
censoring time distribution unconfounded by event occurrence.
Common causes of discrepancy: (1) time-scale mismatch (DGM built on days,
analysis_time in months); check exp(dgm$model_params$mu)
against your analysis_time. (2) Large cens_adjust shifting
censoring substantially from the fitted model. (3) Short
analysis_time or time_eos making administrative censoring
dominate the censoring process.
Value
Invisibly returns a named list. Elements are: rates (data
frame of censoring rates overall and by arm); quantiles (data
frame of censoring-time quantiles among censored subjects);
km_medians (data frame of KM-based median censoring times); and
flags (character vector of triggered warnings, empty if none).
See Also
simulate_from_dgm, generate_aft_dgm_flex
Examples
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200)
check_censoring_dgm(sim_data, dgm = dgm)
Confidence Interval for Estimate
Description
Calculates confidence interval for an estimate, optionally on log(HR) scale.
Usage
ci_est(x, sd, alpha = 0.025, scale = "hr", est.loghr = TRUE)
Arguments
x |
Numeric estimate. |
sd |
Numeric standard deviation. |
alpha |
Numeric significance level (default: 0.025). |
scale |
Character. "hr" or "1/hr". |
est.loghr |
Logical. Is estimate on log(HR) scale? |
Value
List with length, lower, upper, sd, and estimate.
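For the common case of a log(HR) estimate back-transformed to the HR scale, the calculation can be sketched as follows (`ci_loghr` is a hypothetical simplification of `ci_est`; the normal approximation on the log scale is the standard one for Cox estimates):

```r
# Normal-approximation CI on the log(HR) scale, exponentiated back to HRs.
ci_loghr <- function(loghr, sd, alpha = 0.025) {
  z <- qnorm(1 - alpha)
  exp(c(lower = loghr - z * sd, estimate = loghr, upper = loghr + z * sd))
}

ci_loghr(log(0.7), sd = 0.15)  # 95% CI around HR = 0.7
```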
Compare Detection Curves Across Sample Sizes
Description
Generates and compares detection probability curves for multiple subgroup sample sizes.
Usage
compare_detection_curves(
n_sg_values,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
theta_range = c(0.5, 3),
n_points = 40L,
verbose = TRUE
)
Arguments
n_sg_values |
Integer vector. Subgroup sample sizes to compare. |
prop_cens |
Numeric. Proportion censored. Default: 0.3 |
hr_threshold |
Numeric. HR threshold. Default: 1.25 |
hr_consistency |
Numeric. HR consistency threshold. Default: 1.0 |
theta_range |
Numeric vector of length 2. Range of HR values. Default: c(0.5, 3.0) |
n_points |
Integer. Number of points per curve. Default: 40 |
verbose |
Logical. Print progress. Default: TRUE |
Value
A data.frame with all curves combined, including n_sg as a factor.
Compare Multiple Survival Regression Models
Description
Performs comprehensive comparison of multiple survreg models including convergence checking, information criteria comparison, and model selection.
Usage
compare_multiple_survreg(
...,
model_names = NULL,
verbose = TRUE,
criteria = c("AIC", "BIC")
)
Arguments
... |
survreg model objects to compare |
model_names |
Optional character vector of model names |
verbose |
Logical, whether to print detailed output (default: TRUE) |
criteria |
Character vector of criteria to use ("AIC", "BIC", or both) |
Value
A list of class "multi_survreg_comparison" containing:
- models
Named list of input models
- convergence
Convergence status for each model
- comparison
Model comparison statistics
- rankings
Model rankings by different criteria
- best_model
Name of the best model
- recommendation
Text recommendation
Compute AHR from loghr_po
Description
Computes Average Hazard Ratio from individual log hazard ratios.
Usage
compute_ahr(df, subset_indicator = NULL)
Arguments
df |
Data frame with loghr_po column |
subset_indicator |
Optional logical/integer vector for subsetting |
Value
Numeric AHR value
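The AHR is presumably the geometric mean of the individual hazard ratios, i.e. exp(mean(loghr_po)); a minimal sketch on toy data:

```r
df  <- data.frame(loghr_po = log(c(0.5, 0.8, 1.25)))
ahr <- exp(mean(df$loghr_po))   # geometric mean of individual HRs
```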
Compute CDE from theta_0 and theta_1
Description
Computes Controlled Direct Effect as the ratio of average hazard
contributions on the natural scale:
CDE(S) = mean(exp(theta_1[S])) / mean(exp(theta_0[S])).
Usage
compute_cde(df, subset_indicator = NULL)
Arguments
df |
Data frame with theta_0 and theta_1 columns. |
subset_indicator |
Optional logical/integer vector for subsetting.
If provided, only rows where the indicator is TRUE (or nonzero) are used. |
Value
Numeric CDE value, or NA_real_ if columns are missing.
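A direct transcription of the CDE formula above, on toy data:

```r
df  <- data.frame(theta_0 = log(c(1, 2)), theta_1 = log(c(2, 2)))
# Ratio of average hazard contributions on the natural scale
cde <- mean(exp(df$theta_1)) / mean(exp(df$theta_0))
```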
Compute Probability of Detecting True Subgroup
Description
Calculates the probability that a true subgroup with given hazard ratio will be detected using the ForestSearch consistency-based criteria.
Usage
compute_detection_probability(
theta,
n_sg,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
method = c("cubature", "monte_carlo"),
n_mc = 100000L,
tol = 1e-04,
verbose = FALSE
)
Arguments
theta |
Numeric. True hazard ratio in the subgroup. Can be a vector for computing detection probability across multiple HR values. |
n_sg |
Integer. Subgroup sample size. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
hr_threshold |
Numeric. HR threshold for detection (e.g., 1.25). This is the threshold that the average HR across splits must exceed. |
hr_consistency |
Numeric. HR consistency threshold (e.g., 1.0). This is the threshold each individual split must exceed. Default: 1.0 |
method |
Character. Integration method: "cubature" (recommended for accuracy) or "monte_carlo" (faster for exploration). Default: "cubature" |
n_mc |
Integer. Number of Monte Carlo samples if method = "monte_carlo". Default: 100000 |
tol |
Numeric. Relative tolerance for cubature integration. Default: 1e-4 |
verbose |
Logical. Print progress for vector inputs. Default: FALSE |
Details
This function computes P(detect | theta) using the asymptotic normal approximation for the log hazard ratio estimator. The detection criterion is based on ForestSearch's split-sample consistency evaluation:
The subgroup HR estimate must exceed hr_threshold on average
Each split-half must individually exceed hr_consistency
The approximation assumes:
Large sample sizes (CLT applies)
Var(log(HR)) ~ 4/d per treatment arm
Independence between split-halves (conditional on true effect)
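Under those assumptions the detection criterion can be checked by Monte Carlo. The sketch below assumes the per-split variance is 4 divided by the expected event count in that split-half; this is an illustrative reading of the variance formula, not the package's exact integrand.

```r
set.seed(123)
theta <- 1.5; n_sg <- 60; prop_cens <- 0.2
d_half  <- (n_sg / 2) * (1 - prop_cens)   # expected events per split-half
se_half <- sqrt(4 / d_half)
# Two independent split-half log(HR) estimates around the true effect
b1 <- rnorm(1e5, log(theta), se_half)
b2 <- rnorm(1e5, log(theta), se_half)
# Detected if each half exceeds hr_consistency (1.0) and the
# average exceeds hr_threshold (1.25)
p_detect <- mean(b1 > log(1.0) & b2 > log(1.0) & (b1 + b2) / 2 > log(1.25))
```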
Value
If theta is scalar, returns a single probability. If theta is a vector, returns a data.frame with columns: theta, probability.
Examples
# Single HR value
prob <- compute_detection_probability(
theta = 1.5,
n_sg = 60,
prop_cens = 0.2,
hr_threshold = 1.25
)
# Vector of HR values for power curve
hr_values <- seq(1.0, 2.5, by = 0.1)
results <- compute_detection_probability(
theta = hr_values,
n_sg = 60,
prop_cens = 0.2,
hr_threshold = 1.25,
verbose = TRUE
)
# Plot detection probability curve
plot(results$theta, results$probability, type = "l",
xlab = "True HR", ylab = "P(detect)")
Compute Detection Probability for Single Theta (Internal)
Description
Compute Detection Probability for Single Theta (Internal)
Usage
compute_detection_probability_single(
theta,
n_sg,
prop_cens,
k_avg,
k_ind,
method,
n_mc,
tol
)
Arguments
theta |
Numeric. True hazard ratio in the subgroup. Can be a vector for computing detection probability across multiple HR values. |
n_sg |
Integer. Subgroup sample size. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
k_avg |
Log of hr_threshold |
k_ind |
Log of hr_consistency |
method |
Character. Integration method: "cubature" (recommended for accuracy) or "monte_carlo" (faster for exploration). Default: "cubature" |
n_mc |
Integer. Number of Monte Carlo samples if method = "monte_carlo". Default: 100000 |
tol |
Numeric. Relative tolerance for cubature integration. Default: 1e-4 |
Value
Numeric probability
Compute and Attach CDE Values to a DGM Object
Description
Calculates Controlled Direct Effect (CDE) hazard ratios from the
super-population potential outcomes (theta_0, theta_1)
and attaches them to the DGM's hazard_ratios list. This enables
automatic CDE detection by build_estimation_table.
Usage
compute_dgm_cde(dgm, harm_col = NULL)
Arguments
dgm |
A DGM object (e.g., from setup_gbsg_dgm()). |
harm_col |
Character. Name of the subgroup indicator column in
the super-population data frame. If NULL (default), the column is detected automatically. |
Details
The CDE for subgroup S is defined as:
CDE(S) = mean(exp(theta_1[S])) / mean(exp(theta_0[S]))
which is the ratio of average hazard contributions on the natural scale.
This differs from the AHR (exp(mean(loghr_po))) due to Jensen's
inequality. In the notation of Leon et al. (2024), CDE corresponds to
theta-ddagger.
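The Jensen's-inequality gap is easy to see on toy potential outcomes:

```r
theta_0 <- c(0, 0)               # control log-hazard contributions
theta_1 <- c(log(0.5), log(2))   # treated log-hazard contributions
ahr <- exp(mean(theta_1 - theta_0))              # exp(mean(loghr_po))
cde <- mean(exp(theta_1)) / mean(exp(theta_0))   # ratio of mean hazards
# ahr and cde differ unless loghr_po is constant across subjects
```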
The function detects the subgroup indicator column automatically,
checking for flag.harm, flag_harm, and H in
the super-population data frame.
Value
The DGM object with CDE values added to
dgm$hazard_ratios (CDE, CDE_harm,
CDE_no_harm) and to top-level fields (dgm$CDE,
dgm$cde_H, dgm$cde_Hc).
See Also
build_estimation_table, get_dgm_hr
Examples
dgm <- setup_gbsg_dgm(model = "alt", k_inter = 2.0, verbose = FALSE)
dgm <- compute_dgm_cde(dgm)
dgm$hazard_ratios$CDE_harm # theta-ddagger(H)
dgm$hazard_ratios$CDE # theta-ddagger overall
Compute node metrics for a policy tree
Description
Aggregates scores by leaf node and calculates treatment effect differences
Usage
compute_node_metrics(data, dr.scores, tree, X, n.min)
Arguments
data |
Data frame. Original data |
dr.scores |
Matrix. Doubly robust scores |
tree |
Policy tree object |
X |
Matrix. Covariate matrix |
n.min |
Integer. Minimum subgroup size |
Value
Data frame with node metrics
Compute Hazard Ratio for a Single Subgroup
Description
Internal helper function to compute HR and CI for a subgroup. Uses robust (sandwich) standard errors for consistency with cox_summary().
Usage
compute_sg_hr(
df,
sg_name,
outcome.name,
event.name,
treat.name,
E.name,
C.name,
z_alpha = qnorm(0.975),
conf.level = 0.95
)
Arguments
df |
Data frame for the subgroup. |
sg_name |
Character. Name of the subgroup. |
outcome.name |
Character. Name of survival time variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
E.name |
Character. Label for experimental arm. |
C.name |
Character. Label for control arm. |
z_alpha |
Numeric. Z-multiplier for CI (default: qnorm(0.975) for 95% CI). |
conf.level |
Numeric. Confidence level for intervals (default: 0.95). |
Value
Data frame with single row of HR estimates, or NULL if model fails.
Compute Hazard Ratio Estimates for Subgroups
Description
Internal function to compute Cox model hazard ratio estimates with confidence intervals for ITT, H, and Hc subgroups.
Usage
compute_sg_hr_estimates(
df,
df_H,
df_Hc,
outcome.name,
event.name,
treat.name,
conf.level = 0.95,
verbose = FALSE
)
Arguments
df |
Full analysis data frame |
df_H |
Data frame for H subgroup |
df_Hc |
Data frame for Hc subgroup |
outcome.name |
Character. Outcome variable name |
event.name |
Character. Event indicator name |
treat.name |
Character. Treatment variable name |
conf.level |
Numeric. Confidence level |
verbose |
Logical. Print messages |
Value
Data frame with HR estimates
Compute Summary Statistics for Subgroups
Description
Internal function to compute summary statistics for each subgroup.
Usage
compute_sg_summary(
df,
df_H,
df_Hc,
outcome.name,
event.name,
treat.name,
sg0_name,
sg1_name
)
Arguments
df |
Full analysis data frame |
df_H |
Data frame for H subgroup |
df_Hc |
Data frame for Hc subgroup |
outcome.name |
Character. Outcome variable name |
event.name |
Character. Event indicator name |
treat.name |
Character. Treatment variable name |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
Value
Data frame with summary statistics
Count ID Occurrences in Bootstrap Sample
Description
Counts the number of times an ID appears in a bootstrap sample.
Usage
count_boot_id(x, dfb)
Arguments
x |
ID value. |
dfb |
Data frame of bootstrap sample. |
Value
Integer count of occurrences.
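Conceptually this is a one-line count over the bootstrap sample; the id column name below is illustrative, not necessarily the package's:

```r
dfb   <- data.frame(id = c(1, 2, 2, 3, 2))   # toy bootstrap sample
count <- sum(dfb$id == 2)                    # times id 2 was resampled
```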
Comprehensive Wrapper for Cox Spline Analysis with AHR and CDE Plotting
Description
This wrapper function combines Cox spline fitting with comprehensive visualization of Average Hazard Ratios (AHRs) and Controlled Direct Effects (CDEs) as described in the MRCT subgroups analysis documentation.
Usage
cox_ahr_cde_analysis(
df,
tte_name = "os_time",
event_name = "os_event",
treat_name = "treat",
z_name = "biomarker",
loghr_po_name = "loghr_po",
theta1_name = "theta_1",
theta0_name = "theta_0",
spline_df = 3,
alpha = 0.2,
hr_threshold = 0.7,
plot_style = c("combined", "separate", "grid"),
plot_select = c("all", "profile_ahr", "ahr_only"),
save_plots = FALSE,
output_dir = tempdir(),
verbose = TRUE
)
Arguments
df |
Data frame containing survival data with potential outcomes. |
tte_name |
Character string specifying time-to-event variable name.
Default: "os_time". |
event_name |
Character string specifying event indicator variable
name. Default: "os_event". |
treat_name |
Character string specifying treatment variable name.
Default: "treat". |
z_name |
Character string specifying continuous covariate/biomarker
name. Default: "biomarker". |
loghr_po_name |
Character string specifying potential outcome log HR
variable. Default: "loghr_po". |
theta1_name |
Optional: variable name for theta_1 (treated potential
outcome). Default: "theta_1". |
theta0_name |
Optional: variable name for theta_0 (control potential
outcome). Default: "theta_0". |
spline_df |
Integer degrees of freedom for spline fitting. Default: 3. |
alpha |
Numeric significance level for confidence intervals. Default: 0.20. |
hr_threshold |
Numeric hazard ratio threshold for subgroup
identification, or NULL to skip cutpoint selection. Default: 0.7. |
plot_style |
Character: "combined", "separate", or "grid". Default: "combined". |
plot_select |
Character controlling which panels to display:
"all", "profile_ahr", or "ahr_only". Default: "all". |
save_plots |
Logical whether to save plots to file. Default: FALSE. |
output_dir |
Character directory for saving plots. Default:
tempdir(). |
verbose |
Logical for diagnostic output. Default: TRUE. |
Value
List of class "cox_ahr_cde" containing:
- cox_fit
Results from the cox_cs_fit function.
- ahr_results
AHR calculations for different subgroup definitions.
- cde_results
CDE calculations if theta variables available.
- optimal_cutpoint
Optimal biomarker cutpoint, or NULL when hr_threshold is NULL.
- subgroup_stats
Statistics for recommended and questionable subgroups, or overall-only when hr_threshold is NULL.
- data
List with z_values, loghr_po, and subgroup assignments.
Examples
# Build a small synthetic dataset with required columns
set.seed(42)
n <- 200
df_ex <- data.frame(
os_time = rexp(n, rate = 0.01),
os_event = rbinom(n, 1, 0.6),
treat = rep(0:1, each = n / 2),
biomarker = rnorm(n),
loghr_po = rnorm(n, mean = -0.3, sd = 0.5)
)
# With threshold - full subgroup analysis
results <- cox_ahr_cde_analysis(
df = df_ex, z_name = "biomarker",
hr_threshold = 1.25, plot_style = "grid",
verbose = FALSE
)
# Without threshold - pure AHR curves
results <- cox_ahr_cde_analysis(
df = df_ex, z_name = "biomarker",
hr_threshold = NULL, plot_select = "ahr_only",
verbose = FALSE
)
Fit Cox Model with Cubic Spline for Treatment Effect Heterogeneity
Description
Estimates treatment effects as a function of a continuous covariate using a Cox proportional hazards model with natural cubic splines. The function models treatment-by-covariate interactions to detect effect modification.
Usage
cox_cs_fit(
df,
tte_name = "os_time",
event_name = "os_event",
treat_name = "treat",
strata_name = NULL,
z_name = "bm",
alpha = 0.2,
spline_df = 3,
z_max = Inf,
z_by = 1,
z_window = 0,
z_quantile = 0.9,
show_plot = TRUE,
plot_params = NULL,
truebeta_name = NULL,
verbose = TRUE
)
Arguments
df |
Data frame containing survival data |
tte_name |
Character string specifying time-to-event variable name. Default: "os_time" |
event_name |
Character string specifying event indicator variable name (1=event, 0=censored). Default: "os_event" |
treat_name |
Character string specifying treatment variable name (1=treated, 0=control). Default: "treat" |
strata_name |
Character string specifying stratification variable name. If NULL, no stratification is used. Default: NULL |
z_name |
Character string specifying continuous covariate name for effect modification. Default: "bm" |
alpha |
Numeric value for confidence level (two-sided). Default: 0.20 (80% confidence intervals) |
spline_df |
Integer specifying degrees of freedom for natural spline. Default: 3 |
z_max |
Numeric maximum value for z in predictions. Values beyond this are truncated. Default: Inf (no truncation) |
z_by |
Numeric increment for z values in prediction grid. Default: 1 |
z_window |
Numeric half-width for counting observations near each z value. Default: 0.0 (exact matches only) |
z_quantile |
Numeric quantile (0-1) for upper limit of z profile. Default: 0.90 (90th percentile) |
show_plot |
Logical indicating whether to display plot. Default: TRUE |
plot_params |
List of plotting parameters (see Details). Default: NULL |
truebeta_name |
Character string specifying variable containing true log(HR) values for validation/simulation. Default: NULL |
verbose |
Logical indicating whether to print diagnostic information. Default: TRUE |
Details
Model Structure
The function fits:
h(t|Z,A) = h_0(t) \exp(\beta_0 A + f(Z) + g(Z) \cdot A)
Where:
A is treatment (0/1)
Z is the continuous effect modifier
f(Z) is modeled with natural splines (main effect)
g(Z) is modeled with natural splines (interaction)
The log hazard ratio is:
\beta(Z) = \beta_0 + g(Z)
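The model above can be sketched with survival::coxph and natural splines; this is a minimal reconstruction of the described structure, not the package's exact fitting code.

```r
library(survival)
library(splines)

set.seed(123)
df <- data.frame(
  os_time  = rexp(500, 0.01),
  os_event = rbinom(500, 1, 0.7),
  treat    = rbinom(500, 1, 0.5),
  bm       = rnorm(500, 50, 10)
)
# f(Z) as the ns(bm) main effect; g(Z) * A as the treat:ns(bm) interaction
fit <- coxph(Surv(os_time, os_event) ~ treat + ns(bm, df = 3) +
               treat:ns(bm, df = 3), data = df)
```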
Plot Parameters
The plot_params argument accepts a list with:
- xlab: x-axis label
- main_title: plot title
- ylimit: y-axis limits c(min, max)
- y_pad_zero: padding below zero line
- y_delta: extra space for count labels
- cex_legend: legend text size
- cex_count: count text size
- show_cox_primary: show standard Cox estimate line
- show_null: show null effect line (log(HR)=0)
- show_target: show target effect line (e.g., log(0.80))
Value
List containing:
- z_profile
Vector of z values where treatment effect is estimated
- loghr_est
Point estimates of log(HR) at each z value
- loghr_lower
Lower confidence bound
- loghr_upper
Upper confidence bound
- se_loghr
Standard errors of log(HR) estimates
- counts_profile
Number of observations near each z value
- cox_primary
Log(HR) from standard Cox model (no interaction)
- model_fit
The fitted coxph model object
- spline_basis
The natural spline basis object
Examples
# Simulate data
set.seed(123)
df <- data.frame(
os_time = rexp(500, 0.01),
os_event = rbinom(500, 1, 0.7),
treat = rbinom(500, 1, 0.5),
bm = rnorm(500, 50, 10)
)
# Fit model
result <- cox_cs_fit(df, z_name = "bm", alpha = 0.20)
# Custom plotting
result <- cox_cs_fit(
df,
z_name = "bm",
plot_params = list(
xlab = "Biomarker Level",
main_title = "Treatment Effect by Biomarker",
cex_legend = 1.2
)
)
Cox model summary for subgroup (OPTIMIZED)
Description
Internal helper called by analyze_subgroup() via SG_tab_estimates.
Usage
cox_summary(
Y,
E,
Treat,
Strata = NULL,
use_strata = !is.null(Strata),
return_format = c("formatted", "numeric")
)
Arguments
Y |
Numeric vector of outcome. |
E |
Numeric vector of event indicators. |
Treat |
Numeric vector of treatment indicators. |
Strata |
Vector of strata (optional). |
use_strata |
Logical. Whether to use strata in the model (default: TRUE if Strata provided). |
return_format |
Character. "formatted" (default) or "numeric" for downstream use. |
Details
Calculates hazard ratio and confidence interval for a subgroup using Cox regression. Optimized version with reduced overhead and better error handling.
Value
Character string with formatted HR and CI (or numeric vector if return_format="numeric").
Examples
library(survival)
cox_summary(
Y = gbsg$rfstime / 30.4375,
E = gbsg$status,
Treat = gbsg$hormon
)
Batch Cox summaries with caching
Description
For repeated calls with the same data structure but different subsets, this version pre-processes the data structure once.
Usage
cox_summary_batch(
Y,
E,
Treat,
Strata = NULL,
subset_indices,
return_format = c("formatted", "numeric")
)
Arguments
Y |
Numeric vector of outcome (full dataset). |
E |
Numeric vector of event indicators (full dataset). |
Treat |
Numeric vector of treatment indicators (full dataset). |
Strata |
Vector of strata (optional, full dataset). |
subset_indices |
List of integer vectors, each defining a subset to analyze. |
return_format |
Character. "formatted" or "numeric". |
Value
List of results, one per subset.
Cox model summary for subgroup - vectorized version
Description
Efficiently processes multiple subgroups at once. Useful when analyzing many subgroups (e.g., in cross-validation).
Usage
cox_summary_vectorized(
data,
outcome_col,
event_col,
treat_col,
strata_col = NULL,
subgroup_col = "subgroup",
return_format = c("formatted", "numeric")
)
Arguments
data |
Data frame with columns for Y, E, Treat, and optionally Strata. |
outcome_col |
Character. Name of outcome column. |
event_col |
Character. Name of event column. |
treat_col |
Character. Name of treatment column. |
strata_col |
Character. Name of strata column (optional). |
subgroup_col |
Character. Name of subgroup indicator column. |
return_format |
Character. "formatted" or "numeric". |
Value
Data frame with one row per subgroup and HR results.
Calculate Bootstrap Table Caption
Description
Generates an interpretive caption for bootstrap results table.
Usage
create_bootstrap_caption(est.scale, nb_boots, boot_success_rate)
Arguments
est.scale |
Character. "hr" or "1/hr" |
nb_boots |
Integer. Number of bootstrap iterations |
boot_success_rate |
Numeric. Proportion successful |
Value
Character string with caption
Create Bootstrap Diagnostic Plots
Description
Generates diagnostic visualization plots for bootstrap analysis.
Usage
create_bootstrap_diagnostic_plots(
results,
H_estimates,
Hc_estimates,
overall_timing = NULL
)
Arguments
results |
Data frame with bootstrap results |
H_estimates |
List with H subgroup estimates |
Hc_estimates |
List with Hc subgroup estimates |
overall_timing |
List with overall timing information (optional) |
Value
List of ggplot2 objects
Create Data Generating Mechanism for MRCT Simulations
Description
Wrapper function to create a data generating mechanism (DGM) for MRCT
simulation scenarios using generate_aft_dgm_flex.
Usage
create_dgm_for_mrct(
df_case,
model_type = c("alt", "null"),
log_hrs = NULL,
confounder_var = NULL,
confounder_effect = NULL,
include_regA = TRUE,
verbose = FALSE
)
Arguments
df_case |
Data frame containing case study data |
model_type |
Character. Either "alt" (alternative hypothesis with heterogeneous treatment effects) or "null" (uniform treatment effect) |
log_hrs |
Numeric vector. Log hazard ratios for spline specification. If NULL, defaults are used based on model_type |
confounder_var |
Character. Name of a confounder variable to include with a forced prognostic effect. Default: NULL (no forced effect) |
confounder_effect |
Numeric. Log hazard ratio for confounder_var effect. Only used if confounder_var is specified |
include_regA |
Logical. Include regA as a factor in the model. Default: TRUE |
verbose |
Logical. Print detailed output. Default: FALSE |
Details
Model Types
- alt
Alternative hypothesis: Treatment effect varies by biomarker level (heterogeneous treatment effect). Default log_hrs create HR ranging from 2.0 (harm) to 0.5 (benefit) across biomarker range
- null
Null hypothesis: Uniform treatment effect regardless of biomarker level. Default log_hrs = log(0.7) uniformly
Confounder Effects
By default, NO prognostic confounder effect is forced. The confounder_var and confounder_effect parameters allow optionally specifying ANY baseline covariate to have a fixed prognostic effect in the outcome model.
The regA variable (region indicator) is included as a factor by default but without a forced effect - its coefficient is estimated from data.
Value
An object of class "aft_dgm_flex" for use with
simulate_from_dgm and mrct_region_sims
See Also
generate_aft_dgm_flex for underlying DGM creation
mrct_region_sims for running simulations with the DGM
Create Factor Summary Tables from Bootstrap Results
Description
Generates formatted GT tables summarizing factor frequencies from bootstrap subgroup analysis. Creates two complementary tables: one showing factor selection frequencies within each position (M.1, M.2, etc.), and another showing overall factor frequencies across all positions.
Usage
create_factor_summary_tables(factor_freq, n_found, min_percent = 2)
Arguments
factor_freq |
Data.frame or data.table. Factor frequency table from
|
n_found |
Integer. Number of successful bootstrap iterations (where a subgroup was identified). Used to calculate overall percentages. |
min_percent |
Numeric. Minimum percentage threshold for including factors in the tables. Factors with selection frequencies below this threshold are excluded. Default is 2 (i.e., 2%). |
Value
A list with up to two GT table objects:
- by_position
GT table showing factor frequencies within each position. Percentages represent the conditional probability of factor selection given that the position was populated. Within each position, percentages sum to approximately 100% (they may not sum exactly to 100% after filtering).
- overall
GT table showing total factor frequencies across all positions. Includes additional columns indicating which positions each factor appeared in and how many unique positions used the factor. Percentages represent the proportion of successful iterations where the factor appeared in any position.
If no factors meet the minimum threshold, the corresponding table element will be NULL.
Note
This function requires the gt package for table creation. The overall table also requires dplyr for data aggregation. If dplyr is not available, only the position-specific table will be created and the overall element will be NULL.
Always check for NULL before using the returned tables:
if (!is.null(factor_tables$by_position)) {
print(factor_tables$by_position)
}
If all factors have percentages below min_percent, both table elements
will be NULL.
See Also
- summarize_bootstrap_subgroups for generating the factor_freq input
- format_subgroup_summary_tables for creating all subgroup summary tables
- summarize_bootstrap_results for complete bootstrap analysis workflow
- forestsearch_bootstrap_dofuture for running bootstrap analysis
Create Forest Plot Theme with Size Controls
Description
Creates a forestploter theme with parameters that control overall plot sizing and appearance. This is the primary way to control how large the forest plot renders.
Usage
create_forest_theme(
base_size = 10,
scale = 1,
row_padding = NULL,
ci_pch = 15,
ci_lwd = NULL,
ci_Theight = NULL,
ci_col = "black",
header_fontsize = NULL,
body_fontsize = NULL,
footnote_fontsize = NULL,
footnote_col = "darkcyan",
title_fontsize = NULL,
cv_fontsize = NULL,
cv_col = "gray30",
refline_lwd = NULL,
refline_lty = "dashed",
refline_col = "gray30",
vertline_lwd = NULL,
vertline_lty = "dashed",
vertline_col = "gray20",
arrow_type = "closed",
arrow_col = "black",
summary_fill = "black",
summary_col = "black"
)
Arguments
base_size |
Numeric. Base font size in points. This is the primary scaling parameter - increasing it will proportionally scale all fonts, row padding, and line widths. Default: 10. |
scale |
Numeric. Additional scaling multiplier applied on top of base_size. Use for quick overall scaling. Default: 1.0. |
row_padding |
Numeric vector of length 2. Padding around row content in mm as c(vertical, horizontal). If NULL, auto-calculated from base_size. Default: NULL. |
ci_pch |
Integer. Point character for CI. 15=square, 16=circle, 18=diamond. Default: 15. |
ci_lwd |
Numeric. Line width for CI lines. If NULL, auto-calculated from base_size. Default: NULL. |
ci_Theight |
Numeric. Height of T-bar ends on CI. If NULL, auto-calculated from base_size. Default: NULL. |
ci_col |
Character. Color for CI lines and points. Default: "black". |
header_fontsize |
Numeric. Font size for column headers. If NULL, auto-calculated as base_size * scale + 1. Default: NULL. |
body_fontsize |
Numeric. Font size for body text. If NULL, auto-calculated as base_size * scale. Default: NULL. |
footnote_fontsize |
Numeric. Font size for footnotes. If NULL, auto-calculated as base_size * scale - 1. Default: NULL. |
footnote_col |
Character. Color for footnote text. Default: "darkcyan". |
title_fontsize |
Numeric. Font size for title. If NULL, auto-calculated as base_size * scale + 4. Default: NULL. |
cv_fontsize |
Numeric. Font size for CV annotation text. If NULL, auto-calculated as base_size * scale. Default: NULL. |
cv_col |
Character. Color for CV annotation text. Default: "gray30". |
refline_lwd |
Numeric. Reference line width. If NULL, auto-calculated. Default: NULL. |
refline_lty |
Character. Reference line type. Default: "dashed". |
refline_col |
Character. Reference line color. Default: "gray30". |
vertline_lwd |
Numeric. Vertical line width. If NULL, auto-calculated. Default: NULL. |
vertline_lty |
Character. Vertical line type. Default: "dashed". |
vertline_col |
Character. Vertical line color. Default: "gray20". |
arrow_type |
Character. Arrow type: "open" or "closed". Default: "closed". |
arrow_col |
Character. Arrow color. Default: "black". |
summary_fill |
Character. Fill color for summary diamonds. Default: "black". |
summary_col |
Character. Border color for summary diamonds. Default: "black". |
Details
The base_size parameter is the primary way to control plot size.
When you change base_size, the following are automatically scaled:
All font sizes (body, header, footnote, CV, title)
Row padding (vertical and horizontal)
CI line width and T-bar height
Reference and vertical line widths
The scaling formula uses base_size = 10 as the reference point:
base_size = 10: Default sizing
base_size = 12: 20% larger
base_size = 14: 40% larger
base_size = 16: 60% larger
You can override any individual parameter by specifying it explicitly.
The theme does NOT set row background colors - those are determined
automatically by plot_subgroup_results_forestplot() based on
row types (ITT, reference, posthoc, etc.).
Value
A list of class "fs_forest_theme" containing all theme parameters.
See Also
plot_subgroup_results_forestplot, render_forestplot
Examples
# Simple: just increase base_size for larger plot
large_theme <- create_forest_theme(base_size = 14)
print(large_theme)
# Or use scale for quick adjustment
large_theme <- create_forest_theme(base_size = 10, scale = 1.4)
# Fine-tune specific elements
custom_theme <- create_forest_theme(
base_size = 14,
cv_fontsize = 12,
ci_lwd = 2.5
)
Create Subgroup Indicator Columns from ForestSearch
Description
Internal helper to create Qrecommend and Brecommend indicator columns.
Usage
create_fs_subgroup_indicators(
df,
fs.est,
col_names = c("Qrecommend", "Brecommend"),
verbose = FALSE
)
Arguments
df |
Data frame to modify. |
fs.est |
A forestsearch object. |
col_names |
Character vector of length 2. Names for the indicator columns: first for harm/questionable (treat.recommend == 0), second for benefit/recommend (treat.recommend == 1). Default: c("Qrecommend", "Brecommend") |
verbose |
Logical. Print diagnostic messages. |
Value
Modified data frame with indicator columns.
Create GBSG-Based AFT Data Generating Mechanism
Description
Creates a data generating mechanism (DGM) for survival simulations based on the German Breast Cancer Study Group (GBSG) dataset. Supports heterogeneous treatment effects via treatment-subgroup interactions.
Usage
create_gbsg_dgm(
model = c("alt", "null"),
k_treat = 1,
k_inter = 1,
k_z3 = 1,
z1_quantile = 0.25,
n_super = DEFAULT_N_SUPER,
cens_type = c("weibull", "uniform"),
use_rand_params = FALSE,
seed = SEED_BASE,
verbose = FALSE
)
Arguments
model |
Character. Either "alt" for alternative hypothesis with heterogeneous treatment effects, or "null" for uniform treatment effect. Default: "alt" |
k_treat |
Numeric. Treatment effect multiplier applied to the treatment coefficient from the fitted AFT model. Values > 1 strengthen the treatment effect. Default: 1 |
k_inter |
Numeric. Interaction effect multiplier for the treatment-subgroup interaction (z1 * z3). Only used when model = "alt". Higher values create more heterogeneity between HR(H) and HR(Hc). Default: 1 |
k_z3 |
Numeric. Effect multiplier for the z3 (menopausal status) coefficient. Default: 1 |
z1_quantile |
Numeric. Quantile threshold for z1 (estrogen receptor). Observations with ER <= quantile are coded as z1 = 1. Default: 0.25 |
n_super |
Integer. Size of super-population for empirical HR estimation. Default: 5000 |
cens_type |
Character. Censoring distribution type: "weibull" or "uniform". Default: "weibull" |
use_rand_params |
Logical. If TRUE, modifies confounder coefficients using estimates from randomized subset (meno == 0). Default: FALSE |
seed |
Integer. Random seed for super-population generation. Default: 8316951 |
verbose |
Logical. Print diagnostic information. Default: FALSE |
Details
This version is aligned with generate_aft_dgm_flex() and
calculate_hazard_ratios() methodology, computing individual-level
potential outcomes and average hazard ratios (AHR).
Subgroup Definition
The harm subgroup H is defined as: z1 = 1 AND z3 = 1, where:
z1: Low estrogen receptor (ER <= 25th percentile by default)
z3: Premenopausal status (meno == 0)
Model Specification
The AFT model uses covariates: treat, z1, z2, z3, z4, z5, and (for "alt") the interaction zh = treat * z1 * z3.
Interaction Effect (k_inter)
The k_inter parameter modifies the zh coefficient in the AFT model:
gamma[zh] <- k_inter * gamma[zh]
This affects the hazard ratio for the harm subgroup:
HR(H) = exp(-gamma[treat]/sigma - gamma[zh]/sigma)
HR(Hc) = exp(-gamma[treat]/sigma)
When k_inter = 0, HR(H) = HR(Hc) (no heterogeneity).
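A numeric sketch of this AFT-to-HR mapping (coefficient values are illustrative, not the fitted GBSG values):

```r
gamma_treat <- 0.25   # AFT treatment coefficient
gamma_zh    <- -0.40  # interaction coefficient
sigma       <- 0.80   # AFT scale parameter
k_inter     <- 2      # heterogeneity multiplier
hr_Hc <- exp(-gamma_treat / sigma)                               # benefit in Hc
hr_H  <- exp(-gamma_treat / sigma - k_inter * gamma_zh / sigma)  # harm in H
```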
Alignment with generate_aft_dgm_flex
This function now computes:
theta_0: Log-hazard contribution under control
theta_1: Log-hazard contribution under treatment
loghr_po: Individual causal log hazard ratio (theta_1 - theta_0)
AHR metrics: exp(mean(loghr_po)) for overall and subgroups
Value
A list of class "gbsg_dgm" containing:
- df_super_rand
Data frame with randomized super-population including potential outcomes (theta_0, theta_1, loghr_po)
- hr_H_true
Empirical hazard ratio in harm subgroup (Cox-based)
- hr_Hc_true
Empirical hazard ratio in complement subgroup (Cox-based)
- hr_causal
Overall causal (ITT) hazard ratio (Cox-based)
- AHR
Overall average hazard ratio (from loghr_po)
- AHR_H_true
Average hazard ratio in harm subgroup
- AHR_Hc_true
Average hazard ratio in complement subgroup
- hazard_ratios
List matching generate_aft_dgm_flex output format
- model_params
List with AFT model parameters (mu, sigma, gamma, etc.)
- cens_params
List with censoring model parameters
- subgroup_info
List with subgroup definitions and true factor names
- analysis_vars
Character vector of analysis variable names
- model_type
Character indicating "alt" or "null"
See Also
simulate_from_gbsg_dgm for generating data from the DGM
calibrate_k_inter for finding k_inter to achieve target HR
Helper Functions for GRF Subgroup Analysis
Description
This file contains helper functions used by grf.subg.harm.survival() to improve readability and modularity.
Create GRF configuration object.
Usage
create_grf_config(
frac.tau,
n.min,
dmin.grf,
RCT,
sg.criterion,
maxdepth,
seedit
)
Arguments
frac.tau |
Numeric. Fraction of tau for GRF horizon |
n.min |
Integer. Minimum subgroup size |
dmin.grf |
Numeric. Minimum difference in subgroup mean |
RCT |
Logical. Is the data from a randomized controlled trial? |
sg.criterion |
Character. Subgroup selection criterion |
maxdepth |
Integer. Maximum tree depth |
seedit |
Integer. Random seed |
Details
Creates a configuration object to organize GRF parameters
Value
List with configuration parameters
Create result object when no subgroup is found
Description
Builds result object for cases where no valid subgroup is identified
Usage
create_null_result(data, values, trees, config)
Arguments
data |
Data frame. Original data |
values |
Data frame. Node metrics (may be empty) |
trees |
List. Fitted policy trees |
config |
List. GRF configuration |
Value
List with limited GRF results
Create Reference Subgroup Indicator Columns
Description
Creates indicator columns (0/1) in the data frame for each reference subgroup based on the provided subset expressions.
Usage
create_reference_subgroup_columns(df, ref_subgroups, verbose = FALSE)
Arguments
df |
Data frame to modify. |
ref_subgroups |
Named list of reference subgroup definitions.
Each element should have |
verbose |
Logical. Print diagnostic messages. |
Value
List with modified df, cols, labels, and colors vectors.
Create Result Row
Description
Create Result Row
Usage
create_result_row(kk, covs.in, nx, event_counts, cox_result)
Create Sample Size Table for Multiple Scenarios
Description
Generates a table of required sample sizes for different combinations of true hazard ratios and censoring proportions.
Usage
create_sample_size_table(
theta_values,
prop_cens_values,
target_power = 0.8,
hr_threshold = 1.25,
verbose = TRUE
)
Arguments
theta_values |
Numeric vector. True hazard ratios to evaluate. |
prop_cens_values |
Numeric vector. Censoring proportions to evaluate. |
target_power |
Numeric. Target detection probability. Default: 0.80 |
hr_threshold |
Numeric. HR threshold. Default: 1.25 |
verbose |
Logical. Print progress. Default: TRUE |
Value
A data.frame with columns: theta, prop_cens, n_required, achieved_power
Create Spline Variables
Description
Create Spline Variables
Usage
create_spline_variables(df_work, spline_var, knot)
Create Subgroup Indicator from Factor Definitions
Description
Parses factor definitions (e.g., "v1.1", "grade3.1") and creates a binary indicator for subgroup membership.
Usage
create_subgroup_indicator(df, sg_factors)
Arguments
df |
Data frame containing the variables |
sg_factors |
Character vector of factor definitions |
Value
Integer vector (1 = in subgroup, 0 = not in subgroup)
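A minimal sketch of the parsing convention described above, assuming each definition follows the "variable.level" pattern (e.g. "grade3.1" means grade3 == 1); the exported function may accept richer formats:

```r
# Sketch (illustrative name): build a 0/1 membership indicator from
# "<variable>.<level>" factor definitions.
subgroup_indicator <- function(df, sg_factors) {
  in_sg <- rep(TRUE, nrow(df))
  for (f in sg_factors) {
    # split on the LAST "." so variable names may themselves contain digits
    var   <- sub("\\.[^.]*$", "", f)
    level <- sub("^.*\\.", "", f)
    in_sg <- in_sg & (as.character(df[[var]]) == level)
  }
  as.integer(in_sg)   # 1 = in subgroup, 0 = not in subgroup
}

df <- data.frame(v1 = c(1, 0, 1), grade3 = c(1, 1, 0))
subgroup_indicator(df, c("v1.1", "grade3.1"))  # 1 0 0
```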
Create Subgroup Summary Data Frame for Forest Plot
Description
Creates a data frame suitable for forestploter from multiple subgroup analyses. This is a more flexible alternative for complex subgroup configurations.
Usage
create_subgroup_summary_df(
df_analysis,
subgroups,
outcome.name,
event.name,
treat.name,
E.name = "E",
C.name = "C",
fs_bc_list = NULL,
fs_kfold_list = NULL,
conf.level = 0.95
)
Arguments
df_analysis |
Data frame. The analysis dataset. |
subgroups |
Named list of subgroup definitions. |
outcome.name |
Character. Name of survival time variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
E.name |
Character. Label for experimental arm. |
C.name |
Character. Label for control arm. |
fs_bc_list |
List. Named list of bootstrap results for each subgroup. |
fs_kfold_list |
List. Named list of k-fold results for each subgroup. |
conf.level |
Numeric. Confidence level for intervals (default: 0.95). |
Value
Data frame with HR estimates for all subgroups.
Create result object for successful subgroup identification
Description
Builds comprehensive result object when a subgroup is found
Usage
create_success_result(
data,
best_subgroup,
trees,
tree_cuts,
selected_tree,
sg_harm_id,
values,
config
)
Arguments
data |
Data frame. Original data with subgroup assignments |
best_subgroup |
Data frame row. Selected subgroup information |
trees |
List. All fitted policy trees |
tree_cuts |
List. Cut information from trees |
selected_tree |
Policy tree. The tree that identified the subgroup |
sg_harm_id |
Character. Expression defining the subgroup |
values |
Data frame. All node metrics |
config |
List. GRF configuration |
Value
List with complete GRF results
Create Enhanced Summary Table for Baseline Characteristics
Description
Generates a formatted summary table comparing baseline characteristics between treatment arms. Supports continuous, categorical, and binary variables with p-values, standardized mean differences (SMD), and missing data summaries.
Usage
create_summary_table(
data,
treat_var = "treat",
vars_continuous = NULL,
vars_categorical = NULL,
vars_binary = NULL,
var_labels = NULL,
digits = 1,
show_pvalue = TRUE,
show_smd = TRUE,
show_missing = TRUE,
table_title = "Baseline Characteristics by Treatment Arm",
table_subtitle = NULL,
source_note = NULL,
font_size = 12,
header_font_size = 14,
footnote_font_size = 10,
use_alternating_rows = TRUE,
stripe_color = "#f9f9f9",
indent_size = 20,
highlight_pval = 0.05,
highlight_smd = 0.2,
highlight_color = "#fff3cd",
compact_mode = FALSE,
column_width_var = 200,
column_width_stats = 120,
show_column_borders = FALSE,
custom_css = NULL
)
Arguments
data |
Data frame containing the analysis data |
treat_var |
Character. Name of treatment variable (must have 2 levels) |
vars_continuous |
Character vector. Names of continuous variables |
vars_categorical |
Character vector. Names of categorical variables |
vars_binary |
Character vector. Names of binary (0/1) variables |
var_labels |
Named list. Custom labels for variables (optional) |
digits |
Integer. Number of decimal places for continuous variables |
show_pvalue |
Logical. Include p-values column |
show_smd |
Logical. Include SMD (effect size) column |
show_missing |
Logical. Include missing data rows |
table_title |
Character. Main title for the table |
table_subtitle |
Character. Subtitle for the table (optional) |
source_note |
Character. Source note at bottom (optional) |
font_size |
Numeric. Base font size in pixels (default: 12) |
header_font_size |
Numeric. Header font size in pixels (default: 14) |
footnote_font_size |
Numeric. Footnote font size in pixels (default: 10) |
use_alternating_rows |
Logical. Apply zebra striping (default: TRUE) |
stripe_color |
Character. Color for alternating rows (default: "#f9f9f9") |
indent_size |
Numeric. Indentation for sub-levels in pixels (default: 20) |
highlight_pval |
Numeric. Highlight p-values below this threshold (default: 0.05) |
highlight_smd |
Numeric. Highlight SMD values above this threshold (default: 0.2) |
highlight_color |
Character. Color for highlighting (default: "#fff3cd") |
compact_mode |
Logical. Reduce spacing for compact display (default: FALSE) |
column_width_var |
Numeric. Width for Variable column in pixels (default: 200) |
column_width_stats |
Numeric. Width for stat columns in pixels (default: 120) |
show_column_borders |
Logical. Show vertical column borders (default: FALSE) |
custom_css |
Character. Additional custom CSS styling (optional) |
Details
Binary variables specified via vars_binary display a single row
showing the count and proportion for the "1" level. Categorical variables
specified via vars_categorical that happen to be binary-coded (i.e.,
have exactly two levels: 0 and 1) are automatically detected and displayed
in the same compact single-row format, showing only the "1" proportion.
Value
A gt table object (or data frame if gt not available)
Preset: Compact Table
Description
Preset: Compact Table
Usage
create_summary_table_compact(...)
Arguments
... |
Arguments passed to create_summary_table() |
Preset: Minimal Table (No Highlighting, No Alternating)
Description
Preset: Minimal Table (No Highlighting, No Alternating)
Usage
create_summary_table_minimal(...)
Arguments
... |
Arguments passed to create_summary_table() |
Preset: Presentation Table (Large Fonts)
Description
Preset: Presentation Table (Large Fonts)
Usage
create_summary_table_presentation(...)
Arguments
... |
Arguments passed to create_summary_table() |
Preset: Publication-Ready Table
Description
Preset: Publication-Ready Table
Usage
create_summary_table_publication(...)
Arguments
... |
Arguments passed to create_summary_table() |
Create Timing Summary Table
Description
Creates a data frame summarizing bootstrap timing information.
Usage
create_timing_summary_table(
overall_timing,
iteration_stats,
fs_stats,
overhead_stats,
nb_boots,
boot_success_rate
)
Arguments
overall_timing |
List. Overall timing statistics |
iteration_stats |
List. Per-iteration timing statistics |
fs_stats |
List. ForestSearch-specific timing statistics |
overhead_stats |
List. Overhead timing statistics |
nb_boots |
Integer. Number of bootstrap iterations |
boot_success_rate |
Numeric. Proportion of successful bootstraps |
Value
Data frame with timing summary
Discretize Continuous Variable into Quantile-Based Categories
Description
Discretize Continuous Variable into Quantile-Based Categories
Usage
cut_numeric(x, probs = c(0.25, 0.5, 0.75))
Arguments
x |
Numeric vector to discretize |
probs |
Numeric vector of probabilities for quantile breaks. Default: c(0.25, 0.5, 0.75) creates quartiles coded as 1, 2, 3, 4 |
Value
Integer vector with category codes (1 = lowest, max = highest)
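The quartile coding can be sketched with base R quantiles and findInterval() (an illustration, not the package implementation; tie handling may differ):

```r
# Sketch: quantile-based discretization with default quartile breaks.
cut_numeric_sketch <- function(x, probs = c(0.25, 0.5, 0.75)) {
  brks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  # left.open = TRUE puts values at a break into the lower category
  findInterval(x, brks, left.open = TRUE) + 1L   # 1 = lowest category
}

cut_numeric_sketch(1:8)  # 1 1 2 2 3 3 4 4
```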
Discretize Continuous Variable by Size Categories
Description
Discretize Continuous Variable by Size Categories
Usage
cut_size(x, breaks = c(20, 50))
Arguments
x |
Numeric vector (typically tumor size) |
breaks |
Numeric vector of breakpoints. Default: c(20, 50) |
Value
Integer vector with category codes
Generate cut expressions for a variable
Description
For a continuous variable, returns expressions for mean, median, qlow, and qhigh cuts.
Usage
cut_var(x)
Arguments
x |
Character. Variable name. |
Value
Character vector of cut expressions.
Compare Multiple CV Results
Description
Creates a comparison table from multiple cross-validation runs with different configurations.
Usage
cv_compare_results(
cv_list,
metrics = c("all", "finding", "agreement"),
show_percentages = TRUE,
digits = 1,
use_gt = TRUE
)
Arguments
cv_list |
Named list of cv_result objects from |
metrics |
Character vector. Which metrics to include. Options: "finding", "agreement", "all". Default: "all". |
show_percentages |
Logical. Display as percentages. Default: TRUE. |
digits |
Integer. Decimal places. Default: 1. |
use_gt |
Logical. Return gt table if TRUE. Default: TRUE. |
Value
A gt table or data.frame comparing CV results across configurations.
Create Metrics Tables for Cross-Validation Results
Description
Formats the find_summary and sens_summary outputs from
forestsearch_tenfold or forestsearch_Kfold
into publication-ready gt tables.
Usage
cv_metrics_tables(
cv_result,
sg_definition = NULL,
title = "Cross-Validation Metrics",
show_percentages = TRUE,
digits = 1,
include_raw = FALSE,
table_style = c("combined", "separate", "minimal"),
use_gt = TRUE
)
Arguments
cv_result |
List. Result from |
sg_definition |
Character vector. Subgroup factor definitions for
labeling (optional). If NULL, extracted from |
title |
Character. Main title for combined table. Default: "Cross-Validation Metrics". |
show_percentages |
Logical. Display metrics as percentages (0-100) instead of proportions (0-1). Default: TRUE. |
digits |
Integer. Decimal places for formatting. Default: 1. |
include_raw |
Logical. Include raw matrices ( |
table_style |
Character. One of "combined", "separate", or "minimal".
Default: "combined". |
use_gt |
Logical. Return gt table(s) if TRUE, data.frame(s) if FALSE. Default: TRUE. |
Value
Depending on table_style:
"combined": A single gt table (or data.frame)
"separate": A list with
agreement_tableandfinding_table"minimal": A single-row gt table (or data.frame)
If include_raw = TRUE, also includes sens_out and find_out
matrices in the returned list.
See Also
cv_summary_tables for formatting forestsearch_KfoldOut(outall=TRUE) results
Create Summary Tables from forestsearch_KfoldOut Results
Description
Formats the detailed output from forestsearch_KfoldOut(outall=TRUE)
into publication-ready gt tables. This includes ITT estimates, original subgroup
estimates, and K-fold subgroup estimates.
Usage
cv_summary_tables(
kfold_out,
title = "Cross-Validation Summary",
subtitle = NULL,
show_metrics = TRUE,
digits = 3,
font_size = 12,
use_gt = TRUE
)
Arguments
kfold_out |
List. Result from |
title |
Character. Main title for combined table. Default: "Cross-Validation Summary". |
subtitle |
Character. Subtitle for table. Default: NULL (auto-generated). |
show_metrics |
Logical. Include agreement and finding metrics in output. Default: TRUE. |
digits |
Integer. Decimal places for numeric formatting. Default: 3. |
font_size |
Integer. Font size in pixels. Default: 12. |
use_gt |
Logical. Return gt table if TRUE, data.frame if FALSE. Default: TRUE. |
Value
If use_gt = TRUE, returns a list with gt table objects:
- combined_table: Combined ITT and subgroup estimates
- itt_table: ITT estimates only
- original_table: Original full-data subgroup estimates
- kfold_table: K-fold subgroup estimates
- metrics_table: Agreement and finding metrics (if show_metrics = TRUE)
If use_gt = FALSE, returns equivalent data.frames.
See Also
cv_metrics_tables for formatting forestsearch_tenfold() results
Create Compact CV Summary Text
Description
Generates a compact text string summarizing CV results, suitable for annotations in plots or reports.
Usage
cv_summary_text(
cv_result,
est.scale = "hr",
include_finding = TRUE,
include_agreement = TRUE
)
Arguments
cv_result |
List. Result from |
est.scale |
Character. "hr" or "1/hr" to determine label orientation. Default: "hr". |
include_finding |
Logical. Include subgroup finding rate. Default: TRUE. |
include_agreement |
Logical. Include agreement rate. Default: TRUE. |
Value
Character string with formatted CV metrics.
Default ForestSearch Parameters for GBSG Simulations
Description
Returns a list of default parameters for ForestSearch analysis in GBSG-based simulations.
Usage
default_fs_params()
Details
Default parameters are optimized for GBSG simulation scenarios with moderate sample sizes (300-1000) and typical event rates.
Variable selection defaults:
use_lasso = TRUE: LASSO-based variable importance (default for FS)
use_grf = FALSE: GRF-based variable importance (enable for FSlg)
The use_twostage parameter is set to FALSE by default for backward
compatibility. Set to TRUE for faster exploratory analyses.
Value
List of default ForestSearch parameters
Default GRF Parameters for GBSG Simulations
Description
Returns a list of default parameters for GRF analysis
in GBSG-based simulations. Parameters align with
grf.subg.harm.survival() function signature.
Usage
default_grf_params()
Value
List of default GRF parameters
Default GRF parameters (general)
Description
Default GRF parameters (general)
Usage
default_grf_params_gen()
Default ForestSearch parameters (general)
Description
Default ForestSearch parameters (general)
Usage
default_sim_params()
Define Subgroups with Flexible Cutpoints
Description
Define Subgroups with Flexible Cutpoints
Usage
define_subgroups(
df_work,
data,
subgroup_vars,
subgroup_cuts,
continuous_vars,
model,
verbose
)
Bivariate Density for Split-Sample HR Threshold Detection
Description
Computes the joint density for the two-split detection criterion where both split-halves must exceed individual thresholds and their average must exceed a consistency threshold.
Usage
density_threshold_both(x, theta, prop_cens = 0.3, n_sg, k_avg, k_ind)
Arguments
x |
Numeric vector of length 2. Log hazard ratio estimates from the two split-halves. |
theta |
Numeric. True hazard ratio in the subgroup. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
n_sg |
Integer. Subgroup sample size. |
k_avg |
Numeric. Threshold for average log(HR) across splits. Typically log(hr.threshold). |
k_ind |
Numeric. Threshold for individual split log(HR). Typically log(hr.consistency). |
Details
The detection criterion requires:
Average of two splits: (x1 + x2)/2 >= k_avg
Individual splits: x1 >= k_ind AND x2 >= k_ind
Under the asymptotic approximation, each split-half log(HR) estimator follows N(log(theta), 8/d) where d = n_sg * (1 - prop_cens) / 2 is the expected number of events per split.
Value
Numeric. Joint density value at x, or 0 if thresholds not met.
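Under the stated N(log(theta), 8/d) approximation with independent split-halves, the density can be sketched as follows (illustrative name, not the package implementation):

```r
# Sketch: joint density of the two split-half log(HR) estimates,
# zeroed outside the detection region described in Details.
density_sketch <- function(x, theta, prop_cens = 0.3, n_sg, k_avg, k_ind) {
  d  <- n_sg * (1 - prop_cens) / 2      # expected events per split
  sd <- sqrt(8 / d)                     # asymptotic SD of each split estimate
  ok <- mean(x) >= k_avg && all(x >= k_ind)
  if (!ok) return(0)
  prod(dnorm(x, mean = log(theta), sd = sd))  # independence across splits
}

density_sketch(c(0.4, 0.3), theta = 1.5, n_sg = 200,
               k_avg = log(1.25), k_ind = log(1.2))
```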
Vectorized Density for Integration
Description
Wrapper around density_threshold_both for use with cubature integration.
Usage
density_threshold_integrand(x, theta, prop_cens, n_sg, k_avg, k_ind)
Arguments
x |
Numeric vector of length 2. Log hazard ratio estimates from the two split-halves. |
theta |
Numeric. True hazard ratio in the subgroup. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
n_sg |
Integer. Subgroup sample size. |
k_avg |
Numeric. Threshold for average log(HR) across splits. Typically log(hr.threshold). |
k_ind |
Numeric. Threshold for individual split log(HR). Typically log(hr.consistency). |
Value
Numeric density value.
Automatically Detect Variable Types in a Dataset
Description
Analyzes a data frame to automatically classify variables as continuous or categorical, and returns a subset of the data with specified variables excluded.
Usage
detect_variable_types(data, max_unique_for_cat = 10, exclude_vars = NULL)
Arguments
data |
A data frame to analyze |
max_unique_for_cat |
Integer. Maximum number of unique values for a numeric variable to be considered categorical. Default is 10. |
exclude_vars |
Character vector of variable names to exclude from both classification and the returned dataset (e.g., ID variables, timestamps). Default is NULL. |
Details
The function classifies variables using the following rules:
Numeric variables with more than max_unique_for_cat unique values are classified as continuous
Numeric variables with max_unique_for_cat or fewer unique values are classified as categorical
Factor, character, and logical variables are always classified as categorical
Variables listed in exclude_vars are omitted from classification and removed from the returned dataset
Value
A list containing:
continuous_vars |
Character vector of variable names classified as continuous |
cat_vars |
Character vector of variable names classified as categorical |
data_subset |
Data frame with exclude_vars columns removed |
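The classification rules can be sketched in a few lines of base R (illustrative; detect_types_sketch is a hypothetical name, not the exported function):

```r
# Sketch: classify columns as continuous vs categorical per the rules above.
detect_types_sketch <- function(data, max_unique_for_cat = 10,
                                exclude_vars = NULL) {
  data <- data[, setdiff(names(data), exclude_vars), drop = FALSE]
  is_cont <- vapply(data, function(v)
    is.numeric(v) && length(unique(v)) > max_unique_for_cat, logical(1))
  list(continuous_vars = names(data)[is_cont],
       cat_vars        = names(data)[!is_cont],   # factors/characters land here
       data_subset     = data)
}

d <- data.frame(age = rnorm(50), grade = sample(1:3, 50, TRUE),
                id = 1:50, sex = factor(rep(c("F", "M"), 25)))
detect_types_sketch(d, exclude_vars = "id")
# continuous: age; categorical: grade, sex; id dropped
```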
Dummy-code a data frame (numeric pass-through, factors expanded)
Description
Dummy-code a data frame (numeric pass-through, factors expanded)
Usage
dummy_encode(df)
Arguments
df |
Data frame with numeric and/or factor columns. |
Value
Data frame with numeric columns unchanged and factor columns
expanded via acm.disjctif.
Early Stopping Decision
Description
Evaluates whether enough evidence exists to stop early based on confidence interval for consistency proportion.
Usage
early_stop_decision(
n_success,
n_total,
threshold,
conf.level = 0.95,
min_samples = 20
)
Arguments
n_success |
Integer. Number of splits meeting consistency. |
n_total |
Integer. Total number of valid splits. |
threshold |
Numeric. Target consistency threshold. |
conf.level |
Numeric. Confidence level for decision (default 0.95). |
min_samples |
Integer. Minimum samples before allowing early stop. |
Value
Character. One of "continue", "pass", or "fail".
Evaluate a Single Factor Combination with Status Tracking
Description
Tests whether a specific combination meets all criteria and returns a status code indicating how far the evaluation progressed.
Usage
evaluate_combination_with_status(
covs.in,
yy,
dd,
tt,
zz,
n.min,
d0.min,
d1.min,
hr.threshold,
minp,
rmin,
kk
)
Arguments
covs.in |
Numeric vector. Factor selection indicators. |
yy |
Numeric vector. Outcome values. |
dd |
Numeric vector. Event indicators. |
tt |
Numeric vector. Treatment indicators. |
zz |
Matrix. Factor indicators. |
n.min |
Integer. Minimum sample size. |
d0.min |
Integer. Minimum control events. |
d1.min |
Integer. Minimum treatment events. |
hr.threshold |
Numeric. HR threshold. |
minp |
Numeric. Minimum prevalence. |
rmin |
Integer. Minimum size reduction. |
kk |
Integer. Combination index. |
Value
List with:
- status
Integer status code:
0 = failed variance check
1 = passed variance, failed prevalence
2 = passed prevalence, failed redundancy
3 = passed redundancy, failed events
4 = passed events, failed sample size
5 = passed sample size, failed Cox fit
6 = passed Cox fit, failed HR threshold
7 = passed all criteria (success)
- result
Result row if successful, NULL otherwise
Evaluate a Comparison Expression Without eval(parse())
Description
Parses a string of the form "var op value" and evaluates it
directly against a data frame column using operator dispatch. Falls back
to column-name lookup for bare names.
Usage
evaluate_comparison(expr, df)
Arguments
expr |
Character. An expression like |
df |
Data frame whose columns are referenced by |
Details
Supported operators (matched longest-first to avoid partial-match
ambiguity): <=, >=, !=, ==, <,
>.
If no operator is found, expr is treated as a column name and
the result is df[[expr]] == 1.
The value on the right-hand side is coerced to numeric when possible, otherwise kept as character for string comparisons.
Value
Logical vector of length nrow(df).
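The dispatch logic can be sketched as follows (illustrative; the exported evaluate_comparison may differ in detail):

```r
# Sketch: evaluate "var op value" against a data frame column without
# eval(parse()); operators are matched longest-first as described.
eval_comparison_sketch <- function(expr, df) {
  ops <- c("<=", ">=", "!=", "==", "<", ">")   # longest first
  for (op in ops) {
    if (grepl(op, expr, fixed = TRUE)) {
      parts <- strsplit(expr, op, fixed = TRUE)[[1]]
      lhs <- df[[trimws(parts[1])]]
      rhs <- trimws(parts[2])
      num <- suppressWarnings(as.numeric(rhs))
      if (!is.na(num)) rhs <- num              # numeric when possible
      return(do.call(op, list(lhs, rhs)))      # operator dispatch
    }
  }
  df[[expr]] == 1                              # bare column-name fallback
}

df <- data.frame(age = c(40, 60, 70))
eval_comparison_sketch("age >= 60", df)  # FALSE TRUE TRUE
```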
Evaluate Consistency (Two-Stage Algorithm)
Description
Evaluates a single subgroup for consistency using a two-stage approach: Stage 1 screens with fewer splits, Stage 2 uses sequential batched evaluation with early stopping for efficient evaluation.
Usage
evaluate_consistency_twostage(
m,
index.Z,
names.Z,
df,
found.hrs,
hr.consistency,
pconsistency.threshold,
pconsistency.digits = 2,
maxk,
confs_labels,
details = FALSE,
n.splits.screen = 30,
screen.threshold = NULL,
n.splits.max = 400,
batch.size = 20,
conf.level = 0.95,
min.valid.screen = 10
)
Arguments
m |
Integer. Index of subgroup to evaluate. |
index.Z |
data.table or matrix. Factor indicators for all subgroups. |
names.Z |
Character vector. Names of factor columns. |
df |
data.frame. Original data with Y, Event, Treat, id columns. |
found.hrs |
data.table. Subgroup hazard ratio results. |
hr.consistency |
Numeric. Minimum HR threshold for consistency. |
pconsistency.threshold |
Numeric. Final consistency threshold. |
pconsistency.digits |
Integer. Rounding digits for output. |
maxk |
Integer. Maximum number of factors in a subgroup. |
confs_labels |
Character vector. Labels for confounders. |
details |
Logical. Print progress details. |
n.splits.screen |
Integer. Number of splits for Stage 1 (default 30). |
screen.threshold |
Numeric. Screening threshold for Stage 1 (default auto-calculated). |
n.splits.max |
Integer. Maximum total splits (default 400). |
batch.size |
Integer. Splits per batch in Stage 2 (default 20). |
conf.level |
Numeric. Confidence level for early stopping (default 0.95). |
min.valid.screen |
Integer. Minimum valid splits in Stage 1 (default 10). |
Value
Named numeric vector with consistency results, or NULL if not met.
Cache and validate cut expressions efficiently
Description
Evaluates all cut expressions once and caches results to avoid redundant evaluation. Much faster than evaluating repeatedly.
Usage
evaluate_cuts_once(confs, df, details = FALSE)
Arguments
confs |
Character vector of cut expressions. |
df |
Data frame to evaluate expressions against. |
details |
Logical. Print details during execution. |
Details
This replaces multiple eval(parse()) calls scattered throughout get_FSdata. By caching results, we avoid:
Repeated parsing of expressions
Repeated evaluation on dataframe
Redundant uniqueness checks
Value
List with:
evaluations: List of evaluated vectors (logical TRUE/FALSE) for each cut
is_valid: Logical vector indicating which cuts produced >1 unique value
has_error: Logical vector indicating which cuts failed to evaluate
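The caching pattern can be sketched as follows (illustrative helper name; each expression is parsed and evaluated exactly once):

```r
# Sketch: evaluate all cut expressions once, recording validity
# (>1 unique value) and evaluation errors.
cache_cuts_sketch <- function(confs, df) {
  evaluations <- lapply(confs, function(e)
    tryCatch(eval(parse(text = e), envir = df), error = function(err) NULL))
  has_error <- vapply(evaluations, is.null, logical(1))
  is_valid  <- !has_error &
    vapply(evaluations, function(v) length(unique(v)) > 1, logical(1))
  list(evaluations = evaluations, is_valid = is_valid, has_error = has_error)
}

df  <- data.frame(age = c(40, 60, 70))
res <- cache_cuts_sketch(c("age > 50", "age > 100", "bad_var > 1"), df)
res$is_valid   # TRUE FALSE FALSE  (degenerate cut, failed cut)
res$has_error  # FALSE FALSE TRUE
```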
Evaluate Single Subgroup for Consistency (Fixed-Sample)
Description
Evaluates a single subgroup for consistency across random splits using a fixed number of splits.
Usage
evaluate_subgroup_consistency(
m,
index.Z,
names.Z,
df,
found.hrs,
n.splits,
hr.consistency,
pconsistency.threshold,
pconsistency.digits = 2,
maxk,
confs_labels,
details = FALSE
)
Arguments
m |
Integer. Index of the subgroup to evaluate. |
index.Z |
Data.table or matrix. Factor indicators for all subgroups. |
names.Z |
Character vector. Names of factor columns. |
df |
Data.frame. Original data with Y, Event, Treat, id columns. |
found.hrs |
Data.table. Subgroup hazard ratio results. |
n.splits |
Integer. Number of random splits for consistency evaluation. |
hr.consistency |
Numeric. Minimum HR threshold for consistency. |
pconsistency.threshold |
Numeric. Minimum proportion of splits meeting consistency. |
pconsistency.digits |
Integer. Rounding digits for consistency proportion. |
maxk |
Integer. Maximum number of factors in a subgroup. |
confs_labels |
Character vector. Labels for confounders. |
details |
Logical. Print details during execution. |
Value
Named numeric vector with consistency results, or NULL if criteria not met.
Extract all cuts from fitted trees
Description
Consolidates cut information from all fitted policy trees. This is the default behavior that returns cuts from all trees regardless of which tree identified the selected subgroup.
Usage
extract_all_tree_cuts(trees, maxdepth)
Arguments
trees |
List. Policy trees (indexed by depth) |
maxdepth |
Integer. Maximum tree depth |
Value
List with cuts and names for each tree and combined
Extract Estimates from ForestSearch Results
Description
Extracts operating characteristics (HRs, classification metrics, etc.) from ForestSearch analysis results. Aligned with new DGM output structure.
Usage
extract_fs_estimates(
df,
fs_res,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis = "FS",
fs_full = NULL,
verbose = FALSE
)
Arguments
df |
Simulated data frame |
fs_res |
ForestSearch result table (grp.consistency$out_sg$result, or NULL) |
dgm |
DGM object containing true HRs (supports both old and new formats) |
cox_formula |
Cox formula for estimation (optional) |
cox_formula_adj |
Adjusted Cox formula (optional) |
analysis |
Analysis label (e.g., "FS", "FSlg") |
fs_full |
Full forestsearch result object (for df.est access) |
verbose |
Logical. Print extraction details. Default: FALSE |
Value
data.table with extracted estimates including AHR metrics
Extract Subgroup Definition from ForestSearch Object
Description
Internal helper to extract human-readable subgroup definition.
Usage
extract_fs_subgroup_definition(fs.est, verbose = FALSE)
Arguments
fs.est |
A forestsearch object. |
verbose |
Logical. Print diagnostic messages. |
Value
Character string describing the subgroup definition.
Extract Estimates from GRF Results
Description
Aligned with new DGM output structure including AHR metrics. Correctly handles grf.subg.harm.survival() output structure:
sg.harm.id: subgroup definition string
data: data frame with treat.recommend column (0 = harm, 1 = complement)
Usage
extract_grf_estimates(
df,
grf_est,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis = "GRF",
frac_tau = 1,
verbose = FALSE,
debug = FALSE
)
Arguments
df |
Simulated data frame |
grf_est |
GRF estimation result from grf.subg.harm.survival() |
dgm |
DGM object |
cox_formula |
Cox formula |
cox_formula_adj |
Adjusted Cox formula |
analysis |
Analysis label |
frac_tau |
Fraction of tau used |
verbose |
Print extraction details |
debug |
Print detailed debugging information about GRF result structure |
Value
data.table with extracted estimates
Extract redundancy flag for subgroup combinations
Description
Checks if adding each factor to a subgroup reduces the sample size by at least rmin.
Usage
extract_idx_flagredundancy(x, rmin)
Arguments
x |
Matrix of subgroup factor indicators. |
rmin |
Integer. Minimum required reduction in sample size. |
Value
List with id.x (membership vector) and flag.redundant (logical).
Extract cuts from selected tree only
Description
Extracts cut information only from the tree at the specified selected depth.
This provides a focused set of cuts from the tree that identified the
subgroup meeting the dmin.grf criterion, rather than cuts from all trees.
Usage
extract_selected_tree_cuts(trees, selected_depth, maxdepth)
Arguments
trees |
List. Policy trees (indexed by depth) |
selected_depth |
Integer. Depth of the selected tree (from best_subgroup$depth) |
maxdepth |
Integer. Maximum tree depth (for populating tree-specific slots) |
Details
This function is used when return_selected_cuts_only = TRUE in
grf.subg.harm.survival(). It returns:
- tree1, tree2, tree3: Individual tree cuts (still populated for reference)
- names1, names2, names3: Individual tree variable names
- all: Cuts from the SELECTED tree only (not union of all trees)
- all_names: Variable names from the SELECTED tree only
- selected_depth: The depth that was selected
Value
List with cuts and names, structured similarly to extract_all_tree_cuts
but with only the selected tree's cuts in the all field
Extract Subgroup Information
Description
Extracts subgroup definition and membership from results.
Usage
extract_subgroup(df, top_result, index.Z, names.Z, confs_labels)
Arguments
df |
Data.frame. Original analysis data. |
top_result |
Data.table row. Top subgroup result. |
index.Z |
Matrix. Factor indicators for all subgroups. |
names.Z |
Character vector. Factor column names. |
confs_labels |
Character vector. Human-readable labels. |
Value
List with sg.harm, sg.harm_label, df_flag, sg.harm.id.
Extract cut information from a policy tree
Description
Extracts all split points and variables from a policy tree
Usage
extract_tree_cuts(tree)
Arguments
tree |
Policy tree object |
Value
List with cuts (expressions) and names (unique variables)
Generate Figure Note for Quarto/RMarkdown
Description
Formats the figure note from plot_sg_weighted_km() output for use in Quarto or RMarkdown documents.
Usage
figure_note(
x,
prefix = "*Note*: ",
include_definition = TRUE,
include_hr_explanation = TRUE,
custom_text = NULL
)
Arguments
x |
Output from plot_sg_weighted_km() |
prefix |
Character. Prefix for the note. Default uses italic Note. |
include_definition |
Logical. Include subgroup definition. Default: TRUE |
include_hr_explanation |
Logical. Include HR(bc) explanation. Default: TRUE |
custom_text |
Character. Additional custom text to append. Default: NULL |
Value
Character string formatted as a figure note, or NULL if no content
Filter a vector by LASSO-selected variables
Description
Returns elements of x that are in lassokeep.
Usage
filter_by_lassokeep(x, lassokeep)
Arguments
x |
Character vector. |
lassokeep |
Character vector of selected variables. |
Value
Filtered character vector or NULL.
Filter and merge arguments for function calls
Description
Simplifies the common pattern of filtering arguments from a source list to match a target function's formal parameters, then adding/overriding specific arguments.
Usage
filter_call_args(source_args, target_func, override_args = NULL)
Arguments
source_args |
List of all arguments (typically from |
target_func |
Function whose formals define the filter criteria. |
override_args |
List of arguments to add or override (optional). |
Details
This function:
- Extracts formal parameter names from target_func
- Keeps only arguments from source_args that match those names
- Adds or overrides with any override_args provided
Reduces boilerplate and improves readability across the codebase.
Value
List of filtered arguments ready for do.call().
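A minimal sketch of this pattern using base R formals() (illustrative; the package implementation may differ in detail):

```r
# Keep only the source_args that match the target function's formals,
# then apply any overrides (sketch of the documented behavior)
filter_call_args_sketch <- function(source_args, target_func, override_args = NULL) {
  keep <- intersect(names(source_args), names(formals(target_func)))
  args <- source_args[keep]
  if (!is.null(override_args)) args[names(override_args)] <- override_args
  args
}

# Example with a hypothetical target function:
f <- function(x, y = 1) x + y
do.call(f, filter_call_args_sketch(list(x = 2, z = 99), f, list(y = 3)))  # 5
```

The extraneous `z` argument is dropped before do.call(), which would otherwise error on an unused argument.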
Find Covariate Any Match
Description
Helper function to determine if any CV fold found a subgroup involving the same covariate (not necessarily same cut).
Usage
find_covariate_any_match(sg_target, sg1, sg2, confs)
Arguments
sg_target |
Character. Target subgroup definition to match. |
sg1 |
Character vector. Subgroup 1 labels for each fold. |
sg2 |
Character vector. Subgroup 2 labels for each fold. |
confs |
Character vector. Confounder names. |
Value
Numeric vector (0/1) indicating match for each fold.
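The covariate-level matching might be sketched as follows (assumed logic based on the description above; the actual implementation may differ):

```r
# For each fold, flag 1 if either fold subgroup mentions any covariate
# that also appears in the target subgroup definition (cut values ignored)
find_covariate_any_match_sketch <- function(sg_target, sg1, sg2, confs) {
  in_def <- function(def) confs[vapply(confs, grepl, logical(1), x = def, fixed = TRUE)]
  target_vars <- in_def(sg_target)
  hit <- function(def) length(intersect(in_def(def), target_vars)) > 0
  as.numeric(mapply(function(a, b) hit(a) || hit(b), sg1, sg2))
}
```

For example, a fold finding `{age>=60}` matches a target of `{age>=50}` (same covariate, different cut), while a fold finding only `{nodes>=3}` does not.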
Find k_inter Value to Achieve Target Harm Subgroup Hazard Ratio
Description
Uses numerical root-finding to determine the interaction parameter (k_inter) that achieves a specified target hazard ratio in the harm subgroup. This is the most efficient method for single target calibration.
Usage
find_k_inter_for_target_hr(
target_hr_harm,
data,
continuous_vars,
factor_vars,
outcome_var,
event_var,
treatment_var,
subgroup_vars,
subgroup_cuts,
k_treat = 1,
k_inter_range = c(-10, 10),
tol = 0.001,
n_super = 5000,
verbose = TRUE
)
Arguments
target_hr_harm |
Numeric value specifying the target hazard ratio for the harm subgroup. Must be positive. |
data |
A data.frame containing the dataset to use for model fitting. |
continuous_vars |
Character vector of continuous variable names to be standardized and included as covariates. |
factor_vars |
Character vector of factor/categorical variable names to be converted to dummy variables. |
outcome_var |
Character string specifying the name of the outcome/time variable. |
event_var |
Character string specifying the name of the event/status variable (1 = event, 0 = censored). |
treatment_var |
Character string specifying the name of the treatment variable. |
subgroup_vars |
Character vector of variable names defining the subgroup. |
subgroup_cuts |
Named list of cutpoint specifications for subgroup variables.
See |
k_treat |
Numeric value for treatment effect modifier. Default is 1 (no modification). |
k_inter_range |
Numeric vector of length 2 specifying the search range for k_inter. Default is c(-10, 10). |
tol |
Numeric value specifying tolerance for root finding convergence. Default is 0.001. |
n_super |
Integer specifying size of super population for hazard ratio calculation. Default is 5000. |
verbose |
Logical indicating whether to print progress information. Default is TRUE. |
Details
This function uses the uniroot algorithm to solve the equation:
HR_{harm}(k_{inter}) - HR_{target} = 0
The algorithm typically converges within 5-10 iterations and achieves high precision (within the specified tolerance). If the root-finding fails, the function evaluates the boundaries and provides diagnostic information.
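The root-finding step can be illustrated with a toy monotone stand-in for the harm-subgroup HR as a function of k_inter (illustrative only; the package evaluates the HR by simulating from the DGM):

```r
# Toy stand-in: suppose the harm-subgroup HR grows monotonically in k_inter
hr_harm <- function(k) exp(0.5 * k)
target  <- 1.8

# Solve HR_harm(k_inter) - HR_target = 0 over the default search range
fit <- uniroot(function(k) hr_harm(k) - target, interval = c(-10, 10), tol = 1e-3)
fit$root  # approximately 2 * log(1.8)
```

Because each evaluation of the objective is expensive in the real setting (it requires simulating a super population), the small iteration count of uniroot is what makes this the most efficient single-target calibration method.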
Value
A list of class "k_inter_result" containing:
- k_inter
Numeric value of optimal k_inter parameter
- achieved_hr_harm
Numeric value of achieved hazard ratio in harm subgroup
- target_hr_harm
Numeric value of target hazard ratio (for reference)
- error
Numeric value of absolute error between achieved and target HR
- dgm
Object of class "aft_dgm_flex" containing the final DGM
- convergence
Integer number of iterations to convergence
- method
Character string "root-finding" indicating method used
See Also
sensitivity_analysis_k_inter for sensitivity analysis
generate_aft_dgm_flex for DGM generation
Find the split that leads to a specific leaf node
Description
Identifies the split point that creates a given leaf node
Usage
find_leaf_split(tree, leaf_node)
Arguments
tree |
Policy tree object |
leaf_node |
Integer. Leaf node identifier |
Value
Character string with split expression or NULL
Find Quantile for Target Subgroup Proportion
Description
Determines the quantile cutpoint that achieves a target proportion of observations in a subgroup. Useful for calibrating subgroup sizes.
Usage
find_quantile_for_proportion(
data,
var_name,
target_prop,
direction = "less",
tol = 1e-04
)
Arguments
data |
A data.frame containing the variable of interest |
var_name |
Character string specifying the variable name to analyze |
target_prop |
Numeric value between 0 and 1 specifying the target proportion of observations to be included in the subgroup |
direction |
Character string: "less" for values <= cutpoint (default), "greater" for values > cutpoint |
tol |
Numeric tolerance for root finding algorithm. Default is 0.0001 |
Details
This function uses root finding (uniroot) to determine the quantile
that results in exactly the target proportion of observations being classified
into the subgroup. This is particularly useful when you want to ensure a
specific subgroup size regardless of the data distribution.
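For a continuous variable without heavy ties, the empirical quantile gives the same answer directly (a simplified sketch of the calibration idea, not the function's uniroot implementation):

```r
set.seed(1)
x <- rexp(1000)
target_prop <- 0.30

# direction = "less": subgroup is {x <= cutpoint}
cutpoint <- quantile(x, probs = target_prop, type = 1)
mean(x <= cutpoint)  # achieved proportion (0.30 here, since values are distinct)
```

With heavy ties or discrete variables, no cutpoint may achieve the target proportion exactly, which is where the tolerance-based root finding becomes relevant.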
Value
A list containing:
- quantile
The quantile value (between 0 and 1) that achieves the target proportion
- cutpoint
The actual data value corresponding to this quantile
- actual_proportion
The achieved proportion (should equal target_prop within tolerance)
Find Minimum Sample Size for Target Detection Power
Description
Determines the minimum subgroup sample size needed to achieve a target detection probability for a given true hazard ratio.
Usage
find_required_sample_size(
theta,
target_power = 0.8,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
n_range = c(20L, 500L),
tol = 1,
verbose = TRUE
)
Arguments
theta |
Numeric. True hazard ratio in subgroup. |
target_power |
Numeric. Target detection probability (0-1). Default: 0.80 |
prop_cens |
Numeric. Proportion censored. Default: 0.3 |
hr_threshold |
Numeric. HR threshold. Default: 1.25 |
hr_consistency |
Numeric. HR consistency threshold. Default: 1.0 |
n_range |
Integer vector of length 2. Range of sample sizes to search. Default: c(20, 500) |
tol |
Numeric. Tolerance for bisection search. Default: 1 |
verbose |
Logical. Print progress. Default: TRUE |
Value
A list with:
n_sg_required |
Minimum sample size (rounded up) |
achieved_power |
Actual detection probability at n_sg_required |
theta |
Input hazard ratio |
target_power |
Input target power |
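The bisection idea can be sketched with a toy monotone power curve standing in for the simulation-based detection probability (hypothetical; the package derives power from theta, prop_cens, and the HR thresholds):

```r
# Toy power curve: detection probability increasing in subgroup size n
power_fn <- function(n) pnorm(0.1 * sqrt(n) - qnorm(0.975))

# Bisection for the smallest n with power_fn(n) >= target
find_n_sketch <- function(target = 0.80, lo = 20, hi = 2000) {
  while (hi - lo > 1) {
    mid <- floor((lo + hi) / 2)
    if (power_fn(mid) >= target) hi <- mid else lo <- mid
  }
  hi
}
```

Monotonicity of power in n is what justifies the bisection search over n_range.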
Fit AFT Model with Optional Spline Treatment Effect
Description
Fit AFT Model with Optional Spline Treatment Effect
Usage
fit_aft_model(
df_work,
interaction_term,
k_treat,
k_inter,
verbose,
spline_spec = NULL,
set_var = NULL,
beta_var = NULL
)
Fit AFT Model with Spline Treatment Effect
Description
Fit AFT Model with Spline Treatment Effect
Usage
fit_aft_model_spline(
df_work,
covariate_cols,
interaction_term,
k_treat,
k_inter,
spline_spec,
verbose,
set_var = NULL,
beta_var = NULL
)
Fit Standard AFT Model (Non-Spline)
Description
Fit Standard AFT Model (Non-Spline)
Usage
fit_aft_model_standard(
df_work,
covariate_cols,
interaction_term,
k_treat,
k_inter,
verbose,
set_var = NULL,
beta_var = NULL
)
Fit causal survival forest
Description
Wrapper function to fit GRF causal survival forest with appropriate settings
Usage
fit_causal_forest(X, Y, W, D, tau.rmst, RCT, seedit)
Arguments
X |
Matrix. Covariate matrix |
Y |
Numeric vector. Outcome variable |
W |
Numeric vector. Treatment indicator |
D |
Numeric vector. Event indicator |
tau.rmst |
Numeric. Time horizon for RMST |
RCT |
Logical. Is this RCT data? |
seedit |
Integer. Random seed |
Value
Causal survival forest object
Fit Cox Model for Subgroup
Description
Fit Cox Model for Subgroup
Usage
fit_cox_for_subgroup(yy, dd, tt, id.x)
Fit Cox Models for Subgroups
Description
Fits Cox models for two subgroups defined by treatment recommendation.
Usage
fit_cox_models(df, formula)
Arguments
df |
Data frame. |
formula |
Cox model formula. |
Value
List with HR and SE for each subgroup.
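A sketch of the per-subgroup fitting pattern using survival::coxph (assumed structure; the package's internal flag handling and formula construction may differ):

```r
library(survival)

# Fit the same Cox formula within each level of a recommendation flag,
# returning the HR and SE per subgroup (illustrative sketch)
fit_cox_by_flag <- function(df, formula, flag) {
  lapply(split(df, df[[flag]]), function(d) {
    fit <- coxph(formula, data = d)
    c(hr = unname(exp(coef(fit)[1])), se = sqrt(vcov(fit)[1, 1]))
  })
}

# Example on the survival package's lung data, splitting on sex:
fits <- fit_cox_by_flag(lung, Surv(time, status) ~ ph.ecog, "sex")
```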
Fit policy trees up to specified depth
Description
Fits policy trees of depths 1 through maxdepth and computes metrics
Usage
fit_policy_trees(X, data, dr.scores, maxdepth, n.min)
Arguments
X |
Matrix. Covariate matrix |
data |
Data frame. Original data |
dr.scores |
Matrix. Doubly robust scores |
maxdepth |
Integer. Maximum tree depth (1-3) |
n.min |
Integer. Minimum subgroup size |
Value
List with trees and combined values
ForestSearch: Exploratory Subgroup Identification
Description
Identifies subgroups with differential treatment effects in clinical trials using a combination of Generalized Random Forests (GRF), LASSO variable selection, and exhaustive combinatorial search with split-sample validation.
Usage
forestsearch(
df.analysis,
outcome.name = "tte",
event.name = "event",
treat.name = "treat",
id.name = "id",
potentialOutcome.name = NULL,
flag_harm.name = NULL,
confounders.name = NULL,
parallel_args = list(plan = "callr", workers = 6, show_message = TRUE),
df.predict = NULL,
df.test = NULL,
is.RCT = TRUE,
seedit = 8316951,
est.scale = "hr",
use_lasso = TRUE,
use_grf = TRUE,
grf_res = NULL,
grf_cuts = NULL,
max_n_confounders = 1000,
grf_depth = 2,
dmin.grf = 12,
frac.tau = 0.6,
return_selected_cuts_only = TRUE,
conf_force = NULL,
defaultcut_names = NULL,
cut_type = "default",
exclude_cuts = NULL,
replace_med_grf = FALSE,
cont.cutoff = 4,
conf.cont_medians = NULL,
conf.cont_medians_force = NULL,
n.min = 60,
hr.threshold = 1.25,
hr.consistency = 1,
sg_focus = "hr",
fs.splits = 1000,
m1.threshold = Inf,
pconsistency.threshold = 0.9,
stop_threshold = 0.95,
showten_subgroups = FALSE,
d0.min = 12,
d1.min = 12,
max.minutes = 3,
minp = 0.025,
details = FALSE,
maxk = 2,
by.risk = 12,
plot.sg = FALSE,
plot.grf = FALSE,
max_subgroups_search = 10,
vi.grf.min = -0.2,
use_twostage = TRUE,
twostage_args = list()
)
Arguments
df.analysis |
Data frame. Analysis dataset with required columns. |
outcome.name |
Character. Name of time-to-event outcome variable. Default "tte". |
event.name |
Character. Name of event indicator (1=event, 0=censored). Default "event". |
treat.name |
Character. Name of treatment variable (1=treatment, 0=control). Default "treat". |
id.name |
Character. Name of subject ID variable. Default "id". |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
flag_harm.name |
Character. Name of true harm flag for simulations (optional). |
confounders.name |
Character vector. Names of candidate subgroup-defining variables. |
parallel_args |
List. Parallel processing configuration:
|
df.predict |
Data frame. Prediction dataset (optional). |
df.test |
Data frame. Test dataset (optional). |
is.RCT |
Logical. Is this a randomized controlled trial? Default TRUE. |
seedit |
Integer. Random seed. Default 8316951. |
est.scale |
Character. Estimation scale ("hr" or "rmst"). Default "hr". |
use_lasso |
Logical. Use LASSO for variable selection. Default TRUE. |
use_grf |
Logical. Use GRF for variable importance. Default TRUE. |
grf_res |
GRF results object (optional, for reuse). |
grf_cuts |
List. Custom GRF cut points (optional). |
max_n_confounders |
Integer. Maximum confounders to consider. Default 1000. |
grf_depth |
Integer. GRF tree depth. Default 2. |
dmin.grf |
Integer. Minimum events for GRF. Default 12. |
frac.tau |
Numeric. Fraction of tau for RMST. Default 0.6. |
return_selected_cuts_only |
Logical. If TRUE (default), GRF returns only cuts from the
tree depth that identified the selected subgroup meeting |
conf_force |
Character vector. Variables to force include (optional). |
defaultcut_names |
Character vector. Default cut variable names (optional). |
cut_type |
Character. Cut type ("default" or "custom"). Default "default". |
exclude_cuts |
Character vector. Variables to exclude from cutting (optional). |
replace_med_grf |
Logical. Replace median with GRF cuts. Default FALSE. |
cont.cutoff |
Integer. Cutoff for continuous vs categorical. Default 4. |
conf.cont_medians |
Named numeric vector. Median values for continuous variables (optional). |
conf.cont_medians_force |
Named numeric vector. Forced median values (optional). |
n.min |
Integer. Minimum subgroup size. Default 60. |
hr.threshold |
Numeric. Minimum HR for candidate subgroups. Default 1.25. |
hr.consistency |
Numeric. Minimum HR for consistency validation. Default 1.0. |
sg_focus |
Character. Subgroup selection focus. One of "hr", "hrMaxSG", "maxSG", "hrMinSG", "minSG". Default "hr". |
fs.splits |
Integer. Number of splits for consistency evaluation (or maximum
splits when |
m1.threshold |
Numeric. Maximum median survival threshold. Default Inf. |
pconsistency.threshold |
Numeric. Minimum consistency proportion. Default 0.90. |
stop_threshold |
Numeric. Early-stopping threshold for the consistency proportion; values > 1.0 are not permitted and are automatically reset. Default 0.95. |
showten_subgroups |
Logical. Show top 10 subgroups. Default FALSE. |
d0.min |
Integer. Minimum control arm events. Default 12. |
d1.min |
Integer. Minimum treatment arm events. Default 12. |
max.minutes |
Numeric. Maximum search time in minutes. Default 3. |
minp |
Numeric. Minimum prevalence threshold. Default 0.025. |
details |
Logical. Print progress details. Default FALSE. |
maxk |
Integer. Maximum number of factors per subgroup. Default 2. |
by.risk |
Integer. Risk table interval. Default 12. |
plot.sg |
Logical. Plot subgroup survival curves. Default FALSE. |
plot.grf |
Logical. Plot GRF results. Default FALSE. |
max_subgroups_search |
Integer. Maximum subgroups to evaluate. Default 10. |
vi.grf.min |
Numeric. Minimum GRF variable importance. Default -0.2. |
use_twostage |
Logical. Use two-stage sequential consistency algorithm for improved performance. Default TRUE (per the Usage signature); set FALSE for the original fixed-split algorithm. |
twostage_args |
List. Parameters for the two-stage algorithm (only used when use_twostage = TRUE). |
Details
Algorithm Overview:
- Variable Selection: GRF identifies variables with treatment effect heterogeneity; LASSO selects the most predictive.
- Subgroup Discovery: Exhaustive search over factor combinations up to maxk.
- Consistency Validation: Split-sample validation ensures reproducibility.
- Selection: Choose subgroup based on the sg_focus criterion.
Two-Stage Consistency Algorithm:
When use_twostage = TRUE, the consistency evaluation uses an optimized
algorithm that can provide 3-10x speedup:
- Stage 1: Quick screening with n.splits.screen splits eliminates clearly non-viable candidates.
- Stage 2: Sequential batched evaluation with early stopping for candidates passing Stage 1.
The two-stage algorithm is recommended for:
- Exploratory analyses with many candidate subgroups
- Large fs.splits values (> 200)
- Iterative model development
For final regulatory submissions, use_twostage = FALSE may be preferred
for exact reproducibility.
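The screening-then-confirmation flow described above can be sketched abstractly (hypothetical interface: eval_split() returns TRUE when one random split supports consistency; the package's internal batching and early stopping are more involved):

```r
two_stage_sketch <- function(candidates, eval_split,
                             n_screen = 20, n_full = 200,
                             screen_cut = 0.5, pass_cut = 0.9) {
  # Stage 1: cheap screen eliminates clearly non-viable candidates
  p_screen <- vapply(candidates, function(cand)
    mean(replicate(n_screen, eval_split(cand))), numeric(1))
  survivors <- candidates[p_screen >= screen_cut]
  # Stage 2: full evaluation only for candidates passing the screen
  p_full <- vapply(survivors, function(cand)
    mean(replicate(n_full, eval_split(cand))), numeric(1))
  survivors[p_full >= pass_cut]
}
```

The speedup comes from spending the full split budget only on candidates that survive the cheap screen.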
Value
A list of class "forestsearch" containing:
- grp.consistency
Consistency evaluation results including:
out_sg: Selected subgroup based on sg_focus
sg_focus: Focus criterion used
df_flag: Treatment recommendations
algorithm: "twostage" or "fixed"
n_candidates_evaluated: Number evaluated
n_passed: Number passing threshold
- find.grps
Subgroup search results
- confounders.candidate
Candidate confounders considered
- confounders.evaluated
Confounders after variable selection
- df.est
Analysis data with treatment recommendations
- df.predict
Prediction data with recommendations (if provided)
- df.test
Test data with recommendations (if provided)
- minutes_all
Total computation time
- grf_res
GRF results object
- sg_focus
Subgroup focus criterion used
- sg.harm
Selected subgroup definition
- grf_cuts
GRF cut points used
- prop_maxk
Proportion of max combinations searched
- max_sg_est
Maximum subgroup HR estimate
- grf_plot
GRF plot object (if plot.grf = TRUE)
- args_call_all
All arguments for reproducibility
References
FDA Guidance for Industry: Enrichment Strategies for Clinical Trials
Athey & Imbens (2016). Recursive partitioning for heterogeneous causal effects. PNAS.
Wager & Athey (2018). Estimation and inference of heterogeneous treatment effects using random forests. JASA.
See Also
subgroup.consistency for consistency evaluation details
forestsearch_bootstrap_dofuture for bootstrap inference
forestsearch_Kfold for cross-validation
Package website: https://larry-leon.github.io/forestsearch/
Source code: https://github.com/larry-leon/forestsearch
ForestSearch K-Fold Cross-Validation
Description
This function assesses the stability and reproducibility of ForestSearch subgroup identification through cross-validation. For each fold:
1. Train ForestSearch on (K-1) folds
2. Apply the identified subgroup to the held-out fold
3. Compare predictions to the original full-data analysis
Usage
forestsearch_Kfold(
fs.est,
Kfolds = nrow(fs.est$df.est),
seedit = 8316951L,
parallel_args = list(plan = "multisession", workers = 6, show_message = TRUE),
sg0.name = "Not recommend",
sg1.name = "Recommend",
details = FALSE
)
Arguments
fs.est |
List. ForestSearch results object from |
Kfolds |
Integer. Number of folds (default: |
seedit |
Integer. Random seed for fold assignment (default: 8316951). |
parallel_args |
List. Parallelization configuration with elements:
|
sg0.name |
Character. Label for subgroup 0 (default: "Not recommend"). |
sg1.name |
Character. Label for subgroup 1 (default: "Recommend"). |
details |
Logical. Print progress details (default: FALSE). |
Details
Performs K-fold cross-validation for ForestSearch, evaluating subgroup identification and agreement between training and test sets.
Value
List with components:
- resCV
Data frame with CV predictions for each observation
- cv_args
Arguments used for CV ForestSearch calls
- timing_minutes
Execution time in minutes
- prop_SG_found
Percentage of folds where a subgroup was found
- sg_analysis
Original subgroup definition from full-data analysis
- sg0.name, sg1.name
Subgroup labels
- Kfolds
Number of folds used
- sens_summary
Named vector of sensitivity metrics (sens_H, sens_Hc, ppv_H, ppv_Hc)
- find_summary
Named vector of subgroup-finding metrics (Any, Exact, etc.)
Cross-Validation Types
- Leave-One-Out (LOO): When Kfolds = nrow(df), each observation is held out once. Most thorough but computationally intensive.
- K-Fold: When Kfolds < nrow(df), data is split into K roughly equal folds. Good balance of bias-variance tradeoff.
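Fold assignment for the K-fold case can be sketched as a seeded shuffle of balanced fold labels (an assumption about the mechanics; the function's internal assignment may differ):

```r
set.seed(8316951)                               # the package's default seed
n <- 200; Kfolds <- 10
cvindex <- sample(rep_len(seq_len(Kfolds), n))  # K roughly equal folds
table(cvindex)                                  # 20 observations per fold here
```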
Output Metrics
The returned resCV data frame contains:
- treat.recommend: Prediction from CV model
- treat.recommend.original: Prediction from full-data model
- cvindex: Fold assignment
- sg1, sg2: Subgroup definitions found in each fold
See Also
forestsearch for initial subgroup identification
forestsearch_KfoldOut for summarizing CV results
forestsearch_tenfold for repeated K-fold simulations
ForestSearch K-Fold Cross-Validation Output Summary
Description
Summarizes cross-validation results for ForestSearch, including subgroup agreement and performance metrics.
Usage
forestsearch_KfoldOut(res, details = FALSE, outall = FALSE)
Arguments
res |
List. Result object from ForestSearch cross-validation, must contain
elements: |
details |
Logical. Print details during execution (default: FALSE). |
outall |
Logical. If TRUE, returns all summary tables; if FALSE, returns only metrics (default: FALSE). |
Value
If outall=FALSE, a list with sens_metrics_original and
find_metrics. If outall=TRUE, a list with summary tables and metrics.
ForestSearch Bootstrap with doFuture Parallelization
Description
Orchestrates bootstrap analysis for ForestSearch using doFuture parallelization. Implements bias correction methods to adjust for optimism in subgroup selection.
Usage
forestsearch_bootstrap_dofuture(
fs.est,
nb_boots,
seed = 8316951L,
details = FALSE,
show_three = FALSE,
parallel_args = list()
)
Arguments
fs.est |
List. ForestSearch results object from |
nb_boots |
Integer. Number of bootstrap samples (recommend 500-1000). |
seed |
Integer. Random seed for reproducibility of bootstrap sample
generation. Default |
details |
Logical. If |
show_three |
Logical. If |
parallel_args |
List. Parallelization configuration with elements:
If empty list, inherits settings from original forestsearch call. |
Value
List with the following components:
- results
Data.table with bias-corrected estimates for each bootstrap iteration
- SG_CIs
List of confidence intervals for H and Hc (raw and bias-corrected)
- FSsg_tab
Formatted table of subgroup estimates
- Ystar_mat
Matrix (nb_boots x n) of bootstrap sample indicators
- H_estimates
Detailed estimates for subgroup H
- Hc_estimates
Detailed estimates for subgroup Hc
- summary
(If create_summary=TRUE) Enhanced summary with tables and diagnostics
Bias Correction Methods
Two bias correction approaches are implemented:
- Method 1 (Simple Optimism): H_{adj1} = H_{obs} - (H^*_{*} - H^*_{obs}), where H^*_{*} is the new subgroup HR on bootstrap data and H^*_{obs} is the new subgroup HR on original data.
- Method 2 (Double Bootstrap): H_{adj2} = 2 \times H_{obs} - (H_{*} + H^*_{*} - H^*_{obs}), where H_{*} is the original subgroup HR on bootstrap data.
Variable Naming Convention
- H: Original subgroup (harm/questionable, treat.recommend == 0)
- Hc: Complement subgroup (recommend, treat.recommend == 1)
- _obs: Estimate from original data
- _star: Estimate from bootstrap data
- _biasadj_1: Bias correction method 1
- _biasadj_2: Bias correction method 2
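On the log-HR scale the two corrections are simple arithmetic; with toy numbers (not package output):

```r
H_obs   <- log(1.60)  # original subgroup H, original data
H_star  <- log(1.70)  # original subgroup H, bootstrap data
Hs_star <- log(1.90)  # bootstrap-selected subgroup, bootstrap data
Hs_obs  <- log(1.55)  # bootstrap-selected subgroup, original data

adj1 <- H_obs - (Hs_star - Hs_obs)               # Method 1: simple optimism
adj2 <- 2 * H_obs - (H_star + Hs_star - Hs_obs)  # Method 2: double bootstrap
exp(c(H_adj1 = adj1, H_adj2 = adj2))             # back to the HR scale
```

In the actual procedure these terms are averaged over the bootstrap iterations rather than computed from a single replicate.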
Performance
Typical runtime: 1-5 seconds per bootstrap iteration. For 1000 bootstraps with 6 workers, expect 3-10 minutes total. Memory usage scales with dataset size and number of workers.
Requirements
- Original fs.est must have identified a valid subgroup
- Requires packages: data.table, foreach, doFuture, survival
- For plots: requires ggplot2
See Also
forestsearch for initial subgroup identification
bootstrap_results for the core bootstrap worker function
build_cox_formula for Cox formula construction
fit_cox_models for Cox model fitting
ForestSearch Repeated K-Fold Cross-Validation
Description
This function performs multiple independent K-fold cross-validations to assess the variability in subgroup identification. Each simulation:
1. Randomly shuffles the data
2. Performs K-fold CV
3. Records sensitivity and agreement metrics
Results are summarized across all simulations.
Usage
forestsearch_tenfold(
fs.est,
sims,
Kfolds = 10,
details = TRUE,
seed = 8316951L,
parallel_args = list(plan = "multisession", workers = 6, show_message = TRUE)
)
Arguments
fs.est |
List. ForestSearch results object from |
sims |
Integer. Number of simulation repetitions. |
Kfolds |
Integer. Number of folds per simulation (default: 10). |
details |
Logical. Print progress details (default: TRUE). |
seed |
Integer. Base random seed for fold shuffling. Default 8316951L. Each simulation uses seed + 1000 * ksim for reproducibility. |
parallel_args |
List. Parallelization configuration. |
Details
Runs repeated K-fold cross-validation simulations for ForestSearch and summarizes subgroup identification stability across repetitions.
Value
List with components:
- sens_summary
Named vector of median sensitivity metrics across simulations
- find_summary
Named vector of median subgroup-finding metrics
- sens_out
Matrix of sensitivity metrics (sims x metrics)
- find_out
Matrix of finding metrics (sims x metrics)
- timing_minutes
Total execution time
- sims
Number of simulations run
- Kfolds
Number of folds per simulation
Parallelization Strategy
Unlike the single K-fold function which parallelizes across folds, this function parallelizes across simulations for better efficiency when running many repetitions. Each simulation runs its K-fold CV sequentially.
See Also
forestsearch_Kfold for single K-fold CV
forestsearch_KfoldOut for summarizing CV results
Format Confidence Interval for Estimates
Description
Formats confidence interval for estimates.
Usage
format_CI(estimates, col_names)
Arguments
estimates |
Data frame or data.table of estimates. |
col_names |
Character vector of column names for estimate, lower, upper. |
Value
Character string formatted as "estimate (lower, upper)".
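A minimal sketch of such a formatter (hypothetical signature; the package function takes a data frame of estimates and column names):

```r
# Format "estimate (lower, upper)" at a fixed number of decimals
format_ci_sketch <- function(est, lower, upper, digits = 2) {
  sprintf("%.*f (%.*f, %.*f)", digits, est, digits, lower, digits, upper)
}

format_ci_sketch(1.31, 1.02, 1.68)  # "1.31 (1.02, 1.68)"
```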
Format Bootstrap Diagnostics Table with gt
Description
Creates a publication-ready diagnostics table from bootstrap results.
Usage
format_bootstrap_diagnostics_table(
diagnostics,
nb_boots,
results,
H_estimates = NULL,
Hc_estimates = NULL
)
Arguments
diagnostics |
List. Diagnostics information from summarize_bootstrap_results() |
nb_boots |
Integer. Number of bootstrap iterations |
results |
Data.table. Bootstrap results with bias-corrected estimates |
H_estimates |
List. H subgroup estimates |
Hc_estimates |
List. Hc subgroup estimates |
Value
A gt table object
Format Bootstrap Results Table with gt
Description
Creates a publication-ready table from ForestSearch bootstrap results, with bias-corrected confidence intervals, informative formatting, and optional subgroup definition footnote.
Usage
format_bootstrap_table(
FSsg_tab,
nb_boots,
est.scale = "hr",
boot_success_rate = NULL,
sg_definition = NULL,
title = NULL,
subtitle = NULL
)
Arguments
FSsg_tab |
Data frame or matrix from forestsearch_bootstrap_dofuture()$FSsg_tab |
nb_boots |
Integer. Number of bootstrap iterations performed |
est.scale |
Character. "hr" or "1/hr" for effect scale |
boot_success_rate |
Numeric. Proportion of bootstraps that found subgroups |
sg_definition |
Character. Subgroup definition string to display as footnote (e.g., "{age>=50} & {nodes>=3}"). If NULL, no subgroup footnote is added. |
title |
Character. Custom title (optional) |
subtitle |
Character. Custom subtitle (optional) |
Value
A gt table object
Format Bootstrap Timing Table with gt
Description
Creates a publication-ready timing summary table from bootstrap results.
Usage
format_bootstrap_timing_table(timing_list, nb_boots, boot_success_rate)
Arguments
timing_list |
List. Timing information from summarize_bootstrap_results()$timing |
nb_boots |
Integer. Number of bootstrap iterations |
boot_success_rate |
Numeric. Proportion of successful bootstraps |
Value
A gt table object
Format Continuous Variable Definition for Display
Description
Format Continuous Variable Definition for Display
Usage
format_continuous_definition(var_data, cut_spec, var_name)
Format ForestSearch Details Output for Beamer Two-Column Display
Description
Captures forestsearch(details = TRUE) console output and splits it
into two columns for readable beamer slides. Left column shows variable
selection (GRF, LASSO, candidate factors); right column shows subgroup
search, consistency evaluation, and results.
Usage
format_fs_details(
fs_output,
split_after = "Candidate factors",
fontsize = "scriptsize",
col_widths = c(0.48, 0.52),
max_width = 48
)
Arguments
fs_output |
Character vector of captured output lines from
|
split_after |
Character string (regex). The output is split after the
block matching this pattern. Default: |
fontsize |
Character. LaTeX font size for the output text.
One of |
col_widths |
Numeric vector of length 2. Column widths as fractions
of |
max_width |
Integer. Maximum character width per line before wrapping. Long lines are wrapped at comma or space boundaries with a 4-space continuation indent. Default: 48 (suitable for half-slide columns at scriptsize). |
Value
Invisibly returns a list with left and right
character vectors. Side effect: emits LaTeX via cat() for use
in a chunk with results='asis'.
Quarto Setup
No special LaTeX packages required. Works in any beamer frame
without the fragile option.
Usage
In a Quarto beamer chunk with results='asis' and
echo=FALSE, first capture the forestsearch output with
capture.output(), then call format_fs_details(fs_output).
Format Operating Characteristics Results as GT Table
Description
Creates a formatted gt table from simulation operating characteristics results.
Usage
format_oc_results(
results,
analyses = NULL,
metrics = "all",
digits = 3,
digits_hr = 3,
title = "Operating Characteristics Summary",
subtitle = NULL,
use_gt = TRUE
)
Arguments
results |
data.table or data.frame. Simulation results from
|
analyses |
Character vector. Analysis methods to include. Default: NULL (all analyses in results) |
metrics |
Character vector. Metrics to display. Options include: "detection", "classification", "hr_estimates", "ahr_estimates", "cde_estimates", "subgroup_size", "all". Default: "all" |
digits |
Integer. Decimal places for proportions. Default: 3 |
digits_hr |
Integer. Decimal places for hazard ratios. Default: 3 |
title |
Character. Table title. Default: "Operating Characteristics Summary" |
subtitle |
Character. Table subtitle. Default: NULL |
use_gt |
Logical. Return gt table if TRUE, data.frame if FALSE. Default: TRUE |
Details
The function summarizes simulation results across multiple metrics:
- Found: Proportion of simulations finding a subgroup (any.H)
- Classification: Sensitivity, specificity, PPV, NPV
- HR Estimates: Mean Cox hazard ratios in true (H) and identified (H-hat) subgroups and their complements
- AHR Estimates: Mean average hazard ratios (from loghr_po) in true and identified subgroups
- CDE Estimates: Controlled direct effects (from theta_0/theta_1) in true and identified subgroups
- Subgroup Size: Average, minimum, and maximum sizes
Column notation aligns with build_estimation_table and
Leon et al. (2024): H = true (oracle) subgroup, H-hat =
identified subgroup. The asterisk (*) is reserved for bootstrap
bias-corrected estimates and is not used in this summary table.
Value
A gt table object (if use_gt = TRUE and gt package available) or data.frame
Format results for subgroup summary
Description
Formats results for subgroup summary table.
Usage
format_results(
subgroup_name,
n,
n_treat,
d,
m1,
m0,
drmst,
hr,
hr_a = NA,
hr_po = NA,
return_medians = TRUE
)
Arguments
subgroup_name |
Character. Subgroup name. |
n |
Character. Sample size. |
n_treat |
Character. Treated count. |
d |
Character. Event count. |
m1 |
Numeric. Median or RMST for treatment. |
m0 |
Numeric. Median or RMST for control. |
drmst |
Numeric. RMST difference. |
hr |
Character. Hazard ratio (formatted). |
hr_a |
Character. Adjusted hazard ratio (optional). |
hr_po |
Numeric. Potential outcome hazard ratio (optional). |
return_medians |
Logical. Use medians or RMST. |
Value
Character vector of results.
Format Search Results
Description
Format Search Results
Usage
format_search_results(
results_list,
Z,
details,
t.sofar,
L,
max_count,
filter_counts = NULL
)
Arguments
results_list |
List of result rows |
Z |
Matrix of factor indicators |
details |
Logical. Print details |
t.sofar |
Numeric. Time elapsed |
L |
Integer. Number of factors |
max_count |
Integer. Maximum combinations |
filter_counts |
List. Counts at each filtering stage (optional) |
Format Subgroup Summary Tables with gt
Description
Creates publication-ready gt tables for bootstrap subgroup analysis
Usage
format_subgroup_summary_tables(subgroup_summary, nb_boots)
Arguments
subgroup_summary |
List from summarize_bootstrap_subgroups() |
nb_boots |
Integer. Number of bootstrap iterations |
Value
List of gt table objects
Generate Synthetic Survival Data using AFT Model with Flexible Subgroups
Description
Creates a data generating mechanism (DGM) for survival data using an Accelerated Failure Time (AFT) model with Weibull distribution. Supports flexible subgroup definitions and treatment-subgroup interactions.
Usage
generate_aft_dgm_flex(
data,
continuous_vars,
factor_vars,
continuous_vars_cens = NULL,
factor_vars_cens = NULL,
set_beta_spec = list(set_var = NULL, beta_var = NULL),
outcome_var,
event_var,
treatment_var = NULL,
subgroup_vars = NULL,
subgroup_cuts = NULL,
draw_treatment = FALSE,
model = "alt",
k_treat = 1,
k_inter = 1,
n_super = 5000,
select_censoring = TRUE,
cens_type = "weibull",
cens_params = list(),
seed = 8316951,
verbose = TRUE,
standardize = FALSE,
spline_spec = NULL
)
Arguments
data |
A data.frame containing the input dataset to base the simulation on |
continuous_vars |
Character vector of continuous variable names to be standardized and included as covariates |
factor_vars |
Character vector of factor/categorical variable names to be converted to dummy variables (largest value as reference) |
continuous_vars_cens |
Character vector of continuous variable names to be used for censoring model. If NULL, uses same as continuous_vars. Default NULL |
factor_vars_cens |
Character vector of factor variable names to be used for censoring model. If NULL, uses same as factor_vars. Default NULL |
set_beta_spec |
List with elements 'set_var' and 'beta_var' for manually setting specific beta coefficients. Default list(set_var = NULL, beta_var = NULL) |
outcome_var |
Character string specifying the name of the outcome/time variable |
event_var |
Character string specifying the name of the event/status variable (1 = event, 0 = censored) |
treatment_var |
Character string specifying the name of the treatment variable. If NULL, treatment will be randomly simulated with 50/50 allocation |
subgroup_vars |
Character vector of variable names defining the subgroup. Default is NULL (no subgroups) |
subgroup_cuts |
Named list of cutpoint specifications for subgroup variables. See Details section for flexible specification options |
draw_treatment |
Logical indicating whether to redraw treatment assignment in simulation. Default is FALSE (use original assignments) |
model |
Character string: "alt" for alternative model with subgroup effects, "null" for null model without subgroup effects. Default is "alt" |
k_treat |
Numeric treatment effect modifier. Values >1 increase treatment effect, <1 decrease it. Default is 1 (no modification) |
k_inter |
Numeric interaction effect modifier for treatment-subgroup interaction. Default is 1 (no modification) |
n_super |
Integer specifying size of super population to generate. Default is 5000 |
select_censoring |
Logical. Default is TRUE |
cens_type |
Character string specifying the censoring distribution type. Default is "weibull" |
cens_params |
Named list of censoring distribution parameters; interpretation depends on cens_type. Default is list() |
seed |
Integer random seed for reproducibility. Default is 8316951 |
verbose |
Logical indicating whether to print diagnostic information during execution. Default is TRUE |
standardize |
Logical indicating whether to standardize continuous variables. Default is FALSE |
spline_spec |
List specifying spline configuration for treatment effect. Must include 'var' (variable name), 'knot', 'zeta', and 'log_hrs' (vector of length 3). Default NULL (no spline) |
Details
Subgroup Cutpoint Specifications
The subgroup_cuts parameter accepts multiple flexible specifications:
Fixed Value
subgroup_cuts = list(er = 20) # er <= 20
Quantile-based
subgroup_cuts = list(er = list(type = "quantile", value = 0.25)) # er <= 25th percentile
Function-based
subgroup_cuts = list(er = list(type = "function", fun = median)) # er <= median
Range
subgroup_cuts = list(age = list(type = "range", min = 40, max = 60)) # 40 <= age <= 60
Greater than
subgroup_cuts = list(nodes = list(type = "greater", quantile = 0.75)) # nodes > 75th percentile
Multiple values (for categorical)
subgroup_cuts = list(grade = list(type = "multiple", values = c(2, 3))) # grade in (2, 3)
Custom function
subgroup_cuts = list(
  er = list(
    type = "custom",
    fun = function(x) x <= quantile(x, 0.3) | x >= quantile(x, 0.9)
  )
)
Model Structure
The AFT model with Weibull distribution is specified as:
\log(T) = \mu + \gamma' X + \sigma \epsilon
Where:
- T is the survival time
- \mu is the intercept
- \gamma contains the covariate effects
- X includes treatment, covariates, and the treatment x subgroup interaction
- \sigma is the scale parameter
- \epsilon follows an extreme value distribution
Interaction Term
The model creates a SINGLE interaction term representing the treatment effect modification when ALL subgroup conditions are simultaneously satisfied. This is not multiple separate interactions but one combined indicator.
Value
A named list of class aft_dgm containing:
data |
Simulated trial data frame with outcome, event, and treatment columns. |
model_params |
Model parameters used for data generation (coefficients, dispersion, spline info if applicable). |
subgroup_info |
Subgroup definition and membership indicators, if a heterogeneous treatment effect was specified. |
censoring_info |
Censoring model parameters and observed censoring rate. |
call_args |
Arguments used in the call, for reproducibility. |
Author(s)
Larry Leon
References
León, L.F., et al. (2024). Statistics in Medicine. doi:10.1002/sim.10163.
Kalbfleisch, J.D. and Prentice, R.L. (2002). The Statistical Analysis of Failure Time Data (2nd ed.). Wiley.
Examples
df <- survival::gbsg
dgm <- generate_aft_dgm_flex(
data = df,
outcome_var = "rfstime",
event_var = "status",
treatment_var = "hormon",
continuous_vars = c("age", "size", "nodes", "pgr", "er"),
factor_vars = "meno",
model = "null",
verbose = FALSE
)
str(dgm)
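The same generator extends to a heterogeneous-effect scenario via subgroup_cuts. A hedged sketch: the quantile cutpoint and k_inter value below are illustrative assumptions, not recommendations.

```r
# Alternative model: treatment effect modified when er <= 25th percentile.
# Cutpoint and effect modifiers are illustrative only.
dgm_alt <- generate_aft_dgm_flex(
  data = survival::gbsg,
  outcome_var = "rfstime",
  event_var = "status",
  treatment_var = "hormon",
  continuous_vars = c("age", "size", "nodes", "pgr", "er"),
  factor_vars = "meno",
  subgroup_vars = "er",
  subgroup_cuts = list(er = list(type = "quantile", value = 0.25)),
  model = "alt",
  k_inter = 1.5,
  verbose = FALSE
)
str(dgm_alt$subgroup_info)
```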
Generate Synthetic Data using Bootstrap with Perturbation
Description
Generate Synthetic Data using Bootstrap with Perturbation
Usage
generate_bootstrap_synthetic(
data,
continuous_vars,
cat_vars,
n = NULL,
seed = 123,
noise_level = 0.1,
id_var = NULL,
cat_flip_prob = NULL,
preserve_bounds = TRUE,
ordinal_vars = NULL
)
Arguments
data |
Original dataset to bootstrap from |
continuous_vars |
Character vector of continuous variable names |
cat_vars |
Character vector of categorical variable names |
n |
Number of synthetic observations to generate (default: same as original) |
seed |
Random seed for reproducibility |
noise_level |
Noise level for perturbation (0 to 1, default 0.1) |
id_var |
Optional name of ID variable to regenerate (will be numbered 1:n) |
cat_flip_prob |
Probability of flipping categorical values (default: noise_level/2) |
preserve_bounds |
Logical: should continuous variables stay within original bounds? (default: TRUE) |
ordinal_vars |
Optional character vector of ordinal categorical variables (these will be perturbed to adjacent values rather than randomly flipped) |
Value
A data frame with synthetic data
Generate Bootstrap Sample with Added Noise
Description
Creates a bootstrap sample from a dataset with controlled noise added to both continuous and categorical variables. This function is useful for generating synthetic datasets that maintain the general structure of the original data while introducing controlled variation.
Usage
generate_bootstrap_with_noise(
data,
n = NULL,
continuous_vars = NULL,
cat_vars = NULL,
id_var = "pid",
seed = 123,
noise_level = 0.1
)
Arguments
data |
A data frame containing the original dataset to bootstrap from. |
n |
Integer. Number of observations in the output dataset. If NULL (default), uses the same number of rows as the input data. |
continuous_vars |
Character vector of column names to treat as continuous variables. If NULL (default), automatically detects numeric columns. |
cat_vars |
Character vector of column names to treat as categorical variables. If NULL (default), automatically detects factors, logical columns, and numeric columns with 10 or fewer unique values. |
id_var |
Character string specifying the name of the ID variable column. This column will be reset to sequential values (1:n) in the output. Default is "pid". |
seed |
Integer. Random seed for reproducibility. Default is 123. |
noise_level |
Numeric between 0 and 1. Controls the amount of noise added. For continuous variables, this is multiplied by the standard deviation to determine noise magnitude. For categorical variables, this is divided by 2 to determine the probability of value changes. Default is 0.1. |
Details
The function performs the following operations:
Bootstrap Sampling
Samples n observations with replacement from the original dataset.
Continuous Variable Noise
Adds Gaussian noise with standard deviation = original SD × noise_level
Constrains values to remain within original variable bounds
Preserves integer type for variables that appear to be integers
Categorical Variable Perturbation
Changes values with probability = noise_level / 2
Binary variables: flips to opposite value
Multi-level unordered: randomly selects from other levels
Ordered factors: weights selection toward adjacent levels
Preserves factor levels and ordering from original data
Value
A data frame with the same structure as the input data, containing bootstrap sampled observations with added noise.
Note
The function assumes that categorical variables with numeric encoding should maintain their numeric type unless they are factors in the input
Missing values (NA) are handled appropriately in calculations but are not imputed
For ordered factors or variables named "grade", the perturbation favors transitions to adjacent levels over distant levels
See Also
sample for bootstrap sampling,
rnorm for noise generation
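The operations above can be sketched with survival::gbsg (chosen for illustration; gbsg includes the default "pid" ID column):

```r
# Bootstrap GBSG with 10% noise; pid is reset to 1:n in the output.
syn <- generate_bootstrap_with_noise(
  data = survival::gbsg,
  continuous_vars = c("age", "size", "nodes", "pgr", "er"),
  cat_vars = c("meno", "grade", "hormon"),
  id_var = "pid",
  seed = 123,
  noise_level = 0.1
)
range(syn$age)  # constrained to the original age bounds
```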
Generate Combination Indices
Description
Creates indices for all factor combinations up to maxk
Usage
generate_combination_indices(L, maxk)
Generate Complement Expression
Description
Creates the logical complement of a subgroup expression. Handles common patterns like "var <= x" -> "var > x".
Usage
generate_complement_expression(expr)
Arguments
expr |
Character vector of expressions to negate. |
Value
Character string with negated expression.
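A brief sketch of the documented pattern handling (the second call assumes ">" cuts are negated symmetrically, which is not stated explicitly above):

```r
generate_complement_expression("er <= 20")  # documented pattern: "er > 20"
generate_complement_expression("age > 60")  # complement of a ">" cut
```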
Generate Detection Probability Curve
Description
Computes detection probability across a range of hazard ratios to create a power-like curve for subgroup detection.
Usage
generate_detection_curve(
theta_range = c(0.5, 3),
n_points = 50L,
n_sg,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
include_reference = TRUE,
method = "cubature",
verbose = TRUE
)
Arguments
theta_range |
Numeric vector of length 2. Range of HR values to evaluate. Default: c(0.5, 3.0) |
n_points |
Integer. Number of points to evaluate. Default: 50 |
n_sg |
Integer. Subgroup sample size. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
hr_threshold |
Numeric. HR threshold for detection. Default: 1.25 |
hr_consistency |
Numeric. HR consistency threshold. Default: 1.0 |
include_reference |
Logical. Include reference HR values (0.5, 0.75, 1.0). Default: TRUE |
method |
Character. Integration method. Default: "cubature" |
verbose |
Logical. Print progress. Default: TRUE |
Value
A data.frame with columns:
theta |
Hazard ratio values |
probability |
Detection probability |
n_sg |
Subgroup size (repeated) |
prop_cens |
Censoring proportion (repeated) |
hr_threshold |
Detection threshold (repeated) |
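A hedged usage sketch; n_sg = 120 and the reduced n_points are illustrative choices:

```r
curve_df <- generate_detection_curve(
  theta_range = c(0.5, 3),
  n_points = 25L,
  n_sg = 120,
  prop_cens = 0.3,
  hr_threshold = 1.25,
  verbose = FALSE
)
head(curve_df)  # theta, probability, n_sg, prop_cens, hr_threshold
```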
Generate Synthetic GBSG Data using Generalized Bootstrap
Description
Generate Synthetic GBSG Data using Generalized Bootstrap
Usage
generate_gbsg_bootstrap_general(n = 686, seed = 123, noise_level = 0.1)
Arguments
n |
Number of observations |
seed |
Random seed |
noise_level |
Noise level for perturbation |
Value
Synthetic GBSG dataset
Generate Readable Subgroup Labels from ForestSearch Object
Description
Extracts human-readable subgroup labels that are also valid R expressions
for use with plotKM.band_subgroups(). Attempts to extract the actual
subgroup definition (e.g., "er <= 0") rather than column references.
Usage
generate_readable_sg_labels(fs.est, verbose = FALSE)
Arguments
fs.est |
A forestsearch object. |
verbose |
Logical. Print diagnostic messages. |
Value
Character vector of length 2: c(harm_label, benefit_label)
Generate Super Population and Calculate Linear Predictors
Description
Generate Super Population and Calculate Linear Predictors
Usage
generate_super_population(
df_work,
n_super,
draw_treatment,
gamma,
b0,
mu,
tau,
verbose,
spline_info = NULL
)
Fit Cox Model for Subgroup
Description
Fits a Cox model for a subgroup and returns estimate and standard error.
Usage
get_Cox_sg(df_sg, cox.formula, est.loghr = TRUE, cox_initial = log(1))
Arguments
df_sg |
Data frame for subgroup. |
cox.formula |
Cox model formula. |
est.loghr |
Logical. Is estimate on log(HR) scale? |
cox_initial |
Numeric. Initial value for the Cox coefficient. Default is log(1) (i.e., 0) |
Details
This function is used throughout the codebase.
Value
List with estimate and standard error.
ForestSearch Data Preparation and Feature Selection
Description
Prepares a dataset for ForestSearch, including options for LASSO-based dimension reduction, GRF cuts, forced cuts, and flexible cut strategies. Returns a list with the processed data, subgroup factor names, cut expressions, and LASSO selection results.
Usage
get_FSdata(
df.analysis,
use_lasso = FALSE,
use_grf = FALSE,
grf_cuts = NULL,
confounders.name,
cont.cutoff = 4,
conf_force = NULL,
conf.cont_medians = NULL,
conf.cont_medians_force = NULL,
replace_med_grf = TRUE,
defaultcut_names = NULL,
cut_type = "default",
exclude_cuts = NULL,
outcome.name = "tte",
event.name = "event",
details = TRUE
)
Arguments
df.analysis |
Data frame containing the data. |
use_lasso |
Logical. Whether to use LASSO for dimension reduction. |
use_grf |
Logical. Whether to use GRF cuts. |
grf_cuts |
Character vector of GRF cut expressions. |
confounders.name |
Character vector of confounder variable names. |
cont.cutoff |
Integer. Cutoff for continuous variable determination. |
conf_force |
Character vector of forced cut expressions. |
conf.cont_medians |
Character vector of continuous confounders to cut at median. |
conf.cont_medians_force |
Character vector of additional continuous confounders to force median cut. |
replace_med_grf |
Logical. If TRUE, removes median cuts that overlap with GRF cuts. |
defaultcut_names |
Character vector of confounders to force default cuts. |
cut_type |
Character. "default" or "median" for cut strategy. |
exclude_cuts |
Character vector of cut expressions to exclude. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
details |
Logical. If TRUE, prints details during execution. |
Value
A named list containing:
df |
Data frame with binary cut-point indicator columns |
confs_names |
Character vector of the internal column names of the indicator columns |
confs |
Character vector of candidate factor specifications (continuous cut expressions and categorical variable names) |
lassokeep |
Character vector of factors retained by LASSO (if use_lasso = TRUE) |
lassoomit |
Character vector of factors omitted by LASSO (if use_lasso = TRUE) |
Get Best Model from Comparison
Description
Extracts the best fitting model object from a comparison result. If no single best model can be determined, returns the Weibull model if selected by either AIC or BIC. Defaults to Weibull0 model if no model can be determined.
Usage
get_best_survreg(comparison_result)
Arguments
comparison_result |
Output from compare_survreg_models or compare_multiple_survreg |
Value
A survreg model object (defaults to Weibull0 model)
Get all exported functions from ForestSearch namespace
Description
Get all exported functions from ForestSearch namespace
Usage
get_bootstrap_exports()
Get all combinations of subgroup factors up to maxk
Description
Generates all possible combinations of subgroup factors up to a specified maximum size.
Usage
get_combinations_info(L, maxk)
Arguments
L |
Integer. Number of subgroup factors. |
maxk |
Integer. Maximum number of factors in a combination. |
Value
List with max_count (total combinations) and indices_list (indices for each k).
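The returned quantities can be reproduced in base R; this sketch mirrors the documented output using utils::combn:

```r
# All factor combinations of size 1..maxk among L factors.
L <- 5; maxk <- 2
indices_list <- lapply(seq_len(maxk), function(k) utils::combn(L, k))
max_count <- sum(vapply(indices_list, ncol, integer(1)))
max_count  # choose(5, 1) + choose(5, 2) = 15
```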
Get forced cut expressions for variables
Description
For each variable in conf.force.names, returns cut expressions if continuous.
Usage
get_conf_force(df, conf.force.names, cont.cutoff = 4)
Arguments
df |
Data frame. |
conf.force.names |
Character vector of variable names. |
cont.cutoff |
Integer. Cutoff for continuous. |
Value
Character vector of cut expressions.
Get indicator vector for selected subgroup factors
Description
Returns a vector indicating which factors are included in a subgroup combination.
Usage
get_covs_in(
kk,
maxk,
L,
counts_1factor,
index_1factor,
counts_2factor = NULL,
index_2factor = NULL,
counts_3factor = NULL,
index_3factor = NULL
)
Arguments
kk |
Integer. Index of the combination. |
maxk |
Integer. Maximum number of factors in a combination. |
L |
Integer. Number of subgroup factors. |
counts_1factor |
Integer. Number of single-factor combinations. |
index_1factor |
Matrix of indices for single-factor combinations. |
counts_2factor |
Integer. Number of two-factor combinations. |
index_2factor |
Matrix of indices for two-factor combinations. |
counts_3factor |
Integer. Number of three-factor combinations. |
index_3factor |
Matrix of indices for three-factor combinations. |
Value
Numeric vector indicating selected factors (1 = included, 0 = not included).
Get variable name from cut expression
Description
Extracts the variable name from a cut expression.
Usage
get_cut_name(thiscut, confounders.name)
Arguments
thiscut |
Character string of the cut expression. |
confounders.name |
Character vector of confounder names. |
Value
Character vector of variable names.
Bootstrap Confidence Interval and Bias Correction Results
Description
Calculates confidence intervals and bias-corrected estimates for bootstrap results.
Usage
get_dfRes(
Hobs,
seHobs,
H1_adj,
H2_adj = NULL,
ystar,
cov_method = "standard",
cov_trim = 0,
est.scale = "hr",
est.loghr = TRUE
)
Arguments
Hobs |
Numeric. Observed estimate. |
seHobs |
Numeric. Standard error of observed estimate. |
H1_adj |
Numeric. Bias-corrected estimate 1. |
H2_adj |
Numeric. Bias-corrected estimate 2 (optional). |
ystar |
Matrix of bootstrap samples. |
cov_method |
Character. Covariance method ("standard" or "nocorrect"). |
cov_trim |
Numeric. Trimming proportion for covariance (default: 0.0). |
est.scale |
Character. "hr" or "1/hr". |
est.loghr |
Logical. Is estimate on log(HR) scale? |
Value
Data.table with confidence intervals and estimates.
Generate Prediction Dataset with Subgroup Treatment Recommendation
Description
Creates a prediction dataset with a treatment recommendation flag based
on the subgroup definition. Supports both label expressions
(e.g., "{er <= 0}") and bare column names (e.g., "q3.1").
Usage
get_dfpred(df.predict, sg.harm, version = 1)
Arguments
df.predict |
Data frame for prediction (test or validation set). |
sg.harm |
Character vector of subgroup-defining labels. Values may
be wrapped in braces and optionally negated, e.g. "{er <= 0}" or "!{er <= 0}" |
version |
Integer; encoding version (maintained for backward compatibility). Default: 1. |
Details
Each element of sg.harm is processed as follows:
1. Outer braces and a leading ! are stripped.
2. If the result matches "var op value" (where op is one of <=, <, >=, >, ==, !=), the comparison is evaluated directly on df.predict[[var]].
3. Otherwise the expression is treated as a column name and membership is df.predict[[name]] == 1.
Value
Data frame with treatment recommendation flag
(treat.recommend): 0 for harm subgroup, 1 for complement.
See Also
evaluate_comparison for the operator-dispatch
logic, forestsearch for the main analysis function.
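A short sketch of the label-expression dispatch path, using survival::gbsg for illustration:

```r
# Label expression: the comparison is evaluated directly on df.predict[["er"]].
pred <- get_dfpred(df.predict = survival::gbsg, sg.harm = "{er <= 0}")
table(pred$treat.recommend)  # 0 = harm subgroup, 1 = complement
```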
Extract HR from DGM (Backward Compatible)
Description
Extracts hazard ratios from DGM object, supporting both old and new formats. Also supports CDE (controlled direct effect) extraction for Table 5 of Leon et al. (2024) alignment (theta-ddagger).
Usage
get_dgm_hr(dgm, which = "hr_H")
Arguments
dgm |
DGM object (gbsg_dgm or aft_dgm_flex) |
which |
Character. Which HR to extract. Default is "hr_H" |
Value
Numeric hazard ratio value
Create DGM with Output File Path
Description
Wrapper function that creates a GBSG DGM and generates a standardized output file path for saving results.
Usage
get_dgm_with_output(
model_harm,
n,
k_treat = 1,
target_hr_harm = NULL,
cens_type = "weibull",
out_dir = NULL,
file_prefix = "sim",
file_suffix = "",
include_hr_in_name = FALSE,
verbose = FALSE,
...
)
Arguments
model_harm |
Character. Model type ("alt" or "null") |
n |
Integer. Planned sample size (for filename) |
k_treat |
Numeric. Treatment effect multiplier |
target_hr_harm |
Numeric. Target HR for harm subgroup (used for calibration when model = "alt") |
cens_type |
Character. Censoring type |
out_dir |
Character. Output directory path. If NULL, no file path is generated |
file_prefix |
Character. Prefix for output filename |
file_suffix |
Character. Suffix for output filename |
include_hr_in_name |
Logical. Include achieved HR in filename. Default: FALSE |
verbose |
Logical. Print diagnostic information. Default: FALSE |
... |
Additional arguments passed to the underlying DGM generator |
Value
List with components:
dgm |
The gbsg_dgm object |
out_file |
Character path to the output file (NULL if out_dir is NULL) |
k_inter |
The k_inter value used (either calibrated or default) |
Get Parameter with Default Fallback
Description
Safely retrieves a named element from a list, returning a default value
if the element is missing or NULL.
Usage
get_param(args_list, param_name, default_value)
Arguments
args_list |
List to extract from. |
param_name |
Character. Name of the element to retrieve. |
default_value |
Default value to return if the element is missing or NULL |
Value
The value of args_list[[param_name]] if present and
non-NULL, otherwise default_value.
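Per the documented behavior, a quick sketch:

```r
opts <- list(n_boots = 500, cov_trim = NULL)
get_param(opts, "n_boots", 300)   # present and non-NULL: returns 500
get_param(opts, "cov_trim", 0.1)  # NULL element: falls back to 0.1
get_param(opts, "seed", 123)      # missing element: falls back to 123
```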
Fast Cox Model HR Estimation
Description
Fits a minimal Cox model to estimate hazard ratio with reduced overhead. Disables robust variance, model matrix storage, and other extras for speed.
Usage
get_split_hr_fast(df, cox_init = 0)
Arguments
df |
data.frame or data.table with Y, Event, Treat columns. |
cox_init |
Numeric. Initial value for coefficient (default 0). |
Value
Numeric. Estimated hazard ratio, or NA if model fails.
Get subgroup membership vector
Description
Returns a vector indicating subgroup membership (1 if all selected factors are present, 0 otherwise).
Usage
get_subgroup_membership(zz, covs.in)
Arguments
zz |
Matrix or data frame of subgroup factor indicators. |
covs.in |
Numeric vector indicating which factors are selected (1 = included). |
Value
Numeric vector of subgroup membership (1/0).
Target Estimate and Standard Error for Bootstrap
Description
Calculates target estimate and standard error for bootstrap samples.
Usage
get_targetEst(x, ystar, cov_method = "standard", cov_trim = 0)
Arguments
x |
Numeric vector of estimates. |
ystar |
Matrix of bootstrap samples. |
cov_method |
Character. Covariance method ("standard" or "nocorrect"). |
cov_trim |
Numeric. Trimming proportion for covariance (default: 0.0). |
Value
List with target estimate, standard errors, and correction term.
ggplot2 / patchwork forest plot
Description
Creates a publication-quality forest plot using ggplot2 for the CI panel
and patchwork to assemble label and annotation columns alongside it.
Unlike forestploter, fig.height maps directly to row density:
row_height = fig.height / n_rows with no hidden scaling.
Usage
gg_forest(
subgroups,
est,
lo,
hi,
cat_vec = NULL,
cat_colours = NULL,
annot = NULL,
ref_line = 1,
vert_lines = NULL,
ref_col = "firebrick",
ref_lty = "dashed",
vert_col = "grey50",
vert_lty = "dotted",
xlim = NULL,
ticks_at = NULL,
tick_labels = NULL,
xlog = TRUE,
xlab = "Hazard Ratio",
title = NULL,
subtitle = NULL,
footnote = NULL,
point_size = 2.5,
line_size = 0.8,
point_shape = 21,
base_size = 11,
widths = NULL,
row_expand = 0.6
)
Arguments
subgroups |
Character vector of subgroup names (displayed top to bottom). |
est |
Numeric vector of point estimates (median HR or similar). |
lo |
Numeric vector of lower bounds (e.g. 1st percentile ECI). |
hi |
Numeric vector of upper bounds (e.g. 99th percentile ECI). |
cat_vec |
Optional character vector of category labels (one per row). Used to colour CI lines and label text. |
cat_colours |
Optional named character vector mapping category labels to colours. Defaults to grey for all rows. |
annot |
Optional named list of character vectors, one per annotation
column. Names become column headers. Each vector must match the length of subgroups |
ref_line |
Numeric. X position of the primary reference line (default 1). Drawn as a dashed red line. |
vert_lines |
Numeric vector. X positions of secondary vertical lines (default NULL). Drawn as dotted grey lines. |
ref_col |
Colour of the primary reference line (default "firebrick"). |
ref_lty |
Line type of the primary reference line (default "dashed"). |
vert_col |
Colour of secondary vertical lines (default "grey50"). |
vert_lty |
Line type of secondary vertical lines (default "dotted"). |
xlim |
Numeric vector length 2. X-axis limits for the CI panel. |
ticks_at |
Numeric vector. X-axis tick positions. |
tick_labels |
Character vector. Custom tick labels (default: as.character(ticks_at)). |
xlog |
Logical. If TRUE (default), x-axis on log scale. |
xlab |
Character. X-axis label (default "Hazard Ratio"). |
title |
Character. Overall plot title (default NULL). |
subtitle |
Character. Plot subtitle (default NULL). |
footnote |
Character. Footnote appended below the CI panel (default NULL). |
point_size |
Numeric. Size of point estimate symbol (default 2.5). |
line_size |
Numeric. Line width of CI segments (default 0.8). |
point_shape |
Integer. pch for point estimates (default 21, filled circle). |
base_size |
Numeric. ggplot2 base font size in pt (default 11). Controls all text; increase this to make the plot larger, no other knob needed. |
widths |
Numeric vector. Relative patchwork column widths: c(label, ci, annot_1, annot_2, …). Default: c(3.5, 5, rep(1, n_annot)). |
row_expand |
Numeric. Extra space above and below row range on y-axis, in row units (default 0.6). |
Value
A patchwork object. Render with print() or plot().
Control dimensions entirely via knitr chunk options fig.width /
fig.height: row height = fig.height / n_rows.
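A hedged sketch; the subgroup names, estimates, and annotation values below are invented for illustration:

```r
p <- gg_forest(
  subgroups = c("Overall", "er <= 0", "er > 0"),
  est = c(0.80, 1.30, 0.70),  # illustrative HRs
  lo  = c(0.65, 0.90, 0.55),
  hi  = c(0.98, 1.90, 0.89),
  annot = list("Events/N" = c("299/686", "60/120", "239/566")),
  ref_line = 1,
  title = "Illustrative forest plot"
)
print(p)
```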
GRF Subgroup Evaluation and Performance Metrics
Description
Evaluates the performance of GRF-identified subgroups, including hazard ratios, bias, and predictive values. This function is typically used in simulation studies to assess the performance of the GRF subgroup identification method.
Usage
grf.subg.eval(
df,
grf.est,
dgm,
cox.formula.sim,
cox.formula.adj.sim,
analysis = "GRF",
frac.tau = 1
)
Arguments
df |
Data frame containing the analysis data. |
grf.est |
List. Output from grf.subg.harm.survival |
dgm |
List. Data-generating mechanism (truth) for simulation. |
cox.formula.sim |
Formula for unadjusted Cox model. |
cox.formula.adj.sim |
Formula for adjusted Cox model. |
analysis |
Character. Analysis label (default: "GRF"). |
frac.tau |
Numeric. Fraction of tau for GRF horizon (default: 1.0). |
Value
A data frame with evaluation metrics.
GRF Subgroup Identification for Survival Data
Description
Identifies subgroups with differential treatment effect using generalized random forests (GRF) and policy trees. This function uses causal survival forests to identify heterogeneous treatment effects and policy trees to create interpretable subgroup definitions.
Usage
grf.subg.harm.survival(
data,
confounders.name,
outcome.name,
event.name,
id.name,
treat.name,
frac.tau = 1,
n.min = 60,
dmin.grf = 0,
RCT = TRUE,
details = FALSE,
sg.criterion = "mDiff",
maxdepth = 2,
seedit = 8316951,
return_selected_cuts_only = FALSE
)
Arguments
data |
Data frame containing the analysis data. |
confounders.name |
Character vector of confounder variable names. |
outcome.name |
Character. Name of outcome variable (e.g., time-to-event). |
event.name |
Character. Name of event indicator variable (0/1). |
id.name |
Character. Name of ID variable. |
treat.name |
Character. Name of treatment group variable (0/1). |
frac.tau |
Numeric. Fraction of tau for GRF horizon (default: 1.0). |
n.min |
Integer. Minimum subgroup size (default: 60). |
dmin.grf |
Numeric. Minimum difference in subgroup mean (default: 0.0). |
RCT |
Logical. Is the data from a randomized controlled trial? (default: TRUE) |
details |
Logical. Print details during execution (default: FALSE). |
sg.criterion |
Character. Subgroup selection criterion ("mDiff" or "Nsg"). |
maxdepth |
Integer. Maximum tree depth (1, 2, or 3; default: 2). |
seedit |
Integer. Random seed (default: 8316951). |
return_selected_cuts_only |
Logical. If TRUE, returns only cuts from the tree
depth that identified the selected subgroup meeting the dmin.grf criterion. Default: FALSE |
Details
The return_selected_cuts_only parameter controls which cuts are returned:
- FALSE (default): Returns all cuts from all fitted trees (depths 1 to maxdepth). This provides the full set of candidate splits for downstream exploration and is the original behavior for backward compatibility.
- TRUE: Returns only cuts from the tree at the depth that identified the "winning" subgroup meeting the dmin.grf criterion. This is useful when you want a focused set of cuts associated with the selected subgroup, reducing noise from non-selected trees.
When return_selected_cuts_only = TRUE and no subgroup meets the criteria,
tree.cuts will be empty (character(0)).
Value
A list with GRF results, including:
data |
Original data with added treatment recommendation flags |
grf.gsub |
Selected subgroup information |
sg.harm.id |
Expression defining the identified subgroup |
tree.cuts |
Cut expressions: either all cuts from all trees (if
return_selected_cuts_only = FALSE) or only cuts from the selected depth (if TRUE) |
tree.names |
Unique variable names used in cuts |
tree |
Selected policy tree object |
tau.rmst |
Time horizon used for RMST |
harm.any |
All subgroups with positive treatment effect difference |
selected_depth |
Depth of the tree that identified the subgroup (when found) |
return_selected_cuts_only |
Logical indicating which cut extraction mode was used |
Additional tree-specific cuts and objects (tree1, tree2, tree3) based on maxdepth
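A usage sketch on survival::gbsg; the added id column and the confounder list are assumptions for illustration:

```r
df <- survival::gbsg
df$id <- seq_len(nrow(df))
grf_fit <- grf.subg.harm.survival(
  data = df,
  confounders.name = c("age", "size", "nodes", "pgr", "er", "meno", "grade"),
  outcome.name = "rfstime",
  event.name = "status",
  id.name = "id",
  treat.name = "hormon",
  n.min = 60,
  maxdepth = 2
)
grf_fit$sg.harm.id  # expression defining the identified subgroup
```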
Check if Matrix Has Positive Variance
Description
Check if Matrix Has Positive Variance
Usage
has_positive_variance(x)
Format Hazard Ratio and Confidence Interval
Description
Formats a hazard ratio and confidence interval for display.
Usage
hrCI_format(hrest)
Arguments
hrest |
Numeric vector with HR, lower, and upper confidence limits. |
Value
Character string formatted as "HR (lower, upper)".
Generate Narrative Interpretation of Estimation Properties
Description
Produces a templated text summary of the estimation properties table, automatically populating numerical results from the simulation output. Useful for reproducible vignettes where interpretation paragraphs should update when simulations are re-run.
Usage
interpret_estimation_table(
results,
dgm,
analysis_method = "FSlg",
n_sims = NULL,
n_boots = 300,
digits = 2,
scenario = NULL,
cat = TRUE
)
Arguments
results |
Data frame of simulation results (the same input used to build the
estimation table) |
dgm |
DGM object with true parameter values. |
analysis_method |
Character. Which analysis method to summarise.
Default: "FSlg" |
n_sims |
Integer. Total number of simulations (used for the detection rate).
Default: NULL |
n_boots |
Integer. Number of bootstraps (for narrative). Default: 300. |
digits |
Integer. Decimal places for reported values. Default: 2. |
scenario |
Character. Scenario label used in the narrative.
Default: NULL |
cat |
Logical. If TRUE, prints the interpretation to the console. Default: TRUE |
Value
Invisibly returns the interpretation as a character string.
See Also
build_estimation_table,
format_oc_results, get_dgm_hr
Check if a variable is continuous
Description
Determines if a variable is continuous based on the number of unique values.
Usage
is.continuous(x, cutoff = 4)
Arguments
x |
A vector. |
cutoff |
Integer. Minimum number of unique values to be considered continuous. |
Value
1 if continuous, 2 if not.
Check if cut expression is for a continuous variable (OPTIMIZED)
Description
Determines if a cut expression refers to a continuous variable. This optimized version avoids redundant lookups by using word boundary matching instead of partial string matching.
Usage
is_flag_continuous(thiscut, confounders.name, df, cont.cutoff)
Arguments
thiscut |
Character string of the cut expression. |
confounders.name |
Character vector of confounder names. |
df |
Data frame. |
cont.cutoff |
Integer. Cutoff for continuous. |
Value
Logical; TRUE if continuous, FALSE otherwise.
Check if cut expression should be dropped
Description
Determines if a cut expression should be dropped (e.g., variable has <=1 unique value).
Usage
is_flag_drop(thiscut, confounders.name, df)
Arguments
thiscut |
Character string of the cut expression. |
confounders.name |
Character vector of confounder names. |
df |
Data frame. |
Value
Logical; TRUE if should be dropped, FALSE otherwise.
KM median summary for subgroup
Description
Calculates median survival for each treatment group using Kaplan-Meier.
Usage
km_summary(Y, E, Treat)
Arguments
Y |
Numeric vector of outcome. |
E |
Numeric vector of event indicators. |
Treat |
Numeric vector of treatment indicators. |
Value
Numeric vector of medians.
LASSO selection for Cox model
Description
Performs LASSO variable selection using Cox regression.
Usage
lasso_selection(
df,
confounders.name,
outcome.name,
event.name,
seedit = 8316951
)
Arguments
df |
Data frame. |
confounders.name |
Character vector of confounder names. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
seedit |
Integer. Random seed. |
Value
List with selected, omitted variables, coefficients, lambda, and fits.
Check Event Count Criteria
Description
Check Event Count Criteria
Usage
meets_event_criteria(event_counts, d0.min, d1.min)
Check Prevalence Threshold
Description
Check Prevalence Threshold
Usage
meets_prevalence_threshold(x, minp)
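The arguments are not documented here; a plausible reading is that x is a 0/1 subgroup indicator and minp a minimum proportion. A minimal sketch under that assumption (meets_prevalence_sketch is hypothetical, not the package function):

```r
# Hypothetical sketch: does the prevalence of a 0/1 subgroup indicator x
# meet the minimum proportion `minp`? (Actual implementation not documented.)
meets_prevalence_sketch <- function(x, minp) {
  mean(x) >= minp
}

meets_prevalence_sketch(c(1, 0, 0, 1), minp = 0.25)  # prevalence 0.5 -> TRUE
```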
MRCT Regional Subgroup Simulation
Description
Simulates multi-regional clinical trials and evaluates ForestSearch subgroup identification. Splits data by region into training and testing populations, identifies subgroups using ForestSearch on training data, and evaluates performance on the testing region.
Usage
mrct_region_sims(
dgm,
n_sims,
n_sample = NULL,
region_var = "z_regA",
sg_focus = "minSG",
maxk = 1,
hr.threshold = 0.9,
hr.consistency = 0.8,
pconsistency.threshold = 0.9,
confounders.name = NULL,
conf_force = NULL,
fs_args = list(),
sim_args = list(rand_ratio = 1, draw_treatment = TRUE),
analysis_time = 60,
cens_adjust = 0,
parallel_args = list(plan = "multisession", workers = NULL, show_message = TRUE),
details = FALSE,
verbose_n_sims = 2L,
seed = NULL
)
Arguments
dgm |
Data generating mechanism object from |
n_sims |
Integer. Number of simulations to run |
n_sample |
Integer. Sample size per simulation. If NULL (default), uses the entire super-population from dgm |
region_var |
Character. Name of the region indicator variable used to split data into training (region_var == 0) and testing (region_var == 1) populations. Default: "z_regA" |
sg_focus |
Character. Subgroup selection criterion passed to
|
maxk |
Integer. Maximum number of factors in subgroup combinations (1 or 2). Default: 1 |
hr.threshold |
Numeric. Hazard ratio threshold for subgroup identification. Default: 0.90 |
hr.consistency |
Numeric. Consistency threshold for hazard ratio. Default: 0.80 |
pconsistency.threshold |
Numeric. Probability threshold for consistency. Default: 0.90 |
confounders.name |
Character vector. Confounder variable names for ForestSearch. If NULL, automatically extracted from dgm |
conf_force |
Character vector. Forced cuts to consider in ForestSearch. Default: c("z_age <= 65", "z_bm <= 0", "z_bm <= 1", "z_bm <= 2", "z_bm <= 5") |
fs_args |
Named list. Additional arguments passed directly to
|
sim_args |
Named list. Additional arguments passed to
|
analysis_time |
Numeric. Time of analysis for administrative censoring. Default: 60 |
cens_adjust |
Numeric. Adjustment factor for censoring rate on log scale. Default: 0 |
parallel_args |
List. Parallel processing configuration with components:
|
details |
Logical. Print detailed progress information. Default: FALSE |
verbose_n_sims |
Integer. When |
seed |
Integer. Base random seed for reproducibility. Default: NULL |
Details
Simulation Process
For each simulation:
1. Sample from the super-population using simulate_from_dgm
2. Split by region_var into training and testing populations
3. Estimate HRs in the ITT, training, and testing populations
4. Run forestsearch on the training population
5. Apply the identified subgroup to the testing population
6. Calculate subgroup-specific estimates
Region Variable
The region_var parameter is used ONLY for splitting data into training/testing
populations. It does not imply any prognostic effect. To include prognostic
confounder effects, specify them when creating the DGM using
create_dgm_for_mrct or generate_aft_dgm_flex.
Value
A data.table with simulation results containing:
- sim
Simulation index
- n_itt
ITT sample size
- hr_itt
ITT hazard ratio (stratified if strat variable present)
- hr_ittX
ITT hazard ratio stratified by region
- n_train
Training (non-region A) sample size
- hr_train
Training population hazard ratio
- n_test
Testing (region A) sample size
- hr_test
Testing population hazard ratio
- any_found
Indicator: 1 if subgroup identified, 0 otherwise
- sg_found
Character description of identified subgroup
- n_sg
Subgroup sample size
- hr_sg
Subgroup hazard ratio in testing population
- POhr_sg
Potential outcome hazard ratio in subgroup (testing)
- prev_sg
Subgroup prevalence (proportion of testing population)
- n_sg_train
Subgroup sample size in training population
- hr_sg_train
Subgroup hazard ratio in training population
- POhr_sg_train
Potential outcome hazard ratio in subgroup (training)
- hr_sg_null
Subgroup HR when found, NA otherwise
See Also
forestsearch for subgroup identification algorithm
generate_aft_dgm_flex for DGM creation
simulate_from_dgm for data simulation
create_dgm_for_mrct for MRCT-specific DGM wrapper
summaryout_mrct for summarizing simulation results
Calculate n and percent
Description
Returns count and percent for a vector relative to a denominator.
Usage
n_pcnt(x, denom)
Arguments
x |
Vector of values. |
denom |
Denominator for percent calculation. |
Value
Character string formatted as "n (percent%)".
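A minimal sketch of the documented formatting, assuming x is a 0/1 indicator counted via sum() and one-decimal rounding (both assumptions; n_pcnt_sketch is a hypothetical name):

```r
# Hypothetical sketch of n_pcnt(): count plus percent of a denominator,
# formatted as "n (percent%)". Counting rule and rounding are assumptions.
n_pcnt_sketch <- function(x, denom) {
  n <- sum(x)
  sprintf("%d (%.1f%%)", n, 100 * n / denom)
}

n_pcnt_sketch(c(1, 1, 0, 0), denom = 4)  # "2 (50.0%)"
```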
Parse sg.harm Factor Names to Expression
Description
Converts ForestSearch factor names (e.g., "er.0", "grade3.1") into human-readable R expressions (e.g., "er <= 0", "grade3 == 1").
Usage
parse_sg_harm_to_expression(sg_harm, fs.est = NULL)
Arguments
sg_harm |
Character vector of factor names from fs.est$sg.harm. |
fs.est |
ForestSearch object (for accessing confs_labels if available). |
Value
Character string expression or NULL if parsing fails.
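The mapping illustrated above ("er.0" to "er <= 0", "grade3.1" to "grade3 == 1") amounts to splitting the factor name on its last "." and rebuilding an expression. Whether the operator is "<=" (continuous cut) or "==" (factor level) depends on metadata not shown here, so this sketch takes the operator as an argument; parse_factor_sketch is hypothetical, not the package function.

```r
# Hypothetical sketch of the factor-name parsing: split "var.value" on the
# last "." and rebuild "var <op> value".
parse_factor_sketch <- function(name, op = "==") {
  parts <- regmatches(name, regexec("^(.*)\\.([^.]+)$", name))[[1]]
  paste(parts[2], op, parts[3])
}

parse_factor_sketch("grade3.1")          # "grade3 == 1"
parse_factor_sketch("er.0", op = "<=")   # "er <= 0"
```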
Plot ForestSearch Results
Description
Dispatches to plot_sg_results for Kaplan-Meier curves,
hazard-ratio forest plots, or combined panels.
Usage
## S3 method for class 'forestsearch'
plot(
x,
type = c("combined", "km", "forest", "summary"),
outcome.name = "Y",
event.name = "Event",
treat.name = "Treat",
...
)
Arguments
x |
A |
type |
Character. Type of plot:
|
outcome.name |
Character. Name of time-to-event column.
Default: |
event.name |
Character. Name of event indicator column.
Default: |
treat.name |
Character. Name of treatment column.
Default: |
... |
Additional arguments passed to |
Value
Invisibly returns the plot result from
plot_sg_results.
See Also
plot_sg_results for full control over appearance,
plot_sg_weighted_km for weighted KM curves,
plot_subgroup_results_forestplot for publication-ready
forest plots.
Plot Method for ForestSearch Forest Plot
Description
Plot Method for ForestSearch Forest Plot
Usage
## S3 method for class 'fs_forestplot'
plot(x, ...)
Arguments
x |
An fs_forestplot object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Plot Method for fs_sg_plot Objects
Description
Plot Method for fs_sg_plot Objects
Usage
## S3 method for class 'fs_sg_plot'
plot(x, which = 1, ...)
Arguments
x |
An fs_sg_plot object |
which |
Character or integer. Which plot to display. Default: 1 (first available) |
... |
Additional arguments passed to plot functions |
Value
Invisibly returns x.
Plot Detection Probability Curve
Description
Creates a visualization of the detection probability curve.
Usage
plot_detection_curve(
curve_data,
add_reference_lines = TRUE,
add_threshold_line = TRUE,
title = NULL,
...
)
Arguments
curve_data |
A data.frame from |
add_reference_lines |
Logical. Add horizontal reference lines at 0.05, 0.10, 0.80. Default: TRUE |
add_threshold_line |
Logical. Add vertical line at hr_threshold. Default: TRUE |
title |
Character. Plot title. Default: auto-generated |
... |
Additional arguments passed to plot() |
Value
Invisibly returns the input data.
Plot Kaplan-Meier Survival Difference Bands for ForestSearch Subgroups
Description
Creates Kaplan-Meier survival difference band plots comparing the identified
ForestSearch subgroup (sg.harm) and its complement against the ITT population.
This function wraps plotKM.band_subgroups() from the weightedsurv
package, automatically extracting subgroup definitions from ForestSearch
results.
Usage
plot_km_band_forestsearch(
df,
fs.est = NULL,
sg_cols = NULL,
sg_labels = NULL,
sg_colors = NULL,
itt_color = "azure3",
outcome.name = "tte",
event.name = "event",
treat.name = "treat",
xlabel = "Time",
ylabel = "Survival differences",
yseq_length = 5,
draws_band = 1000,
tau_add = NULL,
by_risk = 6,
risk_cex = 0.75,
risk_delta = 0.035,
risk_pad = 0.015,
ymax_pad = 0.11,
show_legend = TRUE,
legend_pos = "topleft",
legend_cex = 0.75,
ref_subgroups = NULL,
verbose = FALSE
)
Arguments
df |
Data frame. The analysis dataset containing all required variables including subgroup indicator columns. |
fs.est |
A forestsearch object containing the identified subgroup,
or |
sg_cols |
Character vector. Names of columns in |
sg_labels |
Character vector. Subsetting expressions for each subgroup,
corresponding to |
sg_colors |
Character vector. Colors for each subgroup curve,
corresponding to |
itt_color |
Character. Color for ITT population band.
Default: |
outcome.name |
Character. Name of time-to-event column.
Default: |
event.name |
Character. Name of event indicator column.
Default: |
treat.name |
Character. Name of treatment column.
Default: |
xlabel |
Character. X-axis label. Default: |
ylabel |
Character. Y-axis label. Default: |
yseq_length |
Integer. Number of y-axis tick marks.
Default: |
draws_band |
Integer. Number of bootstrap draws for confidence band.
Default: |
tau_add |
Numeric. Time horizon for the plot. If |
by_risk |
Numeric. Interval for risk table. Default: |
risk_cex |
Numeric. Character expansion for risk table text.
Default: |
risk_delta |
Numeric. Vertical spacing for risk table.
Default: |
risk_pad |
Numeric. Padding for risk table. Default: |
ymax_pad |
Numeric. Y-axis maximum padding. Default: |
show_legend |
Logical. Whether to display the legend.
Default: |
legend_pos |
Character. Legend position (e.g., "topleft", "bottomright").
Default: |
legend_cex |
Numeric. Character expansion for legend text.
Default: |
ref_subgroups |
Named list. Optional additional reference subgroups to include. Each element should be a list with:
The function automatically creates indicator columns from the expressions.
Default: |
verbose |
Logical. Print diagnostic messages. Default: |
Details
This function simplifies the workflow of creating KM survival difference band plots for ForestSearch-identified subgroups. It can work in two modes:
Mode 1: With ForestSearch result (fs.est provided)
Extracts the subgroup definition from the ForestSearch result
Creates binary indicator columns (Qrecommend, Brecommend) in df
Generates appropriate labels from the subgroup definition
Calls plotKM.band_subgroups() with configured parameters
Mode 2: With pre-defined columns (sg_cols provided)
Uses existing indicator columns in df
Requires sg_labels and sg_colors to match sg_cols
The sg.harm subgroup (Qrecommend) represents patients with questionable
treatment benefit (where treat.recommend == 0 in ForestSearch output).
The complement (Brecommend) represents patients recommended for treatment.
Value
Invisibly returns a list containing:
- df
The modified data frame with subgroup indicators
- sg_cols
Character vector of subgroup column names used
- sg_labels
Character vector of subgroup labels used
- sg_colors
Character vector of colors used
- sg_harm_definition
The subgroup definition extracted from fs.est
- ref_subgroups
The reference subgroups list (if provided)
Subgroup Extraction
When fs.est is provided, the subgroup definition is extracted from:
- fs.est$grp.consistency$out_sg$sg.harm_label - Human-readable labels
- fs.est$sg.harm - Technical factor names (fallback)
- fs.est$df.est$treat.recommend - Subgroup membership indicator
Note
This function requires the weightedsurv package, which can be
installed from GitHub: devtools::install_github("larry-leon/weightedsurv")
See Also
forestsearch for running the subgroup analysis
plot_sg_weighted_km for weighted KM plots
plot_sg_results for comprehensive subgroup visualization
Plot Distribution of Identified Subgroups
Description
Bar chart of subgroups identified across simulations, filtered to
those appearing in at least min_pct of the found simulations.
Usage
plot_sg_distribution(
results,
min_pct = 5,
title = "Distribution of Identified Subgroups",
wrap_width = 25
)
Arguments
results |
data.table from |
min_pct |
Numeric. Minimum percentage threshold for display (default: 5) |
title |
Character. Plot title. Default: "Distribution of Identified Subgroups" |
wrap_width |
Integer. Character width for wrapping long subgroup labels. Default: 25 |
Value
A ggplot2 object
Plot Forest Plot of Hazard Ratios
Description
Creates a forest plot showing hazard ratios with confidence intervals.
Usage
plot_sg_forest(hr_estimates, sg0_name, sg1_name, colors, title = NULL, ...)
Arguments
hr_estimates |
Data frame with HR estimates |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
colors |
List. Color specifications |
title |
Character. Plot title |
... |
Additional arguments |
Value
Invisible NULL (creates plot as side effect)
Plot Kaplan-Meier Survival Curves for Subgroups
Description
Creates side-by-side Kaplan-Meier survival curves for the H and Hc subgroups.
Usage
plot_sg_km(
df_H,
df_Hc,
outcome.name,
event.name,
treat.name,
by.risk,
sg0_name,
sg1_name,
treat_labels,
colors,
show_ci = TRUE,
show_logrank = TRUE,
show_hr = TRUE,
hr_estimates = NULL,
conf.level = 0.95,
title = NULL,
...
)
Arguments
df_H |
Data frame for H subgroup |
df_Hc |
Data frame for Hc subgroup |
outcome.name |
Character. Outcome variable name |
event.name |
Character. Event indicator name |
treat.name |
Character. Treatment variable name |
by.risk |
Numeric. Risk table interval |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
treat_labels |
Named character vector. Treatment labels |
colors |
List. Color specifications |
show_ci |
Logical. Show confidence intervals |
show_logrank |
Logical. Show log-rank p-value |
show_hr |
Logical. Show HR annotation |
hr_estimates |
Data frame with HR estimates |
conf.level |
Numeric. Confidence level |
title |
Character. Plot title |
... |
Additional arguments |
Value
Invisible NULL (creates plot as side effect)
Plot ForestSearch Subgroup Results
Description
Creates comprehensive visualizations of subgroup results from ForestSearch,
including Kaplan-Meier survival curves, hazard ratio comparisons, and
summary statistics. This function is designed to work with the output
from forestsearch, specifically the df.est component.
Usage
plot_sg_results(
fs.est,
outcome.name = "Y",
event.name = "Event",
treat.name = "Treat",
plot_type = c("combined", "km", "forest", "summary"),
by.risk = NULL,
conf.level = 0.95,
est.scale = c("hr", "1/hr"),
sg0_name = "Questionable (H)",
sg1_name = "Recommend (H^c)",
treat_labels = c(`0` = "Control", `1` = "Treatment"),
colors = NULL,
title = NULL,
show_events = TRUE,
show_ci = TRUE,
show_logrank = TRUE,
show_hr = TRUE,
verbose = FALSE,
...
)
Arguments
fs.est |
A forestsearch object or list containing at minimum:
|
outcome.name |
Character. Name of time-to-event outcome column. Default: "Y" |
event.name |
Character. Name of event indicator column (1=event, 0=censored). Default: "Event" |
treat.name |
Character. Name of treatment column (1=treatment, 0=control). Default: "Treat" |
plot_type |
Character. Type of plot to create. One of:
|
by.risk |
Numeric. Risk interval for KM survival curves. Default: NULL (auto-calculated) |
conf.level |
Numeric. Confidence level for intervals. Default: 0.95 |
est.scale |
Character. Effect scale: "hr" (hazard ratio) or "1/hr" (inverse). Default: "hr" |
sg0_name |
Character. Label for subgroup 0 (harm/questionable). Default: "Questionable (H)" |
sg1_name |
Character. Label for subgroup 1 (recommend/complement). Default: "Recommend (H^c)" |
treat_labels |
Named character vector. Labels for treatment arms. Default: c("0" = "Control", "1" = "Treatment") |
colors |
Named character vector. Colors for plot elements. Default: uses package defaults |
title |
Character. Main plot title. Default: auto-generated |
show_events |
Logical. Show event counts on KM curves. Default: TRUE |
show_ci |
Logical. Show confidence intervals. Default: TRUE |
show_logrank |
Logical. Show log-rank p-value. Default: TRUE |
show_hr |
Logical. Show hazard ratio annotation. Default: TRUE |
verbose |
Logical. Print diagnostic messages. Default: FALSE |
... |
Additional arguments passed to plotting functions. |
Details
The function extracts subgroup membership from fs.est$df.est$treat.recommend:
- treat.recommend == 0: Harm/questionable subgroup (H)
- treat.recommend == 1: Recommend/complement subgroup (H^c)
For est.scale = "1/hr", treatment labels and subgroup interpretation
are reversed to maintain clinical interpretability.
Value
An object of class fs_sg_plot containing:
- plots
List of ggplot2 or base R plot objects
- summary
Data frame of subgroup summary statistics
- hr_estimates
Data frame of hazard ratio estimates
- call
The matched call
Kaplan-Meier Plots
When plot_type = "km", creates side-by-side survival curves for:
The identified subgroup (H) with treatment vs control
The complement subgroup (H^c) with treatment vs control
Forest Plot
When plot_type = "forest", creates a forest plot showing hazard
ratios with confidence intervals for: ITT population, H subgroup,
and H^c complement.
See Also
forestsearch for running the subgroup analysis
sg_consistency_out for consistency evaluation
plot_subgroup_results_forestplot for publication-ready forest plots
Plot Summary Statistics Panel
Description
Creates a summary panel with subgroup characteristics.
Usage
plot_sg_summary_panel(
summary_stats,
hr_estimates,
sg0_name,
sg1_name,
colors,
...
)
Arguments
summary_stats |
Data frame with summary statistics |
hr_estimates |
Data frame with HR estimates |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
colors |
List. Color specifications |
... |
Additional arguments |
Value
Invisible NULL (creates plot as side effect)
Plot Weighted Kaplan-Meier Curves for ForestSearch Subgroups
Description
Creates weighted Kaplan-Meier survival curves for the identified subgroups
(H and Hc) using the weightedsurv package, matching the pattern used in
sg_consistency_out().
Usage
plot_sg_weighted_km(
fs.est,
fs_bc = NULL,
outcome.name = "Y",
event.name = "Event",
treat.name = "Treat",
by.risk = NULL,
sg0_name = NULL,
sg1_name = NULL,
conf.int = TRUE,
show.logrank = TRUE,
show.cox = TRUE,
show.cox.bc = TRUE,
put.legend.lr = "topleft",
ymax = 1.05,
xmed.fraction = 0.65,
hr_bc_position = "bottomright",
hr_bc_cex = 0.725,
title = NULL,
verbose = FALSE
)
Arguments
fs.est |
A forestsearch object containing |
fs_bc |
Optional. Bootstrap results from |
outcome.name |
Character. Name of time-to-event column.
Default: |
event.name |
Character. Name of event indicator column.
Default: |
treat.name |
Character. Name of treatment column.
Default: |
by.risk |
Numeric. Risk interval for plotting. Default: |
sg0_name |
Character. Label for H subgroup (treat.recommend == 0).
Default: |
sg1_name |
Character. Label for Hc subgroup (treat.recommend == 1).
Default: |
conf.int |
Logical. Show confidence intervals. Default: |
show.logrank |
Logical. Show log-rank test. Default: |
show.cox |
Logical. Show unadjusted Cox HR from weightedsurv.
Default: |
show.cox.bc |
Logical. Show bootstrap bias-corrected HR annotation
(requires |
put.legend.lr |
Character. Legend position. Default: "topleft" |
ymax |
Numeric. Max y-axis value. Default: 1.05 |
xmed.fraction |
Numeric. Fraction for median lines. Default: 0.65 |
hr_bc_position |
Character. Position for bias-corrected HR annotation. One of "bottomright", "bottomleft", "topright", "topleft". Default: "bottomright" |
hr_bc_cex |
Numeric. Character expansion factor for bias-corrected HR annotation text. Default: 0.725 (matches weightedsurv cox.cex default) |
title |
Character. Overall plot title. Default: |
verbose |
Logical. Print diagnostic messages. Default: |
Details
This function uses the exact same calling pattern as plot_subgroup()
in the ForestSearch package. Column names are mapped internally to the
standard names (Y, Event, Treat) expected by weightedsurv.
Subgroup definitions are automatically extracted from the forestsearch object if available:
- fs$grp.consistency$out_sg$sg.harm_label - Human-readable labels
- fs$sg.harm - Technical factor names (fallback)
HR display options controlled by show.cox and show.cox.bc:
- Both TRUE (default): Shows unadjusted HR from weightedsurv AND bias-corrected HR annotation
- show.cox = TRUE, show.cox.bc = FALSE: Shows only unadjusted HR
- show.cox = FALSE, show.cox.bc = TRUE: Shows only bias-corrected HR
- Both FALSE: Shows neither HR estimate
Value
Invisibly returns a list with subgroup data frames and counting data
Plot Spline Treatment Effect Function
Description
Plot Spline Treatment Effect Function
Usage
plot_spline_treatment_effect(dgm_result, add_points = TRUE)
Arguments
dgm_result |
Result object from generate_aft_dgm_flex with spline |
add_points |
Logical; add observed data points. Default TRUE |
Value
No return value, called for side effects (produces a plot).
Plot Subgroup Survival Curves
Description
Plots weighted Kaplan-Meier survival curves for a specified subgroup and its complement using the weightedsurv package.
Usage
plot_subgroup(df.sub, df.subC, by.risk, confs_labels, this.1_label, top_result)
Arguments
df.sub |
A data frame containing data for the subgroup of interest. |
df.subC |
A data frame containing data for the complement subgroup. |
by.risk |
Numeric. The risk interval for plotting (passed to |
confs_labels |
Named character vector. Covariate label mapping (not used directly in this function, but may be used for labeling). |
this.1_label |
Character. Label for the subgroup being plotted. |
top_result |
Data frame row. The top subgroup result row, expected to contain a |
Plot Subgroup Analysis Results
Description
Creates diagnostic plots for subgroup treatment effects from df_super object
Usage
plot_subgroup_effects(
df_super,
z,
hrz_crit = 0,
log.hrs = NULL,
ahr_empirical = NULL,
plot_type = c("both", "profile", "ahr"),
add_rug = TRUE,
zpoints_by = 1,
...
)
Arguments
df_super |
A data frame containing subgroup analysis results with columns: loghr_po (log hazard ratios), and optionally theta_1 and theta_0 (treatment-specific parameters) |
z |
Character string specifying the column name to use as the subgroup score (e.g., "z_age", "z_size", "subgroup"). Required. |
hrz_crit |
Critical threshold on the log hazard ratio scale for defining the optimal subgroup. Default is 0 (i.e., HR = 1 on the log scale). |
log.hrs |
Optional vector of reference log hazard ratios to display as horizontal lines. Default is NULL. |
ahr_empirical |
Optional empirical average hazard ratio to display. If NULL, calculated from data. Default is NULL. |
plot_type |
Character string specifying plot type: "both" (default), "profile", or "ahr". |
add_rug |
Logical indicating whether to add rug plot of z values. Default is TRUE. |
zpoints_by |
Step size for z-axis grid when calculating AHR curves. Default is 1. |
... |
Additional graphical parameters passed to plot() |
Details
The function creates up to two plots:
Treatment effect profile: Shows log hazard ratio as function of z
Average hazard ratio curve: Shows AHR for subgroups z >= threshold
The "optimal" subgroup is defined as patients with z >= cut.zero, where cut.zero is the minimum z value with favorable treatment effect (loghr < hrz_crit).
Value
A list containing:
cut.zero |
The minimum z value where loghr_po < hrz_crit |
AHR_opt |
Average hazard ratio for optimal subgroup (z >= cut.zero) |
zpoints |
Grid of z values used for AHR calculations |
HR.zpoints |
AHR for population with z >= zpoints |
HRminus.zpoints |
AHR for population with z <= zpoints |
HR2.zpoints |
Alternative AHR calculation for z >= zpoints |
HRminus2.zpoints |
Alternative AHR calculation for z <= zpoints |
Plot Subgroup Results Forest Plot
Description
Generates a comprehensive forest plot showing:
ITT (Intent-to-Treat) population estimate
Reference subgroups (e.g., by biomarker levels)
Post-hoc identified subgroups with bias-corrected estimates
Cross-validation agreement metrics as annotations
Usage
plot_subgroup_results_forestplot(
fs_results,
df_analysis,
subgroup_list = NULL,
outcome.name,
event.name,
treat.name,
E.name = "Experimental",
C.name = "Control",
est.scale = "hr",
xlog = TRUE,
title_text = NULL,
arrow_text = c("Favors Experimental", "Favors Control"),
footnote_text = c("Eg 80% of training found SG: 70% of B (+) also B in CV testing"),
xlim = c(0.25, 1.5),
ticks_at = c(0.25, 0.7, 1, 1.5),
show_cv_metrics = TRUE,
cv_source = c("auto", "kfold", "oob", "both"),
posthoc_colors = c("powderblue", "beige"),
reference_colors = c("yellow", "powderblue"),
ci_column_spaces = 20,
conf.level = 0.95,
theme = NULL
)
Arguments
fs_results |
List. A list containing ForestSearch analysis results with elements:
|
df_analysis |
Data frame. The analysis dataset with outcome, event, and treatment variables. |
subgroup_list |
List. Named list of subgroup definitions to include in the plot. Each element should be a list with:
|
outcome.name |
Character. Name of the survival time variable. |
event.name |
Character. Name of the event indicator variable. |
treat.name |
Character. Name of the treatment variable. |
E.name |
Character. Label for experimental arm (default: "Experimental"). |
C.name |
Character. Label for control arm (default: "Control"). |
est.scale |
Character. Estimate scale: "hr" or "1/hr" (default: "hr"). |
xlog |
Logical. If TRUE (default), the x-axis is plotted on a logarithmic scale. This is standard for hazard ratio forest plots where equal distances represent equal relative effects. |
title_text |
Character. Plot title (default: NULL). |
arrow_text |
Character vector of length 2. Arrow labels for forest plot (default: c("Favors Experimental", "Favors Control")). |
footnote_text |
Character vector. Footnote text for the plot explaining CV metrics (default provides CV interpretation guidance; set to NULL to omit). |
xlim |
Numeric vector of length 2. X-axis limits (default: c(0.25, 1.5)). |
ticks_at |
Numeric vector. X-axis tick positions (default: c(0.25, 0.70, 1.0, 1.5)). |
show_cv_metrics |
Logical. Whether to show cross-validation metrics (default: TRUE if fs_kfold or fs_OOB available). |
cv_source |
Character. Source for CV metrics: "auto" (default, uses both if available, otherwise whichever is present), "kfold" (use fs_kfold only), "oob" (use fs_OOB only), or "both" (explicitly use both fs_kfold and fs_OOB, with K-fold first then OOB). |
posthoc_colors |
Character vector. Colors for post-hoc subgroup rows (default: c("powderblue", "beige")). |
reference_colors |
Character vector. Colors for reference subgroup rows (default: c("yellow", "powderblue")). |
ci_column_spaces |
Integer. Number of spaces for the CI plot column width. More spaces = wider CI column (default: 20). |
conf.level |
Numeric. Confidence level for intervals (default: 0.95 for 95% CI). Used to calculate the z-multiplier as qnorm(1 - (1 - conf.level)/2). |
theme |
An fs_forest_theme object from |
Details
Creates a publication-ready forest plot displaying identified subgroups with hazard ratios, bias-corrected estimates, and cross-validation metrics. This wrapper integrates ForestSearch results with the forestploter package.
ForestSearch Labeling Convention
ForestSearch identifies subgroups based on hazard ratio thresholds:
- sg.harm: Contains the definition of the "harm" or "questionable" subgroup (H)
- treat.recommend == 0: Patient is IN the harm subgroup (H)
- treat.recommend == 1: Patient is in the COMPLEMENT subgroup (Hc, typically benefit)
For est.scale = "hr" (searching for harm):
H (treat.recommend=0): Subgroup defined by sg.harm with elevated HR (harm/questionable)
Hc (treat.recommend=1): Complement of sg.harm (potential benefit)
For est.scale = "1/hr" (searching for benefit):
Roles are reversed: H becomes the benefit group
Value
A list containing:
- plot
The forestploter grob object (can be rendered with plot())
- data
The data frame used for the forest plot
- row_types
Character vector of row types for styling reference
- cv_metrics
Cross-validation metrics text (if available)
See Also
forestsearch for running the subgroup analysis
forestsearch_bootstrap_dofuture for bootstrap bias correction
forestsearch_Kfold for cross-validation
create_forest_theme for customizing plot appearance
render_forestplot for rendering the plot
Prepare Censoring Model Parameters
Description
Constructs the censoring model object and appends per-subject counterfactual
censoring linear predictors (lin_pred_cens_0, lin_pred_cens_1)
to the super-population data frame.
Usage
prepare_censoring_model(
df_work,
cens_type,
cens_params,
df_super,
select_censoring = TRUE,
verbose = TRUE
)
Arguments
df_work |
Working data frame (output of |
cens_type |
Character. |
cens_params |
Named list of user-supplied censoring parameters. |
df_super |
Super-population data frame; receives
|
select_censoring |
Logical. If |
verbose |
Logical. If |
Details
Linear predictor convention
lin_pred_cens_0 and lin_pred_cens_1 store the
covariate contribution only — i.e. \gamma_c' X, with the
intercept \mu_c excluded. This matches the convention used for the
outcome model (lin_pred_0, lin_pred_1 = \gamma' X,
no intercept) computed in calculate_linear_predictors().
simulate_from_dgm() reconstructs the full log-censoring time as:
\log C = \mu_c + \delta + \tau_c \epsilon + \gamma_c' X
where \mu_c = params$censoring$mu,
\delta = cens_adjust,
\tau_c = params$censoring$tau, and
\gamma_c' X = lin_pred_cens_{0|1}.
When select_censoring = TRUE, predict(survreg, type = "linear")
returns the full linear predictor \mu_c + \gamma_c' X. The stored
intercept \mu_c is therefore subtracted before writing
lin_pred_cens_*, so that simulate_from_dgm() can add
params$censoring$mu exactly once. Omitting this subtraction causes
\mu_c to be counted twice, producing astronomically large censoring
times and universal censoring.
When select_censoring = FALSE with a Weibull/lognormal
cens_type, the intercept-only model has zero covariate contribution,
so lin_pred_cens_0 = lin_pred_cens_1 = 0. Storing mu instead
of 0 causes the same double-counting.
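The intercept convention above can be checked with a small numeric sketch (all values illustrative, not taken from the package): subtracting the stored intercept from the full linear predictor leaves only the covariate contribution, so the reconstruction adds \mu_c exactly once.

```r
# Numeric sketch of the intercept convention described above.
mu_c    <- 2.5              # stored intercept, params$censoring$mu (illustrative)
gamma_c <- c(0.3, -0.1)     # censoring model coefficients (illustrative)
x       <- c(1.0, 2.0)      # one subject's covariates

lp_full   <- mu_c + sum(gamma_c * x)  # what predict(survreg, type = "linear") returns
lp_stored <- lp_full - mu_c           # covariate part only, as stored in lin_pred_cens_*

# Reconstruction: log C = mu_c + delta + tau_c * eps + lp_stored
delta <- 0; tau_c <- 1; eps <- 0.4
logC_correct <- mu_c + delta + tau_c * eps + lp_stored
logC_double  <- mu_c + delta + tau_c * eps + lp_full   # mu_c counted twice

logC_double - logC_correct  # equals mu_c: the double-counting error
```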
Value
A named list:
- cens_model
List of censoring distribution parameters stored in dgm$model_params$censoring.
- df_super
Updated super-population data frame with lin_pred_cens_0 and lin_pred_cens_1 appended. These hold covariate contributions only (\gamma_c' X); the intercept is excluded.
Prepare Data for Subgroup Search
Description
Cleans data by removing missing values and extracting components
Usage
prepare_search_data(Y, Event, Treat, Z)
Prepare subgroup data for analysis
Description
Splits a data frame into two subgroups based on a flag and treatment scale.
Usage
prepare_subgroup_data(df, SG_flag, est.scale, treat.name)
Arguments
df |
Data frame. |
SG_flag |
Character. Name of subgroup flag variable. |
est.scale |
Character. Effect scale ("hr" or "1/hr"). |
treat.name |
Character. Name of treatment variable. |
Value
List with subgroup data frames and treatment variable name.
Prepare Working Dataset with Processed Covariates
Description
Prepare Working Dataset with Processed Covariates
Usage
prepare_working_dataset(
data,
outcome_var,
event_var,
treatment_var,
continuous_vars,
factor_vars,
standardize,
continuous_vars_cens,
factor_vars_cens,
verbose
)
Print method for cox_ahr_cde objects
Description
Print method for cox_ahr_cde objects
Usage
## S3 method for class 'cox_ahr_cde'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (not used). |
Value
Invisibly returns the input object.
Print Method for forestsearch Objects
Description
Displays a concise summary of ForestSearch results including the identified subgroup definition, consistency metrics, algorithm details, and computation time.
Usage
## S3 method for class 'forestsearch'
print(x, ...)
Arguments
x |
A forestsearch object. |
... |
Additional arguments (currently unused). |
Value
Invisibly returns x.
See Also
summary.forestsearch for detailed output,
plot.forestsearch for visualization.
Print Method for ForestSearch Forest Theme
Description
Print Method for ForestSearch Forest Theme
Usage
## S3 method for class 'fs_forest_theme'
print(x, ...)
Arguments
x |
An fs_forest_theme object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for ForestSearch Forest Plot
Description
Print Method for ForestSearch Forest Plot
Usage
## S3 method for class 'fs_forestplot'
print(x, ...)
Arguments
x |
An fs_forestplot object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for K-Fold CV Results
Description
Print Method for K-Fold CV Results
Usage
## S3 method for class 'fs_kfold'
print(x, ...)
Arguments
x |
An fs_kfold object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for fs_sg_plot Objects
Description
Print Method for fs_sg_plot Objects
Usage
## S3 method for class 'fs_sg_plot'
print(x, ...)
Arguments
x |
An fs_sg_plot object |
... |
Additional arguments (unused) |
Value
Invisibly returns x.
Print Method for Repeated K-Fold CV Results
Description
Print Method for Repeated K-Fold CV Results
Usage
## S3 method for class 'fs_tenfold'
print(x, ...)
Arguments
x |
An fs_tenfold object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for fs_weighted_km Objects
Description
Print Method for fs_weighted_km Objects
Usage
## S3 method for class 'fs_weighted_km'
print(x, ...)
Arguments
x |
An fs_weighted_km object from plot_sg_weighted_km() |
... |
Additional arguments (unused) |
Value
Invisibly returns x.
Print Method for gbsg_dgm Objects
Description
Print Method for gbsg_dgm Objects
Usage
## S3 method for class 'gbsg_dgm'
print(x, ...)
Arguments
x |
A gbsg_dgm object |
... |
Additional arguments (unused) |
Value
Invisibly returns x.
Examples
dgm <- setup_gbsg_dgm(model = "alt", verbose = FALSE)
print(dgm)
Print method for survreg_comparison objects
Description
Print method for survreg_comparison objects
Usage
## S3 method for class 'multi_survreg_comparison'
print(x, ...)
Arguments
x |
A survreg_comparison object |
... |
Additional arguments (not used) |
Value
Invisibly returns the input object
Print CV ForestSearch Parameters
Description
Print CV ForestSearch Parameters
Usage
print_cv_params(cv_args)
Print detailed output for debugging
Description
Displays detailed information about the GRF analysis
Usage
print_grf_details(config, values, best_subgroup, sg_harm_id, tree_cuts = NULL)
Arguments
config |
List. GRF configuration |
values |
Data frame. Node metrics |
best_subgroup |
Data frame row. Selected subgroup (or NULL) |
sg_harm_id |
Character. Subgroup definition (or NULL) |
tree_cuts |
List. Cut information |
Value
No return value, called for side effects (prints GRF diagnostic information to the console).
Process forced cut expression for a variable
Description
Evaluates a cut expression (e.g., "age <= mean(age)") and returns the expression with the value.
Usage
process_conf_force_expr(expr, df)
Arguments
expr |
Character string of the cut expression. |
df |
Data frame. |
Value
Character string with evaluated value.
Process Continuous Variable for Subgroup Definition
Description
Process Continuous Variable for Subgroup Definition
Usage
process_continuous_subgroup(var_data, cut_spec, var_name, verbose)
Process Continuous Variables
Description
Process Continuous Variables
Usage
process_continuous_vars(
df_work,
data,
continuous_vars,
standardize,
marker = "z_"
)
Process Cutpoint Specification for Subgroup Definition
Description
Process Cutpoint Specification for Subgroup Definition
Usage
process_cutpoint(var_data, cut_spec, var_name = "", verbose = FALSE)
Process Factor Variable for Subgroup Definition
Description
Process Factor Variable for Subgroup Definition
Usage
process_factor_subgroup(var_data, cut_spec, var_name, verbose)
Process Factor Variables with Largest Value as Reference
Description
Process Factor Variables with Largest Value as Reference
Usage
process_factor_vars(df_work, data, factor_vars, marker = "z_")
75th Percentile (Quantile High)
Description
Returns the 75th percentile of a numeric vector.
Usage
qhigh(x)
Arguments
x |
A numeric vector. |
Value
Numeric value of the 75th percentile.
25th Percentile (Quantile Low)
Description
Returns the 25th percentile of a numeric vector.
Usage
qlow(x)
Arguments
x |
A numeric vector. |
Value
Numeric value of the 25th percentile.
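Examples
qlow() and qhigh() are shorthand for the 25th and 75th percentiles (equivalent to quantile(x, 0.25) and quantile(x, 0.75); the interpolation type is an implementation detail):
```r
x <- c(1, 2, 3, 4, 100)
qlow(x)   # 25th percentile of x
qhigh(x)  # 75th percentile of x
```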
Quick Plot KM Bands from ForestSearch
Description
Convenience wrapper with sensible defaults for quick visualization.
Usage
quick_km_band_plot(df, fs.est, outcome.name, event.name, treat.name, ...)
Arguments
df |
Data frame with analysis data. |
fs.est |
ForestSearch result object. |
outcome.name |
Character. Time-to-event column name. |
event.name |
Character. Event indicator column name. |
treat.name |
Character. Treatment column name. |
... |
Additional arguments passed to |
Value
Invisibly returns the plot result.
Remove Near-Duplicate Subgroups
Description
Removes subgroups with nearly identical statistics (HR, n, E, etc.) to reduce redundancy in the candidate list.
Usage
remove_near_duplicate_subgroups(
hr_subgroups,
tolerance = 0.001,
details = FALSE
)
Arguments
hr_subgroups |
Data.table of subgroup results. |
tolerance |
Numeric. Tolerance for numeric comparison (default 0.001). |
details |
Logical. Print details about removed duplicates. |
Value
Data.table with near-duplicate rows removed.
Remove Redundant Subgroups
Description
Removes redundant subgroups by checking for exact ties in key columns.
Usage
remove_redundant_subgroups(found.hrs)
Arguments
found.hrs |
Data.table of found subgroups. |
Value
Data.table of non-redundant subgroups.
Render ForestSearch Forest Plot
Description
Renders a forest plot from plot_subgroup_results_forestplot().
Usage
render_forestplot(x, newpage = TRUE)
Arguments
x |
An fs_forestplot object from plot_subgroup_results_forestplot(). |
newpage |
Logical. Call grid.newpage() before drawing. Default: TRUE. |
Details
To control plot sizing, create a custom theme using create_forest_theme()
and pass it to plot_subgroup_results_forestplot():
my_theme <- create_forest_theme(base_size = 14, row_padding = c(6, 4))
result <- plot_subgroup_results_forestplot(..., theme = my_theme)
render_forestplot(result)
Value
Invisibly returns the grob object.
Render Reference Simulation Table as gt
Description
Converts a data frame of pre-computed reference simulation results (e.g.,
digitized from a published LaTeX table) into a styled gt table. This
is useful for displaying published benchmark results alongside new
simulation output within vignettes or reports.
Usage
render_reference_table(
ref_df,
title = "Reference Simulation Results",
subtitle = NULL,
bold_threshold = 0.05
)
Arguments
ref_df |
Data frame of pre-computed reference simulation results. |
title |
Character. Table title. |
subtitle |
Character. Table subtitle. Default: NULL. |
bold_threshold |
Numeric. Values in ref_df at or below this threshold are rendered in bold. Default: 0.05. |
Value
A gt table object.
Examples
ref <- data.frame(
Scenario = "M1 Null: N=700",
Metric = "any(H)",
FS = 0.02,
FSlg = 0.03,
GRF = 0.25
)
render_reference_table(ref, title = "Reference Results")
Resolve parallel processing arguments for bootstrap
Description
If parallel_args is not provided, falls back to the parallel configuration from the original forestsearch call. Always reports the resolved configuration to the user.
Usage
resolve_bootstrap_parallel_args(parallel_args, forestsearch_call_args)
Arguments
parallel_args |
List or empty list |
forestsearch_call_args |
List from original forestsearch call |
Value
List with plan, workers, show_message
Resolve Parallel Arguments for Cross-Validation
Description
Helper function to resolve and validate parallel processing arguments,
similar to bootstrap's resolve_bootstrap_parallel_args.
Usage
resolve_cv_parallel_args(parallel_args, fs_args, details = FALSE)
Arguments
parallel_args |
List. User-provided parallel arguments. |
fs_args |
List. Original ForestSearch call arguments. |
details |
Logical. Print configuration messages. |
Value
List with resolved plan, workers, show_message.
RMST calculation for subgroup
Description
Calculates restricted mean survival time (RMST) for a subgroup.
Usage
rmst_calculation(
df,
tte.name = "tte",
event.name = "event",
treat.name = "treat"
)
Arguments
df |
Data frame. |
tte.name |
Character. Name of time-to-event variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
Value
List with tau, RMST, RMST for treatment, RMST for control.
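Examples
A small sketch using the default column names (tte, event, treat); the simulated data are purely illustrative:
```r
set.seed(1)
df <- data.frame(
  tte   = rexp(200, 0.1),        # time-to-event
  event = rbinom(200, 1, 0.8),   # event indicator
  treat = rbinom(200, 1, 0.5)    # treatment arm
)
res <- rmst_calculation(df)
res$tau   # restriction time used
```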
Run ForestSearch Analysis
Description
Helper function to run ForestSearch and extract estimates. Aligned with forestsearch() parameters including use_twostage.
Usage
run_forestsearch_analysis(
data,
confounders_name,
params,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis_label = "FS",
verbose = FALSE
)
Arguments
data |
Data frame with simulated trial data |
confounders_name |
Character vector of confounder names |
params |
List of ForestSearch parameters |
dgm |
DGM object for computing true HRs |
cox_formula |
Cox formula for estimation |
cox_formula_adj |
Adjusted Cox formula |
analysis_label |
Character label for this analysis |
verbose |
Print details |
Value
data.table with analysis estimates
Run GRF Analysis
Description
Helper function to run standalone GRF analysis using grf.subg.harm.survival().
Usage
run_grf_analysis(
data,
confounders_name,
params,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis_label = "GRF",
verbose = FALSE,
debug = FALSE
)
Arguments
data |
Data frame with simulated trial data |
confounders_name |
Character vector of confounder names |
params |
List of GRF parameters (from grf_merged) |
dgm |
DGM object for computing true HRs |
cox_formula |
Cox formula for estimation |
cox_formula_adj |
Adjusted Cox formula |
analysis_label |
Character label for this analysis |
verbose |
Print details |
debug |
Print detailed debugging information |
Value
data.table with analysis estimates
Run One Simulation Replicate
Description
General replacement for the legacy run_simulation_analysis() that
was coupled to simulate_from_gbsg_dgm() and GBSG-specific column
names. This version calls simulate_from_dgm and accepts
explicit column-name parameters, making it applicable to any DGM built
with generate_aft_dgm_flex.
Usage
run_simulation_analysis(
sim_id,
dgm,
n_sample,
analysis_time = Inf,
cens_adjust = 0,
max_follow = NULL,
muC_adj = NULL,
confounders_base = c("v1", "v2", "v3", "v4", "v5", "v6", "v7"),
n_add_noise = 0L,
outcome_name = "y_sim",
event_name = "event_sim",
treat_name = "treat_sim",
harm_col = "flag_harm",
run_fs = TRUE,
run_fs_grf = TRUE,
run_grf = TRUE,
fs_params = list(),
grf_params = list(),
cox_formula = NULL,
cox_formula_adj = NULL,
n_sims_total = NULL,
seed_base = 8316951L,
verbose = FALSE,
verbose_n = NULL,
debug = FALSE
)
Arguments
sim_id |
Integer. Simulation replicate index (used as seed offset). |
dgm |
An aft_dgm_flex object (e.g., from setup_gbsg_dgm or generate_aft_dgm_flex). |
n_sample |
Integer. Per-replicate sample size. |
analysis_time |
Numeric. Calendar time of analysis on the DGM time scale. Use Inf (the default) for no administrative censoring. |
cens_adjust |
Numeric. Log-scale shift to censoring times passed to simulate_from_dgm. Default 0. |
max_follow |
Deprecated. Use analysis_time instead. |
muC_adj |
Deprecated. Use cens_adjust instead. |
confounders_base |
Character vector of base confounder names. |
n_add_noise |
Integer. Number of independent N(0,1) noise variables to append. Default 0. |
outcome_name |
Name of the observed time column in simulated data. Default "y_sim". |
event_name |
Name of the event indicator column. Default "event_sim". |
treat_name |
Name of the treatment column. Default "treat_sim". |
harm_col |
Name of the true-subgroup indicator column. Default "flag_harm". |
run_fs |
Logical. Run ForestSearch (LASSO). Default TRUE. |
run_fs_grf |
Logical. Run ForestSearch (LASSO + GRF). Default TRUE. |
run_grf |
Logical. Run standalone GRF. Default TRUE. |
fs_params |
Named list of ForestSearch parameter overrides. |
grf_params |
Named list of GRF parameter overrides. |
cox_formula |
Optional Cox formula for unadjusted ITT. |
cox_formula_adj |
Optional adjusted Cox formula. |
n_sims_total |
Integer. Total simulations (for progress messages). |
seed_base |
Integer. Base seed; replicate seed = seed_base + sim_id. |
verbose |
Logical. Print progress messages. Default FALSE. |
verbose_n |
Integer. If set, only print for replicates with sim_id <= verbose_n. |
debug |
Logical. Print detailed debug output. Default FALSE. |
Value
A data.table with one row per analysis method, containing
subgroup size, HR, AHR, CDE, and classification metrics.
See Also
simulate_from_dgm,
generate_aft_dgm_flex, setup_gbsg_dgm
Run Single Consistency Split
Description
Performs one random 50/50 split and evaluates whether both halves meet the HR consistency threshold.
Usage
run_single_consistency_split(df.x, N.x, hr.consistency, cox_init = 0)
Arguments
df.x |
data.table. Subgroup data with columns Y, Event, Treat. |
N.x |
Integer. Number of observations in subgroup. |
hr.consistency |
Numeric. Minimum HR threshold for consistency. |
cox_init |
Numeric. Initial value for Cox model (log HR). |
Value
Numeric. 1 if both splits meet threshold, 0 if not, NA if error.
Evaluate an expression string in a data-frame scope
Description
Parses and evaluates expr in a restricted environment
containing only the columns of df (parent: baseenv()).
This isolates evaluation from the global environment, reducing
scope for unintended side effects.
Usage
safe_eval_expr(df, expr)
Arguments
df |
Data frame providing column names as variables. |
expr |
Character. Expression to evaluate
(e.g., "er <= 0 & nodes > 3"). |
Value
Result of evaluating expr, or NULL on failure.
Note
eval(parse()) is used intentionally here.
evaluate_comparison handles only single comparisons
(e.g., "er <= 0"); this function is needed for the compound
logical expressions produced by the ForestSearch subgroup enumeration
algorithm (e.g., "er <= 0 & nodes > 3"). Evaluation is
sandboxed: the environment contains only the columns of df
with baseenv() as parent, so neither the global environment
nor any package namespace is in scope. No user-supplied strings are
evaluated; only internally-constructed subgroup definition strings
reach this function.
See Also
evaluate_comparison for the single-comparison
operator-dispatch alternative that avoids eval(parse()).
Subset a data frame using an expression string
Description
Thin wrapper around safe_eval_expr that uses the
logical result to subset rows.
Usage
safe_subset(df, expr)
Arguments
df |
Data frame. |
expr |
Character. Subset expression
(e.g., "er <= 0 & nodes > 3"). |
Value
Subset of df, or NULL on failure.
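Examples
A sketch of both helpers evaluating a compound subgroup expression in a data-frame-only scope (GBSG-style columns er and nodes, used here for illustration only):
```r
df <- data.frame(er = c(-1, 0, 2), nodes = c(5, 2, 8))
safe_eval_expr(df, "er <= 0 & nodes > 3")  # logical vector
safe_subset(df, "er <= 0 & nodes > 3")     # rows where TRUE
```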
Save ForestSearch Forest Plot to File
Description
Saves a forest plot to a file (PDF, PNG, etc.) with explicit dimensions.
Usage
save_forestplot(x, filename, width = 12, height = 10, dpi = 300, bg = "white")
Arguments
x |
An fs_forestplot object. |
filename |
Character. Output filename. Extension determines format. |
width |
Numeric. Plot width in inches. Default: 12. |
height |
Numeric. Plot height in inches. Default: 10. |
dpi |
Numeric. Resolution for raster formats. Default: 300. |
bg |
Character. Background color. Default: "white". |
Value
Invisibly returns the filename.
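Examples
Usage sketch; assumes fp is an fs_forestplot object from plot_subgroup_results_forestplot(), and the filenames are illustrative:
```r
save_forestplot(fp, "subgroups_forestplot.pdf", width = 10, height = 8)
save_forestplot(fp, "subgroups_forestplot.png", dpi = 300)
```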
Select best subgroup based on criterion
Description
Identifies the optimal subgroup according to the specified criterion
Usage
select_best_subgroup(values, sg.criterion, dmin.grf, n.max)
Arguments
values |
Data frame. Node metrics from policy trees |
sg.criterion |
Character. "mDiff" for maximum difference, "Nsg" for largest size |
dmin.grf |
Numeric. Minimum difference threshold |
n.max |
Integer. Maximum allowed subgroup size (total sample size) |
Value
Data frame row with best subgroup or NULL if none found
Examples
vals <- data.frame(diff = c(8.5, 6.2, 3.1), Nsg = c(120, 95, 80))
select_best_subgroup(values = vals, sg.criterion = "mDiff",
dmin.grf = 6, n.max = 500)
Generate Cross-Validation Sensitivity Text
Description
Creates formatted text summarizing cross-validation agreement metrics.
Usage
sens_text(fs_kfold, est.scale = "hr")
Arguments
fs_kfold |
K-fold cross-validation results from forestsearch_Kfold. |
est.scale |
Character. "hr" or "1/hr". |
Value
Character string with formatted CV metrics.
Sensitivity Analysis of Hazard Ratios to k_inter
Description
Analyzes how the interaction parameter k_inter affects hazard ratios in different populations (overall, harm subgroup, no-harm subgroup).
Usage
sensitivity_analysis_k_inter(
k_inter_range = c(-5, 5),
n_points = 21,
plot = TRUE,
...
)
Arguments
k_inter_range |
Numeric vector of length 2 specifying the range of k_inter values to analyze. Default is c(-5, 5). |
n_points |
Integer number of points to evaluate within the range. Default is 21. |
plot |
Logical indicating whether to create visualization plots. Default is TRUE. |
... |
Additional arguments passed to |
Details
This function evaluates the hazard ratios at evenly spaced points across the k_inter range. If plot = TRUE, it creates a 4-panel visualization showing:
- Harm subgroup HR vs k_inter
- All HRs (overall, harm, no-harm) vs k_inter
- Ratio of HRs (harm/no-harm) showing effect modification
- Table of key values
Value
A data.frame of class "k_inter_sensitivity" with columns:
- k_inter
Numeric k_inter value
- hr_harm
Numeric hazard ratio in harm subgroup
- hr_no_harm
Numeric hazard ratio in no-harm subgroup
- hr_overall
Numeric overall hazard ratio
- subgroup_size
Integer size of harm subgroup
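Examples
A hedged sketch evaluating a coarse grid without plotting; any additional DGM arguments would be forwarded via `...`:
```r
sens <- sensitivity_analysis_k_inter(k_inter_range = c(0, 3),
                                     n_points = 4, plot = FALSE)
sens$hr_harm / sens$hr_no_harm   # effect-modification ratio per k_inter
```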
Set Up a GBSG-Based AFT Data Generating Mechanism
Description
Creates a GBSG-based data generating mechanism that is fully compatible with
simulate_from_dgm. This is the replacement for
create_gbsg_dgm(): it accepts exactly the same arguments and produces
the same numeric output, but returns an object of class
"aft_dgm_flex" instead of "gbsg_dgm".
Usage
setup_gbsg_dgm(
model = c("alt", "null"),
k_treat = 1,
k_inter = 1,
k_z3 = 1,
z1_quantile = 0.25,
n_super = 5000L,
cens_type = c("weibull", "uniform"),
use_rand_params = FALSE,
seed = 8316951L,
verbose = FALSE
)
Arguments
model |
Character. Either "alt" for alternative hypothesis with heterogeneous treatment effects, or "null" for uniform treatment effect. Default: "alt" |
k_treat |
Numeric. Treatment effect multiplier applied to the treatment coefficient from the fitted AFT model. Values > 1 strengthen the treatment effect. Default: 1 |
k_inter |
Numeric. Interaction effect multiplier for the treatment-subgroup interaction (z1 * z3). Only used when model = "alt". Higher values create more heterogeneity between HR(H) and HR(Hc). Default: 1 |
k_z3 |
Numeric. Effect multiplier for the z3 (menopausal status) coefficient. Default: 1 |
z1_quantile |
Numeric. Quantile threshold for z1 (estrogen receptor). Observations with ER <= quantile are coded as z1 = 1. Default: 0.25 |
n_super |
Integer. Size of super-population for empirical HR estimation. Default: 5000 |
cens_type |
Character. Censoring distribution type: "weibull" or "uniform". Default: "weibull" |
use_rand_params |
Logical. If TRUE, modifies confounder coefficients using estimates from randomized subset (meno == 0). Default: FALSE |
seed |
Integer. Random seed for super-population generation. Default: 8316951 |
verbose |
Logical. Print diagnostic information. Default: FALSE |
Details
Internally the function calls create_gbsg_dgm() and then:
- Adds a df_super field with column names aligned to simulate_from_dgm() conventions (lin_pred_1, lin_pred_0, lin_pred_cens_1, lin_pred_cens_0, flag_harm).
- Adds a model_params$tau field (= model_params$sigma) and a model_params$censoring sub-list.
- Sets class to c("aft_dgm_flex", "gbsg_dgm", "list").
The original df_super_rand field is kept so that
compute_dgm_cde() and print.gbsg_dgm continue to work.
Value
An object of class c("aft_dgm_flex", "gbsg_dgm", "list") with all fields from create_gbsg_dgm() plus:
- df_super
Super-population data frame with simulate_from_dgm()-compatible column names.
- model_params$tau
Copy of model_params$sigma.
- model_params$censoring
Sub-list with type, mu, tau for the censoring model.
See Also
create_gbsg_dgm, simulate_from_dgm,
compute_dgm_cde
Examples
dgm <- setup_gbsg_dgm(model = "alt", k_inter = 2, verbose = FALSE)
dgm <- compute_dgm_cde(dgm)
print(dgm)
sim <- simulate_from_dgm(dgm, n = 400, seed = 1)
Set up parallel processing for subgroup consistency
Description
Sets up parallel processing using the specified approach and number of workers.
Usage
setup_parallel_SGcons(
parallel_args = list(plan = "multisession", workers = 4, show_message = TRUE)
)
Arguments
parallel_args |
List with plan, workers, and show_message elements. |
Value
None. Sets up parallel backend as side effect.
Output Subgroup Consistency Results
Description
Returns the top subgroup(s) and recommended treatment flags.
Usage
sg_consistency_out(
df,
result_new,
sg_focus,
index.Z,
names.Z,
details = FALSE,
plot.sg = FALSE,
by.risk = 12,
confs_labels
)
Arguments
df |
Data.frame. Original analysis data. |
result_new |
Data.table. Sorted subgroup results. |
sg_focus |
Character. Sorting focus criterion. |
index.Z |
Matrix. Subgroup factor indicators. |
names.Z |
Character vector. Factor column names. |
details |
Logical. Print details. |
plot.sg |
Logical. Plot subgroup curves. |
by.risk |
Numeric. Risk interval for plotting. |
confs_labels |
Character vector. Human-readable labels. |
Value
List with results, subgroup definition, labels, flags, and group id.
Enhanced Subgroup Summary Tables (gt output)
Description
Returns formatted summary tables for subgroups using the gt package, with search metadata and customizable decimal precision. Produces two tables: a treatment effect estimates table and an identified subgroups table, each with fully customizable titles and subtitles.
Usage
sg_tables(
fs,
which_df = "est",
est_title = "Treatment Effect Estimates",
est_caption = "Training data estimates",
sg_title = "Identified Subgroups",
sg_subtitle = NULL,
potentialOutcome.name = NULL,
hr_1a = NA,
hr_0a = NA,
ndecimals = 3,
include_search_info = TRUE,
font_size = 12
)
Arguments
fs |
ForestSearch results object. |
which_df |
Character. Which data frame to use ("est" or "testing"). |
est_title |
Character or NULL. Main title for the estimates table
(default: "Treatment Effect Estimates"). Rendered as bold markdown.
Set to NULL to suppress the title and display only est_caption. |
est_caption |
Character. Subtitle for the estimates table (default: "Training data estimates"). |
sg_title |
Character or NULL. Main title for the identified subgroups
table (default: "Identified Subgroups"). Rendered as bold markdown.
Set to NULL to suppress the title and display only sg_subtitle. |
sg_subtitle |
Character or NULL. Subtitle for the identified subgroups
table. When NULL (default), an informative subtitle is auto-generated
from |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
hr_1a |
Character. Adjusted HR for subgroup 1 (optional). |
hr_0a |
Character. Adjusted HR for subgroup 0 (optional). |
ndecimals |
Integer. Number of decimals for formatted numbers (default: 3). |
include_search_info |
Logical. Include search metadata table (default: TRUE). |
font_size |
Numeric. Font size in pixels for table text (default: 12). |
Value
List with gt tables for estimates, subgroups, and optionally search info.
Simulate Survival Data from AFT Data Generating Mechanism
Description
Generates simulated survival data from a previously created AFT data generating mechanism (DGM). Samples from the super population and generates survival times with specified censoring.
Usage
simulate_from_dgm(
dgm,
n = NULL,
rand_ratio = 1,
entry_var = NULL,
max_entry = 24,
analysis_time = 48,
cens_adjust = 0,
draw_treatment = TRUE,
seed = NULL,
strata_rand = NULL,
hrz_crit = NULL,
keep_rand = FALSE,
time_eos = NULL
)
Arguments
dgm |
An object of class "aft_dgm_flex" (e.g., from generate_aft_dgm_flex or setup_gbsg_dgm). |
n |
Integer specifying the sample size. If NULL, the entire super population is used (see Details). |
rand_ratio |
Numeric randomisation ratio (treatment:control). Default 1. |
entry_var |
Character string naming an entry-time variable in the
super population. If NULL, entry times are simulated as uniform on [0, max_entry]. |
max_entry |
Numeric maximum entry time for staggered entry simulation.
Only used when entry_var is NULL. Default 24. |
analysis_time |
Numeric calendar time of analysis. Follow-up is analysis_time minus entry time. Default 48. |
cens_adjust |
Numeric log-scale adjustment to censoring distribution.
Positive values increase censoring times; negative values decrease them.
Default 0. |
draw_treatment |
Logical. If TRUE (default), treatment is re-drawn at randomisation ratio rand_ratio; if FALSE, existing treatment assignments from the super population are retained. |
seed |
Integer random seed. Default NULL. |
strata_rand |
Character string naming a column in the sampled data
for within-stratum balanced treatment allocation. If NULL, unstratified allocation is used. |
hrz_crit |
Numeric log-HR threshold. If supplied, a column
hrz_flag is added indicating subjects whose individual log-HR exceeds the threshold. |
keep_rand |
Logical. If TRUE, a rand_order column recording the randomisation sequence is retained. |
time_eos |
Numeric secondary administrative censoring cutoff
(end-of-study time on the DGM scale). Applied after analysis_time censoring. |
Details
Time-scale consistency
All time parameters (analysis_time, max_entry,
time_eos) must be expressed in the same units as
outcome_var supplied to generate_aft_dgm_flex(). A common
error is building the DGM on days (e.g. rfstime) and then passing
analysis_time in months, which causes follow-up windows far shorter
than the DGM event-time scale and produces universal administrative
censoring (event_sim = 0 for all subjects).
Verify with: exp(dgm$model_params$mu) — the implied median event
time should be plausible given your analysis_time.
n = NULL path
When n = NULL the entire super population is used as-is, with no
staggered entry and no administrative censoring (follow_up = Inf).
Treatment assignments and linear predictors already stored in
dgm$df_super are retained unchanged.
Censoring adjustment
cens_adjust shifts the log-scale location parameter of the
censoring distribution:
- cens_adjust = log(2) doubles expected censoring times.
- cens_adjust = log(0.5) halves expected censoring times.
Value
A data.frame with columns:
- id
Subject identifier.
- treat
Original treatment from super population.
- treat_sim
Simulated treatment assignment.
- flag_harm
Subgroup indicator (1 = all subgroup conditions met).
- z_*
Covariate values.
- lin_pred_1, lin_pred_0
Counterfactual log-time linear predictors.
- y_sim
Observed survival time (min(T, C)).
- event_sim
Event indicator (1 = event, 0 = censored).
- t_true
Latent true survival time (pre-censoring).
- c_time
Effective censoring time (post admin-censoring).
- hrz_flag
(Optional) Individual harm-zone indicator.
- rand_order
(Optional) Randomisation sequence index.
See Also
generate_aft_dgm_flex, check_censoring_dgm
Examples
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200, seed = 42)
dim(sim_data)
head(sim_data[, c("y_sim", "event_sim", "treat_sim")])
Simulate Trial Data from GBSG DGM
Description
Generates simulated clinical trial data from a GBSG-based data generating mechanism.
Usage
simulate_from_gbsg_dgm(
dgm,
n = NULL,
rand_ratio = 1,
sim_id = 1,
max_follow = Inf,
muC_adj = 0,
min_cens = NULL,
max_cens = NULL,
draw_treatment = TRUE
)
Arguments
dgm |
A "gbsg_dgm" object from create_gbsg_dgm(). |
n |
Integer. Sample size. If NULL, uses full super-population. Default: NULL |
rand_ratio |
Numeric. Randomization ratio (treatment:control). Default: 1 (1:1 randomization) |
sim_id |
Integer. Simulation ID used for seed offset. Default: 1 |
max_follow |
Numeric. Administrative censoring time (months). Default: Inf (no administrative censoring) |
muC_adj |
Numeric. Adjustment to censoring distribution location parameter. Positive values increase censoring. Default: 0 |
min_cens |
Numeric. Minimum censoring time for uniform censoring. Required if cens_type = "uniform" |
max_cens |
Numeric. Maximum censoring time for uniform censoring. Required if cens_type = "uniform" |
draw_treatment |
Logical. If TRUE, randomly assigns treatment. If FALSE, samples from existing treatment arms. Default: TRUE |
Value
Data frame with simulated trial data including:
- id
Subject identifier
- y.sim
Observed follow-up time
- event.sim
Event indicator (1 = event, 0 = censored)
- t.sim
True event time (before censoring)
- treat
Treatment indicator
- flag.harm
Harm subgroup indicator
- loghr_po
Individual log hazard ratio (potential outcome)
- v1-v7
Analysis factors
Sort Subgroups by Focus
Description
Sorts a data.table of subgroup results according to the specified focus.
Usage
sort_subgroups(result_new, sg_focus)
Arguments
result_new |
A data.table of subgroup results. |
sg_focus |
Sorting focus: "hr", "hrMaxSG", "maxSG", "hrMinSG", "minSG". |
Value
A sorted data.table.
Sort Subgroups by Focus at the Consistency Stage (consistency metrics not yet available at this point)
Description
Sorts a data.table of subgroup results according to the specified focus.
Usage
sort_subgroups_preview(result_new, sg_focus)
Arguments
result_new |
A data.table of subgroup results. |
sg_focus |
Sorting focus: "hr", "hrMaxSG", "maxSG", "hrMinSG", "minSG". |
Value
A sorted data.table.
Evaluate Subgroup Consistency
Description
Evaluates candidate subgroups using split-sample consistency validation. For each candidate, repeatedly splits the data and checks whether the treatment effect direction is consistent across splits.
Usage
subgroup.consistency(
df,
hr.subgroups,
hr.threshold = 1,
hr.consistency = 1,
pconsistency.threshold = 0.9,
m1.threshold = Inf,
n.splits = 100,
details = FALSE,
by.risk = 12,
plot.sg = FALSE,
maxk = 7,
Lsg,
confs_labels,
sg_focus = "hr",
stop_Kgroups = 10,
stop_threshold = NULL,
showten_subgroups = FALSE,
pconsistency.digits = 2,
seed = 8316951,
checking = FALSE,
use_twostage = FALSE,
twostage_args = list(),
parallel_args = list()
)
Arguments
df |
Data frame containing the analysis dataset. Must include columns for outcome (Y), event indicator (Event), and treatment (Treat). |
hr.subgroups |
Data.table of candidate subgroups from subgroup search, containing columns: HR, n, E, K, d0, d1, m0, m1, grp, and factor indicators. |
hr.threshold |
Numeric. Minimum hazard ratio threshold for candidates. Default: 1.0 |
hr.consistency |
Numeric. Minimum HR required in each split for consistency. Default: 1.0 |
pconsistency.threshold |
Numeric. Minimum proportion of splits that must be consistent. Default: 0.9 |
m1.threshold |
Numeric. Maximum m1 threshold for filtering. Default: Inf |
n.splits |
Integer. Number of splits for consistency evaluation. Default: 100 |
details |
Logical. Print progress details. Default: FALSE |
by.risk |
Numeric. Risk interval for KM plots. Default: 12 |
plot.sg |
Logical. Generate subgroup plots. Default: FALSE |
maxk |
Integer. Maximum number of factors in subgroup. Default: 7 |
Lsg |
List of subgroup parameters. |
confs_labels |
Character vector mapping factor names to labels. |
sg_focus |
Character. Subgroup selection criterion: "hr", "maxSG", or "minSG". Default: "hr" |
stop_Kgroups |
Integer. Maximum number of candidates to evaluate. Default: 10 |
stop_threshold |
Numeric in (0, 1]. Early-stopping threshold on the consistency
proportion. Note: values > 1.0 are not permitted. To disable early
stopping, use the default (NULL). Interaction with parallel execution:
early stopping is checked after each batch completes, so some additional
candidates beyond the first meeting the threshold may be evaluated. Use
a smaller batch size for tighter control over early stopping. |
showten_subgroups |
Logical. If TRUE, prints up to 10 candidate subgroups after sorting by sg_focus, showing their rank, HR, sample size, events, and factor definitions. Useful for reviewing which candidates will be evaluated for consistency. Default: FALSE |
pconsistency.digits |
Integer. Decimal places for consistency proportion. Default: 2 |
seed |
Integer. Random seed for reproducible consistency splits. Default: 8316951. Set to NULL for non-reproducible random splits. The seed is used both for sequential execution (via set.seed()) and parallel execution (via future.seed). |
checking |
Logical. Enable additional validation checks. Default: FALSE |
use_twostage |
Logical. Use two-stage adaptive algorithm. Default: FALSE |
twostage_args |
List. Parameters for two-stage algorithm:
|
parallel_args |
List. Parallel processing configuration:
|
Value
A list containing:
- out_sg
Selected subgroup results
- sg_focus
Selection criterion used
- df_flag
Data frame with treatment recommendations
- sg.harm
Subgroup definition labels
- sg.harm.id
Subgroup membership indicator
- algorithm
"twostage" or "fixed"
- n_candidates_evaluated
Number of candidates actually evaluated
- n_candidates_total
Total candidates available
- n_passed
Number meeting consistency threshold
- early_stop_triggered
Logical indicating if early stop occurred
- early_stop_candidate
Index of candidate triggering early stop
- stop_threshold
Threshold used for early stopping
- seed
Random seed used for reproducibility (NULL if not set)
Subgroup Search for Treatment Effect Heterogeneity (Improved, Parallelized)
Description
Searches for subgroups with treatment effect heterogeneity using combinations of candidate factors. Evaluates subgroups for minimum prevalence, event counts, and hazard ratio threshold. Parallelizes the main search loop.
Usage
subgroup.search(
Y,
Event,
Treat,
ID = NULL,
Z,
n.min = 30,
d0.min = 15,
d1.min = 15,
hr.threshold = 1,
max.minutes = 30,
minp = 0.05,
rmin = 5,
details = FALSE,
maxk = 2,
parallel_workers = parallel::detectCores()
)
Arguments
Y |
Numeric vector of outcome (e.g., time-to-event). |
Event |
Numeric vector of event indicators (0/1). |
Treat |
Numeric vector of treatment group indicators (0/1). |
ID |
Optional vector of subject IDs. |
Z |
Matrix or data frame of candidate subgroup factors (binary indicators). |
n.min |
Integer. Minimum subgroup size. |
d0.min |
Integer. Minimum number of events in control. |
d1.min |
Integer. Minimum number of events in treatment. |
hr.threshold |
Numeric. Hazard ratio threshold for subgroup selection. |
max.minutes |
Numeric. Maximum minutes for search. |
minp |
Numeric. Minimum prevalence rate for each factor. |
rmin |
Integer. Minimum required reduction in sample size when adding a factor. |
details |
Logical. Print details during execution. |
maxk |
Integer. Maximum number of factors in a subgroup. |
parallel_workers |
Integer. Number of parallel workers (default: all available cores). |
Value
List with found subgroups, maximum HR, search time, configuration info, and filtering statistics.
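A minimal sketch of a subgroup.search() call on simulated data, assuming the forestsearch package is loaded; the outcome, event, and factor values below are illustrative, not from the package.

```r
library(forestsearch)

set.seed(123)
n <- 300
# Candidate binary subgroup factors (matrix or data frame of 0/1 indicators)
Z <- data.frame(
  `age>=65` = rbinom(n, 1, 0.4),
  `bm>=1.5` = rbinom(n, 1, 0.3),
  check.names = FALSE
)
Treat <- rbinom(n, 1, 0.5)   # treatment indicator (0/1)
Y     <- rexp(n, rate = 0.1) # time-to-event outcome
Event <- rbinom(n, 1, 0.8)   # event indicator (0/1)

res <- subgroup.search(
  Y = Y, Event = Event, Treat = Treat, Z = Z,
  n.min = 30, d0.min = 15, d1.min = 15,
  hr.threshold = 1.25, maxk = 2,
  parallel_workers = 2
)
```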
Summarize Bootstrap Event Counts
Description
Provides summary statistics for event counts across bootstrap iterations, helping assess the reliability of HR estimates when events are sparse.
Usage
summarize_bootstrap_events(boot_results, threshold = 5)
Arguments
boot_results |
List. Output from forestsearch_bootstrap_dofuture() |
threshold |
Integer. Minimum event threshold for flagging low counts (default: 5) |
Details
This function summarizes event counts in four scenarios:
ORIGINAL subgroup H evaluated on BOOTSTRAP samples
ORIGINAL subgroup Hc evaluated on BOOTSTRAP samples
NEW subgroup H* (found in bootstrap) evaluated on ORIGINAL data
NEW subgroup Hc* (found in bootstrap) evaluated on ORIGINAL data
Low event counts (below threshold) can lead to unstable HR estimates. This summary helps identify potential issues with sparse events.
Value
Invisibly returns a list with summary statistics:
- threshold
The event threshold used
- nb_boots
Total number of bootstrap iterations
- n_successful
Number of iterations that found a new subgroup
- original_H
List with low event counts for original H on bootstrap samples
- original_Hc
List with low event counts for original Hc on bootstrap samples
- new_Hstar
List with low event counts for new H* on original data
- new_Hcstar
List with low event counts for new Hc* on original data
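A hedged usage sketch: `boot_results` is assumed to come from a prior forestsearch_bootstrap_dofuture() run and is not constructed here.

```r
# Summarize event counts across bootstrap iterations; flag counts below 5
ev <- summarize_bootstrap_events(boot_results, threshold = 5)

ev$n_successful  # iterations that found a new subgroup
ev$original_H    # low event counts for original H on bootstrap samples
ev$new_Hstar     # low event counts for new H* on original data
```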
Enhanced Bootstrap Results Summary
Description
Creates comprehensive output including formatted table with subgroup footnote, diagnostic plots, bootstrap quality metrics, and detailed timing analysis.
Usage
summarize_bootstrap_results(
sgharm,
boot_results,
create_plots = FALSE,
est.scale = "hr"
)
Arguments
sgharm |
The selected subgroup object from forestsearch results; several input forms are accepted. |
boot_results |
List. Output from forestsearch_bootstrap_dofuture() |
create_plots |
Logical. Generate diagnostic plots (default: FALSE) |
est.scale |
Character. "hr" or "1/hr" for effect scale |
Details
The table output includes a footnote displaying the identified subgroup
definition, analogous to the tab_estimates table from sg_tables.
This is achieved by extracting the subgroup definition from sgharm and
passing it to format_bootstrap_table.
Value
List with components:
- table
gt table with treatment effects and subgroup footnote
- diagnostics
List of bootstrap quality metrics
- diagnostics_table_gt
gt table of diagnostics
- plots
List of ggplot2 diagnostic plots (if create_plots=TRUE)
- timing
List of timing analysis (if timing data available)
- subgroup_summary
List from summarize_bootstrap_subgroups()
See Also
format_bootstrap_table for table creation
sg_tables for analogous main analysis tables
summarize_bootstrap_subgroups for subgroup stability analysis
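A hedged sketch of the enhanced summary: `fs_fit` is assumed to be a forestsearch result and `boot_results` the output of forestsearch_bootstrap_dofuture(); neither is created here.

```r
out <- summarize_bootstrap_results(
  sgharm = fs_fit,
  boot_results = boot_results,
  create_plots = TRUE,  # also build ggplot2 diagnostic plots
  est.scale = "hr"
)

out$table        # gt table with subgroup definition as a footnote
out$diagnostics  # bootstrap quality metrics
out$plots        # diagnostic plots (since create_plots = TRUE)
```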
Summarize Bootstrap Subgroup Analysis Results
Description
Comprehensive summary of bootstrap subgroup identification results including basic statistics, factor frequencies, consistency distributions, and agreement with the original analysis subgroup.
Usage
summarize_bootstrap_subgroups(results, nb_boots, original_sg = NULL, maxk = 2)
Arguments
results |
Data.table or data.frame. Bootstrap results with subgroup characteristics including columns like Pcons, hr_sg, N_sg, K_sg, and M.1-M.k |
nb_boots |
Integer. Total number of bootstrap iterations |
original_sg |
Character vector. Original subgroup definition from main analysis (e.g., c("{age>=50}", "{nodes>=3}") for a 2-factor subgroup) |
maxk |
Integer. Maximum number of factors allowed in subgroup definition |
Value
List with summary components:
- basic_stats
Data.table of summary statistics
- consistency_dist
Data.table of Pcons distribution by bins
- size_dist
Data.table of subgroup size distribution
- factor_freq
Data.table of factor frequencies by position
- agreement
Data.table of subgroup definition agreement counts
- factor_presence
Data.table of base factor presence counts
- factor_presence_specific
Data.table of specific factor definitions
- original_agreement
Data.table comparing to original analysis subgroup
- n_found
Integer. Number of successful iterations
- pct_found
Numeric. Percentage of successful iterations
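A hedged call sketch, assuming `boot_df` holds per-iteration subgroup characteristics with the documented columns (Pcons, hr_sg, N_sg, K_sg, M.1, M.2, ...); the original subgroup definition shown is the example from the Arguments table.

```r
sg_sum <- summarize_bootstrap_subgroups(
  results = boot_df,
  nb_boots = 1000,
  original_sg = c("{age>=50}", "{nodes>=3}"),  # 2-factor original subgroup
  maxk = 2
)

sg_sum$original_agreement  # agreement with the original analysis subgroup
sg_sum$factor_freq         # factor frequencies by position
sg_sum$pct_found           # percentage of successful iterations
```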
Summarize Factor Presence Across Bootstrap Subgroups
Description
Analyzes how often each individual factor appears in identified subgroups, extracting base factor names from full definitions and identifying common specific definitions.
Usage
summarize_factor_presence_robust(
results,
maxk = 2,
threshold = 10,
as_gt = TRUE
)
Arguments
results |
Data.table or data.frame. Bootstrap results with M.1, M.2, etc. columns |
maxk |
Integer. Maximum number of factors allowed |
threshold |
Numeric. Percentage threshold for including specific definitions (default: 10) |
as_gt |
Logical. Return gt tables (TRUE) or data.frames (FALSE) |
Value
List with base_factors and specific_factors data.frames or gt tables
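A toy illustration with a hand-built results table; the M.1/M.2 column layout follows the documented input format, and the factor definitions are invented for the example.

```r
# Three bootstrap iterations, each identifying up to maxk = 2 factors
toy <- data.frame(
  M.1 = c("{age>=50}", "{age>=50}", "{bm>=1.5}"),
  M.2 = c("{nodes>=3}", NA, "{nodes>=3}"),
  stringsAsFactors = FALSE
)

fp <- summarize_factor_presence_robust(
  toy, maxk = 2, threshold = 10, as_gt = FALSE
)
fp$base_factors      # how often each base factor appears
fp$specific_factors  # specific definitions above the 10% threshold
```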
Summarize Simulation Results
Description
Creates a summary table of operating characteristics across all simulations. Includes both HR and AHR metrics.
Usage
summarize_simulation_results(
results,
analyses = NULL,
digits = 2,
digits_hr = 3
)
Arguments
results |
data.table with simulation results from run_simulation_analysis |
analyses |
Character vector. Analysis methods to include. Default: all |
digits |
Integer. Decimal places for proportions. Default: 2 |
digits_hr |
Integer. Decimal places for hazard ratios. Default: 3 |
Value
Data frame with summary statistics
Summarize Single Analysis Results
Description
Summarize Single Analysis Results
Usage
summarize_single_analysis(result, digits = 2, digits_hr = 3)
Arguments
result |
data.table with results for a single analysis method |
digits |
Integer. Decimal places for proportions |
digits_hr |
Integer. Decimal places for hazard ratios |
Value
Data frame with summary statistics
Summary method for cox_ahr_cde objects
Description
Summary method for cox_ahr_cde objects
Usage
## S3 method for class 'cox_ahr_cde'
summary(object, ...)
Arguments
object |
A cox_ahr_cde object. |
... |
Additional arguments (not used). |
Value
Invisibly returns the input object.
Summary Method for forestsearch Objects
Description
Provides a detailed summary of a ForestSearch analysis including input parameters, variable selection results, consistency evaluation, and the selected subgroup with key metrics.
Usage
## S3 method for class 'forestsearch'
summary(object, ...)
Arguments
object |
A forestsearch object. |
... |
Additional arguments (currently unused). |
Value
Invisibly returns object.
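A minimal usage sketch, assuming `fs_fit` is an object returned by a prior forestsearch() call (not constructed here).

```r
# Prints input parameters, variable selection results, consistency
# evaluation, and the selected subgroup; returns fs_fit invisibly.
summary(fs_fit)
```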
Summary Tables for MRCT Simulation Results
Description
Creates summary tables from MRCT simulation results using the gt package. Summarizes hazard ratio estimates, subgroup identification rates, and classification of identified subgroups. Optionally displays two scenarios (e.g., alternative and null hypotheses) side by side.
Usage
summaryout_mrct(
pop_summary = NULL,
mrct_sims,
mrct_sims_null = NULL,
scenario_labels = c("Alternative", "Null"),
pop_summary_null = NULL,
sg_type = 1,
tab_caption = "Identified subgroups and estimation summaries",
digits = 3,
trim_threshold = 1000,
trim_fraction = 0.01,
table_width = 600,
font_size = 11,
showtable = TRUE
)
Arguments
pop_summary |
List. Population summary from large sample approximation (optional). Default: NULL |
mrct_sims |
data.table. Simulation results from mrct_region_sims(). |
mrct_sims_null |
data.table. Optional second set of simulation results (e.g., null hypothesis). When supplied, the table displays two value columns side by side. Default: NULL (single-scenario table). |
scenario_labels |
Character vector of length 2. Column headers for the two scenarios. Only used when mrct_sims_null is supplied. Default: c("Alternative", "Null"). |
pop_summary_null |
List. Population summary for the null scenario (optional). Default: NULL |
sg_type |
Integer. Type of subgroup summary: 1 = basic summary (found, biomarker, age); 2 = extended summary (all subgroup types). Default: 1 |
tab_caption |
Character. Caption for the output table. Default: "Identified subgroups and estimation summaries" |
digits |
Integer. Number of decimal places for numeric summaries. Default: 3 |
trim_threshold |
Numeric. When the raw mean of a metric exceeds this value in absolute terms, the summary switches to a symmetrically trimmed mean and SD (excluding the lower and upper trim_fraction tails). Default: 1000. |
trim_fraction |
Numeric between 0 and 0.5. Fraction of observations to trim from each tail when trimming is triggered. Default: 0.01 (1 percent from each tail, i.e., the central 98 percent of values). |
table_width |
Numeric. Total table width in pixels. Column widths are allocated proportionally. Increase for HTML/wide displays (e.g., 750), decrease for beamer slides (e.g., 550). Default: 600. |
font_size |
Numeric. Base font size in pixels; the table title is rendered at a larger size derived from this value. Default: 11. |
showtable |
Logical. Print the table. Default: TRUE |
Value
List with components:
- res
List of summary statistics from population. When dual-scenario, contains res_alt and res_null.
- out_table
Formatted gt table object, or data.frame if gt is unavailable.
- data
Processed mrct_sims data.table with derived variables. When dual-scenario, also contains data_null.
- summary_df
Data frame of computed summary statistics.
See Also
mrct_region_sims for generating simulation results
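A hedged dual-scenario sketch: `sims_alt` and `sims_null` are assumed to be data.tables returned by mrct_region_sims() under alternative and null settings, respectively.

```r
tabs <- summaryout_mrct(
  mrct_sims = sims_alt,
  mrct_sims_null = sims_null,                   # side-by-side columns
  scenario_labels = c("Alternative", "Null"),
  sg_type = 1,          # basic subgroup summary
  digits = 3,
  table_width = 750,    # wider layout for HTML display
  showtable = FALSE
)

tabs$out_table   # formatted gt table (or data.frame if gt is unavailable)
tabs$summary_df  # computed summary statistics
```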
Validate input data for GRF analysis
Description
Checks that input data meets requirements for GRF analysis
Usage
validate_grf_data(W, D, n.min)
Arguments
W |
Numeric vector. Treatment indicator |
D |
Numeric vector. Event indicator |
n.min |
Integer. Minimum subgroup size |
Value
Logical. TRUE if data is valid, FALSE with warning otherwise
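A minimal check on toy indicator vectors; the simulated values are illustrative only.

```r
set.seed(42)
W <- rbinom(100, 1, 0.5)  # treatment indicator
D <- rbinom(100, 1, 0.7)  # event indicator

# TRUE if the data meet GRF requirements, FALSE (with a warning) otherwise
ok <- validate_grf_data(W = W, D = D, n.min = 30)
```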
Validate Input Parameters
Description
Validate Input Parameters
Usage
validate_inputs(
data,
model,
cens_type,
outcome_var,
event_var,
treatment_var,
continuous_vars,
factor_vars
)
Validate k_inter Effect on HR Heterogeneity
Description
Test function to verify that k_inter properly modulates the difference between HR(H) and HR(Hc), and that AHR metrics align with Cox-based HRs.
Usage
validate_k_inter_effect(
k_inter_values = c(-2, -1, 0, 1, 2, 3),
verbose = TRUE,
...
)
Arguments
k_inter_values |
Numeric vector of k_inter values to test. Default: c(-2, -1, 0, 1, 2, 3) |
verbose |
Logical. Print results. Default: TRUE |
... |
Additional arguments passed to create_gbsg_dgm |
Value
Data frame with k_inter, hr_H, hr_Hc, AHR_H, AHR_Hc, and ratio columns
Examples
# Test k_inter effect
results <- validate_k_inter_effect()
# k_inter = 0 should give hr_H approximately equal to hr_Hc (ratio near 1)
Validate Dataset for MRCT Simulations
Description
Checks that a dataset contains all required variables for MRCT simulation functions and reports any issues. Required variables include outcome (tte, event), treatment (treat), continuous covariates (age, bm), and factor covariates (male, histology, prior_treat, regA).
Usage
validate_mrct_data(df.case, verbose = TRUE)
Arguments
df.case |
Data frame to validate |
verbose |
Logical. Print detailed validation results. Default: TRUE |
Details
Required Variables
The function checks for the following variables:
- Outcome: tte (time-to-event), event (0/1 indicator)
- Treatment: treat (0/1 indicator)
- Continuous: age, bm (biomarker)
- Factor: male (0/1), histology, prior_treat (0/1), regA (0/1)
The function also validates variable types and value ranges.
Value
Logical. TRUE if all requirements met, FALSE otherwise (invisibly)
See Also
create_dgm_for_mrct for creating DGM from validated data
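A toy data frame carrying every documented required variable; the distributions (and the histology coding) are invented for illustration.

```r
set.seed(1)
n <- 200
df.case <- data.frame(
  tte   = rexp(n, 0.1),        # time-to-event
  event = rbinom(n, 1, 0.8),   # 0/1 event indicator
  treat = rbinom(n, 1, 0.5),   # 0/1 treatment indicator
  age   = rnorm(n, 60, 10),    # continuous covariate
  bm    = rnorm(n, 1, 0.5),    # biomarker
  male  = rbinom(n, 1, 0.5),
  histology   = sample(1:3, n, replace = TRUE),
  prior_treat = rbinom(n, 1, 0.3),
  regA  = rbinom(n, 1, 0.4)    # region indicator
)

# Reports any missing variables, type problems, or out-of-range values
validate_mrct_data(df.case, verbose = TRUE)
```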
Validate Spline Specification
Description
Validate Spline Specification
Usage
validate_spline_spec(spline_spec, df_work)
Wilson Score Confidence Interval
Description
Computes Wilson score confidence interval for a proportion, which has better coverage properties than the normal approximation for small samples and proportions near 0 or 1.
Usage
wilson_ci(x, n, conf.level = 0.95)
Arguments
x |
Integer. Number of successes. |
n |
Integer. Number of trials. |
conf.level |
Numeric. Confidence level (default 0.95). |
Value
Named numeric vector with elements: estimate, lower, upper.
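A short usage sketch, followed by a standalone implementation of the Wilson score formula for comparison (the helper below is an illustration, not the package's internal code).

```r
# Wilson interval for 8 successes in 10 trials
ci <- wilson_ci(x = 8, n = 10, conf.level = 0.95)
ci[["estimate"]]; ci[["lower"]]; ci[["upper"]]

# Direct computation from the Wilson score formula:
# center = (p + z^2/2n) / (1 + z^2/n)
# half-width = z/(1 + z^2/n) * sqrt(p(1-p)/n + z^2/4n^2)
wilson_direct <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  center <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z / (1 + z^2 / n) * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  c(lower = center - half, upper = center + half)
}
wilson_direct(8, 10)
```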