| Title: | Exploratory Subgroup Identification in Clinical Trials with Survival Endpoints |
| Version: | 0.1.0 |
| Description: | Implements statistical methods for exploratory subgroup identification in clinical trials with survival endpoints. Provides tools for identifying patient subgroups with differential treatment effects using machine learning approaches including Generalized Random Forests (GRF), LASSO regularization, and exhaustive combinatorial search algorithms. Features bootstrap bias correction using infinitesimal jackknife methods to address selection bias in post-hoc analyses. Designed for clinical researchers conducting exploratory subgroup analyses in randomized controlled trials, particularly for multi-regional clinical trials (MRCT) requiring regional consistency evaluation. Supports both accelerated failure time (AFT) and Cox proportional hazards models with comprehensive diagnostic and visualization tools. Methods are described in León et al. (2024) <doi:10.1002/sim.10163>. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 4.1.0) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| Imports: | data.table, doFuture, dplyr, foreach, future, future.apply, future.callr, ggplot2, glmnet, grf, gt, patchwork, policytree, progressr, randomForest, rlang, stringr, survival, weightedsurv |
| Suggests: | DiagrammeR, doRNG, htmltools, tidyr, forestploter, cubature, svglite, knitr, rmarkdown, katex |
| URL: | https://github.com/larry-leon/forestsearch, https://larry-leon.github.io/forestsearch/ |
| BugReports: | https://github.com/larry-leon/forestsearch/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-03-19 04:27:29 UTC; larryleon |
| Author: | Larry Leon [aut, cre] |
| Maintainer: | Larry Leon <larry.leon.05@post.harvard.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-23 17:20:14 UTC |
forestsearch: Exploratory Subgroup Identification in Clinical Trials with Survival Endpoints
Description
Implements statistical methods for exploratory subgroup identification in clinical trials with survival endpoints. Provides tools for identifying patient subgroups with differential treatment effects using machine learning approaches including Generalized Random Forests (GRF), LASSO regularization, and exhaustive combinatorial search algorithms. Features bootstrap bias correction using infinitesimal jackknife methods to address selection bias in post-hoc analyses. Designed for clinical researchers conducting exploratory subgroup analyses in randomized controlled trials, particularly for multi-regional clinical trials (MRCT) requiring regional consistency evaluation. Supports both accelerated failure time (AFT) and Cox proportional hazards models with comprehensive diagnostic and visualization tools. Methods are described in León et al. (2024) doi:10.1002/sim.10163.
Author(s)
Maintainer: Larry Leon larry.leon.05@post.harvard.edu
See Also
Useful links:
Report bugs at https://github.com/larry-leon/forestsearch/issues
Cross-Validation Subgroup Match Summary
Description
Summarizes the match between cross-validation subgroups and analysis subgroups.
Usage
CV_sgs(sg1, sg2, confs, sg_analysis)
Arguments
sg1 |
Character vector. Subgroup 1 labels for each fold. |
sg2 |
Character vector. Subgroup 2 labels for each fold. |
confs |
Character vector. Confounder names. |
sg_analysis |
Character vector. Subgroup analysis labels. |
Value
List with indicators for any match, exact match, one match, and covariate-specific matches.
Convert Factor Code to Label
Description
Converts q-indexed codes to human-readable labels using the confs_labels mapping. Supports both full format ("q1.1", "q3.0") and short format ("q1", "q3"). Handles vector input via recursion.
Usage
FS_labels(Qsg, confs_labels)
Arguments
Qsg |
Character. Factor code in full format (e.g., "q1.1", "q3.0") or short format ("q1", "q3"). |
confs_labels |
Character vector. Labels for each factor, indexed by factor number. |
Value
Character. Human-readable label wrapped in braces, e.g.,
"{age <= 50}" or "!{age <= 50}" for complement. Returns the
original code if no match is found.
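The described behavior can be sketched in base R; `code_to_label` below is a hypothetical stand-in for `FS_labels` (the regex and the short-format default are assumptions, not the package's implementation):

```r
# Toy q-code -> label mapping. "q1.1" means "factor 1, level 1" and "q1.0"
# its complement; short codes like "q1" default to the positive level.
code_to_label <- function(code, labels) {
  parts <- regmatches(code, regexec("^q([0-9]+)(\\.([0-9]+))?$", code))[[1]]
  if (length(parts) == 0) return(code)             # no match: return code unchanged
  idx  <- as.integer(parts[2])
  side <- if (parts[4] == "" || parts[4] == "1") "" else "!"
  paste0(side, "{", labels[idx], "}")
}

labels <- c("age <= 50", "er <= 20")
code_to_label("q1.1", labels)  # "{age <= 50}"
code_to_label("q1.0", labels)  # "!{age <= 50}"
code_to_label("zzz", labels)   # unmatched code returned as-is
```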
Subgroup summary table estimates
Description
Returns a summary table of subgroup estimates (HR, RMST, medians, etc.).
Usage
SG_tab_estimates(
df,
SG_flag,
outcome.name = "tte",
event.name = "event",
treat.name = "treat",
strata.name = NULL,
hr_1a = NA,
hr_0a = NA,
potentialOutcome.name = NULL,
sg1_name = NULL,
sg0_name = NULL,
draws = 0,
details = FALSE,
return_medians = TRUE,
est.scale = "hr"
)
Arguments
df |
Data frame. |
SG_flag |
Character. Subgroup flag variable. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
strata.name |
Character. Name of strata variable (optional). |
hr_1a |
Character. Adjusted HR for subgroup 1 (optional). |
hr_0a |
Character. Adjusted HR for subgroup 0 (optional). |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
sg1_name |
Character. Name for subgroup 1. |
sg0_name |
Character. Name for subgroup 0. |
draws |
Integer. Number of draws for resampling (optional). |
details |
Logical. Print details. |
return_medians |
Logical. Use medians or RMST. |
est.scale |
Character. Effect scale ("hr" or "1/hr"). |
Value
Data frame of subgroup summary estimates.
Violin/Boxplot Visualization of HR Estimates
Description
Creates violin plots with embedded boxplots showing the distribution of hazard ratio estimates across simulations for different analysis populations. Supports symmetric trimming to handle extreme values that can distort the display when small subgroups produce very large HR estimates.
Usage
SGplot_estimates(
df,
label_training = "Training",
label_testing = "Testing",
label_itt = "ITT (stratified)",
label_sg = "Testing (subgroup)",
trim_fraction = NULL,
ylim = NULL,
show_summary = NULL,
title = "Distribution of HR Estimates Across Simulations",
subtitle = NULL
)
Arguments
df |
data.frame or data.table. Simulation results from
|
label_training |
Character. Label for training data estimates. Default: "Training" |
label_testing |
Character. Label for testing data estimates. Default: "Testing" |
label_itt |
Character. Label for ITT estimates. Default: "ITT (stratified)" |
label_sg |
Character. Label for subgroup estimates. Default: "Testing (subgroup)" |
trim_fraction |
Numeric or NULL. Fraction of observations to trim from each tail (e.g., 0.01 trims the lowest 1% and highest 1% of values). When non-NULL, trimmed means and SDs are computed for each group, extreme observations are flagged, and the y-axis is clipped to the trimmed data range. Set to NULL (default) for no trimming (backward compatible). |
ylim |
Numeric vector of length 2 or NULL. Explicit y-axis limits
as |
show_summary |
Logical. Annotate each violin with mean (SD) below
the x-axis labels. When trimming is active, displays trimmed
statistics. Default: TRUE when |
title |
Character. Plot title. Default: "Distribution of HR Estimates Across Simulations". |
subtitle |
Character or NULL. Plot subtitle. When trimming is active and subtitle is NULL, an auto-generated note indicating the trim fraction and number of flagged observations is shown. Default: NULL. |
Value
List with components:
- dfPlot_estimates: data.table formatted for plotting, with a trimmed logical column when trimming is active
- plot_estimates: ggplot2 object
- trim_info: List of per-group trimming diagnostics (NULL when no trimming). Each element contains: n_total, n_trimmed, n_flagged, raw_mean, raw_sd, trimmed_mean, trimmed_sd, lower_bound, upper_bound.
See Also
mrct_region_sims for generating simulation results,
summaryout_mrct for tabular summaries with trimming
Disjunctive (dummy) coding for factor columns
Description
Disjunctive (dummy) coding for factor columns
Usage
acm.disjctif(df)
Arguments
df |
Data frame with factor variables. |
Value
Data frame with dummy-coded columns.
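Disjunctive coding can be sketched in base R with `model.matrix`; `disjunctive` below is a hypothetical stand-in for `acm.disjctif` (full one-column-per-level indicator coding is an assumption):

```r
# Dummy-code every factor column into one 0/1 column per level,
# leaving non-factor columns untouched.
disjunctive <- function(df) {
  out <- lapply(names(df), function(v) {
    x <- df[[v]]
    if (!is.factor(x)) return(setNames(data.frame(x), v))
    m <- model.matrix(~ x - 1)                  # one column per level, no reference
    colnames(m) <- paste(v, levels(x), sep = ".")
    as.data.frame(m)
  })
  do.call(cbind, out)
}

df <- data.frame(grade = factor(c("1", "2", "2")), size = c(10, 25, 40))
disjunctive(df)  # columns grade.1, grade.2, size; grade dummies sum to 1 per row
```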
Add ID Column to Data Frame
Description
Ensures that a data frame has a unique ID column. If id.name is not
provided, a column named "id" is added. If id.name is provided
but does not exist in the data frame, it is created with unique integer
values.
Usage
add_id_column(df.analysis, id.name = NULL)
Arguments
df.analysis |
Data frame to which the ID column will be added. |
id.name |
Character. Name of the ID column to add (default is
|
Value
Data frame with the ID column added if necessary.
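A minimal sketch of the documented behavior (`ensure_id` is a hypothetical simplification of `add_id_column`; consecutive integer IDs are an assumption):

```r
# Add an ID column only when one is missing; existing columns are untouched.
ensure_id <- function(df, id.name = NULL) {
  nm <- if (is.null(id.name)) "id" else id.name
  if (!nm %in% names(df)) df[[nm]] <- seq_len(nrow(df))
  df
}

ensure_id(data.frame(y = c(5, 7)))$id          # 1 2
ensure_id(data.frame(id = c(9, 8)))$id         # kept as-is: 9 8
```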
Add Unprocessed Variables from Original Data
Description
Add Unprocessed Variables from Original Data
Usage
add_unprocessed_vars(
df_work,
data,
outcome_var,
event_var,
treatment_var,
continuous_vars,
factor_vars,
verbose
)
Analyze subgroup for summary table (OPTIMIZED)
Description
Analyzes a subgroup and returns formatted results for summary table. Uses optimized cox_summary() and reduces redundant calculations.
Usage
analyze_subgroup(
df_sub,
outcome.name,
event.name,
treat.name,
strata.name,
subgroup_name,
hr_a,
potentialOutcome.name,
return_medians,
N
)
Arguments
df_sub |
Data frame for subgroup. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
strata.name |
Character. Name of strata variable (optional). |
subgroup_name |
Character. Subgroup name. |
hr_a |
Character. Adjusted hazard ratio (optional). |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
return_medians |
Logical. Use medians or RMST. |
N |
Integer. Total sample size. |
Value
Character vector of results.
Apply Spline Constraint to Treatment Effect Coefficients
Description
Apply Spline Constraint to Treatment Effect Coefficients
Usage
apply_spline_constraint(b0, spline_var, knot, zeta, log_hrs, k_treat, verbose)
Assemble Final Results Object
Description
Assemble Final Results Object
Usage
assemble_results(
df_super,
mu,
tau,
gamma,
b0,
cens_model,
subgroup_vars,
subgroup_cuts,
subgroup_definitions,
hr_results,
continuous_vars,
factor_vars,
model,
n_super,
seed,
spline_info = NULL
)
Assign data to subgroups based on selected node
Description
Creates treatment recommendation flags based on identified subgroup
Usage
assign_subgroup_membership(data, best_subgroup, trees, X)
Arguments
data |
Data frame. Original data |
best_subgroup |
Data frame row. Selected subgroup information |
trees |
List. Policy trees |
X |
Matrix. Covariate matrix |
Value
Data frame with added predict.node and treat.recommend columns
Bootstrap Results for ForestSearch with Bias Correction
Description
Runs bootstrap analysis for ForestSearch, fitting Cox models and computing bias-corrected estimates and valid CIs (see vignette for references)
Usage
bootstrap_results(
fs.est,
df_boot_analysis,
cox.formula.boot,
nb_boots,
show_three,
H_obs,
Hc_obs,
seed = 8316951L
)
Arguments
fs.est |
List. ForestSearch results object from
|
df_boot_analysis |
Data frame. Bootstrap analysis data with same structure
as |
cox.formula.boot |
Formula. Cox model formula for bootstrap, typically
created by |
nb_boots |
Integer. Number of bootstrap samples to generate (e.g., 500-1000). More iterations provide better bias correction but increase computation time. |
show_three |
Logical. If |
H_obs |
Numeric. Observed log hazard ratio for subgroup H (harm/questionable group,
|
Hc_obs |
Numeric. Observed log hazard ratio for subgroup H^c (complement/recommend,
|
seed |
Integer. Random seed for reproducibility. Default 8316951L.
Must match the seed used in |
Value
Data.table with one row per bootstrap iteration and columns:
- boot_id: Integer. Bootstrap iteration number (1 to nb_boots)
- H_biasadj_1: Bias-corrected estimate for H using method 1: H_obs - (Hstar_star - Hstar_obs)
- H_biasadj_2: Bias-corrected estimate for H using method 2: 2*H_obs - (H_star + Hstar_star - Hstar_obs)
- Hc_biasadj_1: Bias-corrected estimate for H^c using method 1
- Hc_biasadj_2: Bias-corrected estimate for H^c using method 2
- max_sg_est: Numeric. Maximum subgroup hazard ratio found
- L: Integer. Number of candidate factors evaluated
- max_count: Integer. Maximum number of factor combinations
- events_H_0: Integer. Number of events in control arm of original subgroup H on bootstrap sample
- events_H_1: Integer. Number of events in treatment arm of original subgroup H on bootstrap sample
- events_Hc_0: Integer. Number of events in control arm of original subgroup H^c on bootstrap sample
- events_Hc_1: Integer. Number of events in treatment arm of original subgroup H^c on bootstrap sample
- events_Hstar_0: Integer. Number of events in control arm of new subgroup H* on original data
- events_Hstar_1: Integer. Number of events in treatment arm of new subgroup H* on original data
- events_Hcstar_0: Integer. Number of events in control arm of new subgroup H^c* on original data
- events_Hcstar_1: Integer. Number of events in treatment arm of new subgroup H^c* on original data
- tmins_search: Numeric. Minutes spent on subgroup search in this iteration
- tmins_iteration: Numeric. Total minutes for this bootstrap iteration
- Pcons: Numeric. Consistency p-value for top subgroup
- hr_sg: Numeric. Hazard ratio for top subgroup
- N_sg: Integer. Sample size of top subgroup
- E_sg: Integer. Number of events in top subgroup
- K_sg: Integer. Number of factors defining top subgroup
- g_sg: Numeric. Subgroup group ID
- m_sg: Numeric. Subgroup index
- M.1 through M.7: Character. Labels of the first through seventh defining factors
Rows where no valid subgroup was found will have NA for bias corrections.
The returned object has a "timing" attribute with summary statistics.
Bias Correction Methods
Two bias correction approaches are implemented:
- Method 1 (Simple Optimism): H_adj1 = H_obs - (Hstar_star - Hstar_obs), where Hstar_star is the new subgroup HR on bootstrap data and Hstar_obs is the new subgroup HR on original data.
- Method 2 (Double Bootstrap): H_adj2 = 2 * H_obs - (H_star + Hstar_star - Hstar_obs), where H_star is the original subgroup HR on bootstrap data.
Notation:
- H_obs: Original subgroup HR on original data
- H_star: Original subgroup HR on bootstrap data
- Hstar_obs: New subgroup (found in bootstrap) HR on original data
- Hstar_star: New subgroup (found in bootstrap) HR on bootstrap data
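Given the four quantities in this notation, both corrections are simple arithmetic on the log-HR scale; a minimal sketch with fabricated illustrative values (not results from any analysis):

```r
# Illustrative log-HR values, fabricated for demonstration only.
H_obs      <- log(1.8)  # original subgroup, original data
H_star     <- log(2.1)  # original subgroup, bootstrap data
Hstar_obs  <- log(1.6)  # bootstrap-found subgroup, original data
Hstar_star <- log(2.0)  # bootstrap-found subgroup, bootstrap data

# Method 1 (simple optimism): subtract the optimism of the re-run search
H_adj1 <- H_obs - (Hstar_star - Hstar_obs)
# Method 2 (double bootstrap)
H_adj2 <- 2 * H_obs - (H_star + Hstar_star - Hstar_obs)

exp(c(adj1 = H_adj1, adj2 = H_adj2))  # back-transformed to the HR scale
```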
Computational Details
- Uses the doFuture backend for parallel execution (configured externally)
- Sets reproducible seeds: 8316951 + boot * 100 for each iteration
- Each bootstrap iteration runs the full ForestSearch pipeline, including variable selection, subgroup search, and consistency evaluation
- Sequential execution within each bootstrap prevents nested parallelization
- Failed bootstrap iterations generate warnings but do not stop execution
- Confounders are removed from bootstrap data to force fresh variable selection
Bootstrap Configuration
Each bootstrap iteration modifies ForestSearch arguments to:
- Suppress output: details, showten_subgroups, plot.sg, and plot.grf all set to FALSE
- Force re-selection: grf_res and grf_cuts set to NULL
- Prevent nested parallelism: parallel_args$plan = "sequential", workers = 1
Performance Considerations
- Typical runtime: 1-5 seconds per bootstrap iteration
- For 1000 bootstraps with 6 workers: ~3-10 minutes total
- Memory usage scales with dataset size and number of workers
- Consider reducing nb_boots for initial testing (e.g., 100)
Error Handling
The function gracefully handles three failure modes:
- Bootstrap sample creation fails: returns a row with all NA
- ForestSearch fails to run: warns and returns a row with all NA
- ForestSearch runs but finds no subgroup: returns a row with all NA
All three cases ensure the foreach loop can still combine results via rbind.
Note
This function is designed to be called within a foreach loop with the %dofuture% operator. It requires:
- All functions in get_bootstrap_exports to be available in the parallel workers
- Packages listed in BOOTSTRAP_REQUIRED_PACKAGES to be installed
- A proper parallel backend set up via setup_parallel_SGcons
See Also
forestsearch_bootstrap_dofuture for the wrapper function that
sets up parallelization and calls this function
build_cox_formula for creating the Cox formula
fit_cox_models for initial Cox model fitting
get_Cox_sg for Cox model fitting on subgroups
get_dfRes for processing bootstrap results into confidence intervals
bootstrap_ystar for generating the Ystar matrix
Bootstrap Ystar Matrix
Description
Generates a bootstrap matrix for Ystar using parallel processing.
Usage
bootstrap_ystar(df, nb_boots, seed = 8316951L)
Arguments
df |
Data frame. |
nb_boots |
Integer. Number of bootstrap samples. |
seed |
Integer. Random seed for reproducibility. Default 8316951L.
Must match the seed used in |
Value
Matrix of bootstrap samples (nb_boots x nrow(df)).
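The documented shape can be illustrated with a serial base-R sketch; `boot_index_matrix` is hypothetical (the package version resamples in parallel, and storing row indices rather than resampled values is an assumption here):

```r
# Seeded bootstrap-index matrix: one row of resampled row indices per bootstrap,
# giving the documented nb_boots x nrow(df) shape.
boot_index_matrix <- function(n_obs, nb_boots, seed = 8316951L) {
  set.seed(seed)
  t(replicate(nb_boots, sample.int(n_obs, n_obs, replace = TRUE)))
}

idx <- boot_index_matrix(n_obs = 10, nb_boots = 4)
dim(idx)  # 4 10
```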
Build Classification Rate Table from Simulation Results
Description
Constructs a publication-quality gt table summarizing subgroup
identification and classification rates across one or more data generation
scenarios and analysis methods. The layout mirrors Table 4 of
Leon et al. (2024) with metrics grouped by model scenario (null / alt)
and columns for each analysis method.
Usage
build_classification_table(
scenario_results,
analyses = NULL,
digits = 2,
title = "Subgroup Identification and Classification Rates",
n_sims = NULL,
bold_threshold = 0.05,
font_size = 12
)
Arguments
scenario_results |
Named list. Each element is itself a list with:
|
analyses |
Character vector of analysis labels to include
(e.g., |
digits |
Integer. Decimal places for proportions. Default: 2. |
title |
Character. Table title. Default:
|
n_sims |
Integer. Number of simulations (for subtitle). Default:
|
bold_threshold |
Numeric. Type I error threshold above which the
|
font_size |
Numeric. Font size in pixels for table text. Default: 12. Increase to 14 or 16 for larger display. |
Details
For each scenario the function computes:
- any(H): Proportion of simulations identifying any subgroup.
- sens(H): Mean sensitivity (only under alternative).
- sens(Hc): Mean specificity.
- ppv(H): Mean positive predictive value (only under alternative).
- ppv(Hc): Mean negative predictive value.
- avg|H|: Mean size of identified subgroup (when found).
Under the null hypothesis the rows are reduced to any(H),
sens(Hc), ppv(Hc), and avg|H|.
Value
A gt table object.
See Also
format_oc_results,
summarize_simulation_results
Build Cox Model Formula
Description
Constructs a Cox model formula from variable names.
Usage
build_cox_formula(outcome.name, event.name, treat.name)
Arguments
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
Value
An R formula object for Cox regression.
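The construction can be sketched in one line with `as.formula` and survival's `Surv`; `make_cox_formula` is a hypothetical stand-in for `build_cox_formula` (the package version may add strata or other terms):

```r
library(survival)

# Paste the documented variable names into a Surv() formula.
make_cox_formula <- function(outcome.name, event.name, treat.name) {
  as.formula(sprintf("Surv(%s, %s) ~ %s", outcome.name, event.name, treat.name))
}

f <- make_cox_formula("tte", "event", "treat")
f  # Surv(tte, event) ~ treat
```

The resulting formula can be passed directly to `coxph()`.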
Build Estimation Properties Table from Simulation Results
Description
Constructs a publication-quality gt table summarizing estimation
properties for hazard ratios in the identified subgroup and its complement.
The layout mirrors Table 5 of Leon et al. (2024), showing average estimate,
empirical SD, min, max, and relative bias for each estimator.
Usage
build_estimation_table(
results,
dgm,
analysis_method = "FSlg",
n_boots = NULL,
digits = 2,
title = "Estimation Properties",
subtitle = NULL,
font_size = 12,
cde_H = NULL,
cde_Hc = NULL
)
Arguments
results |
|
dgm |
DGM object. Used for true parameter values ( |
analysis_method |
Character. Which analysis method to tabulate
(e.g., |
n_boots |
Integer or |
digits |
Integer. Decimal places. Default: 2. |
title |
Character. Table title. |
subtitle |
Character or |
font_size |
Numeric. Font size in pixels for table text. Default: 12. Increase to 14 or 16 for larger display. |
cde_H |
Numeric or |
cde_Hc |
Numeric or |
Details
Uses the paper's notation conventions:
- theta-dagger: Marginal (causal) HR truth
- theta-ddagger: Controlled direct effect (CDE) truth
- theta-hat(H-hat): Plugin Cox estimate in identified subgroup
- theta-hat*(H-hat): Bootstrap bias-corrected estimate
Includes both Cox-based HR and AHR (Average Hazard Ratio from loghr_po) estimators when AHR columns are present in the results.
For each subgroup (H and Hc) the function reports:
- Avg: Mean of the estimates across estimable simulations.
- SD: Empirical standard deviation.
- Min / Max: Range.
- b-dagger: Relative bias (percent) vs marginal truth, 100 * (Avg - theta_dagger) / theta_dagger.
- b-ddagger (conditional): Relative bias (percent) vs CDE truth, shown when CDE values are available.
When bootstrap-corrected columns (hr.H.bc, hr.Hc.bc) are
present in results, an additional bias-corrected row
(theta-hat*(H-hat)) is added per subgroup.
When AHR columns (ahr.H.hat, ahr.Hc.hat) are present, AHR
estimation rows are appended using the DGM's true AHR values for relative
bias calculation.
When CDE columns (cde.H.hat, cde.Hc.hat) are present and
CDE truth values are available, CDE estimation rows
(theta-ddagger(H-hat)) are appended. The b-dagger column for CDE rows
reports bias relative to the CDE truth rather than the marginal HR.
Value
A gt table object, or NULL if no estimable
realizations exist.
See Also
build_classification_table,
format_oc_results, get_dgm_hr
Calculate Covariance for Bootstrap Estimates
Description
Calculates the covariance between a vector and bootstrap estimates.
Usage
calc_cov(x, Est)
Arguments
x |
Numeric vector. |
Est |
Numeric vector of bootstrap estimates. |
Value
Numeric value of covariance.
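Covariances of this form are the building block of infinitesimal-jackknife variance estimates; a base-R sketch (the population-style 1/n scaling is an assumption, whereas `cov()` would use 1/(n-1)):

```r
# Covariance between a vector x and bootstrap estimates Est,
# centered at their means and scaled by 1/n.
calc_cov_sketch <- function(x, Est) {
  mean((x - mean(x)) * (Est - mean(Est)))
}

calc_cov_sketch(1:4, 2 * (1:4))  # 2.5
```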
Calculate counts for subgroup summary
Description
Calculates sample size, treated count, and event count for a subgroup.
Usage
calculate_counts(Y, E, Treat, N)
Arguments
Y |
Numeric vector of outcome. |
E |
Numeric vector of event indicators. |
Treat |
Numeric vector of treatment indicators. |
N |
Integer. Total sample size. |
Value
List with formatted counts.
Calculate Event Counts by Treatment Arm
Description
Calculate Event Counts by Treatment Arm
Usage
calculate_event_counts(dd, tt, id.x)
Calculate Hazard Ratios from Potential Outcomes
Description
Calculate Hazard Ratios from Potential Outcomes
Usage
calculate_hazard_ratios(df_super, n_super, mu, tau, model, verbose)
Arguments
df_super |
Data frame with super population |
n_super |
Size of super population |
mu |
Intercept parameter |
tau |
Scale parameter |
model |
Model type ("alt" or "null") |
verbose |
Logical for verbose output |
Value
List of hazard ratios
Calculate Linear Predictors for Potential Outcomes
Description
Calculate Linear Predictors for Potential Outcomes
Usage
calculate_linear_predictors(
df_super,
covariate_cols,
gamma,
b0,
spline_info = NULL
)
Calculate Maximum Combinations
Description
Calculate Maximum Combinations
Usage
calculate_max_combinations(L, maxk)
Calculate potential outcome hazard ratio
Description
Calculates the average hazard ratio from a potential outcome variable.
Usage
calculate_potential_hr(df, potentialOutcome.name)
Arguments
df |
Data frame. |
potentialOutcome.name |
Character. Name of potential outcome variable. |
Value
Numeric value of average hazard ratio.
Calculate Skewness
Description
Helper function to calculate sample skewness.
Usage
calculate_skewness(x)
Arguments
x |
Numeric vector |
Value
Numeric skewness value
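The plain moment-based estimator can be sketched as follows; whether the package applies a small-sample bias correction is not stated here, and this sketch uses the n-1 denominator inside `sd()`:

```r
# Sample skewness: mean cubed z-score of the non-missing values.
skewness_sketch <- function(x) {
  x <- x[!is.na(x)]
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skewness_sketch(c(1, 1, 1, 10))  # 0.75, right-skewed
skewness_sketch(c(-1, 0, 1))     # 0, symmetric
```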
Calibrate Censoring Adjustment to Match DGM Reference Distribution
Description
Uses root-finding to select a value of cens_adjust for
simulate_from_dgm such that a chosen censoring summary
statistic in the simulated data matches the corresponding statistic from
the DGM reference data (dgm$df_super).
Usage
calibrate_cens_adjust(
dgm,
target = c("rate", "km_median"),
n = 1000,
rand_ratio = 1,
analysis_time = 48,
max_entry = 24,
seed = 42,
interval = c(-3, 3),
tol = 1e-04,
n_eval = 2000,
verbose = TRUE,
...
)
Arguments
dgm |
An |
target |
Character. Calibration target: |
n |
Integer. Sample size passed to |
rand_ratio |
Numeric. Randomisation ratio passed to
|
analysis_time |
Numeric. Calendar analysis time passed to
|
max_entry |
Numeric. Maximum staggered entry time passed to
|
seed |
Integer. Base random seed. Each evaluation of the objective
function uses this seed for reproducibility. Default |
interval |
Numeric vector of length 2. Search interval for
|
tol |
Numeric. Root-finding tolerance. Default |
n_eval |
Integer. Sample size used inside the objective function
during root-finding. Smaller values are faster but noisier; increase
for precision. Default |
verbose |
Logical. Print search progress and final result.
Default |
... |
Additional arguments passed to |
Details
Two calibration targets are supported:
"rate"Overall censoring rate (proportion censored). Finds
cens_adjustsuch thatmean(event_sim == 0)in simulated data equalsmean(event == 0)indgm$df_super."km_median"KM-based median censoring time, estimated by reversing the event indicator so censored observations become the "event" of interest. Finds
cens_adjustsuch that the simulated KM median matches the reference KM median.
How the objective function works
At each candidate cens_adjust value, the objective function:
1. Calls simulate_from_dgm() with n = n_eval and the candidate cens_adjust.
2. Calls check_censoring_dgm() with verbose = FALSE to extract the target metric.
3. Returns sim_metric - ref_metric.
uniroot finds the zero crossing, i.e. the cens_adjust at
which simulated and reference metrics are equal.
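The same root-finding scheme, reduced to a toy monotone "simulator" standing in for simulate_from_dgm / check_censoring_dgm (the function names, the logistic form, and the noise level here are all illustrative assumptions):

```r
# Calibrate an adjustment knob so a simulated metric matches a reference value.
sim_metric <- function(adjust, seed = 42) {
  set.seed(seed)                            # fixed seed: same noise each evaluation
  plogis(-adjust) + rnorm(1, sd = 1e-3)     # censoring-rate-like, decreasing in adjust
}

ref  <- 0.30
root <- uniroot(function(a) sim_metric(a) - ref, interval = c(-3, 3), tol = 1e-4)

root$root                    # calibrated adjustment
sim_metric(root$root) - ref  # residual near zero
```

As in the package function, the fixed seed makes the objective deterministic, which is what lets `uniroot` converge despite the stochastic simulator.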
Monotonicity
The objective is monotone in cens_adjust for both targets:
- Larger cens_adjust → longer censoring times → lower censoring rate and higher KM median.
- Smaller cens_adjust → shorter censoring times → higher censoring rate and lower KM median.
If uniroot fails (the target lies outside the search interval),
the boundary values are printed and a wider interval should be
tried.
Stochastic noise
Because the objective function involves simulation, there is Monte Carlo
noise. Setting a fixed seed and a sufficiently large n_eval
(>= 2000) reduces noise enough for reliable root-finding. The
tol argument controls the root-finding tolerance on the
cens_adjust scale (not the metric scale).
Value
A named list with elements:
- cens_adjust: Calibrated cens_adjust value.
- target: Calibration target used.
- ref_value: Reference metric value from dgm$df_super.
- sim_value: Achieved metric value in simulated data at the calibrated cens_adjust.
- residual: Absolute difference between sim_value and ref_value.
- iterations: Number of uniroot iterations.
- diagnostic: Output of check_censoring_dgm at the calibrated value (invisibly).
See Also
simulate_from_dgm, check_censoring_dgm,
generate_aft_dgm_flex
Examples
library(survival)
# Build DGM on months scale
gbsg$time_months <- gbsg$rfstime / 30.4375
dgm <- generate_aft_dgm_flex(
data = gbsg,
continuous_vars = c("age", "size", "nodes", "pgr", "er"),
factor_vars = c("meno", "grade"),
outcome_var = "time_months",
event_var = "status",
treatment_var = "hormon",
subgroup_vars = c("er", "meno"),
subgroup_cuts = list(er = 20, meno = 0)
)
# Calibrate so simulated censoring rate matches reference
cal_rate <- calibrate_cens_adjust(
dgm = dgm,
target = "rate",
n = 1000,
analysis_time = 84,
max_entry = 24
)
cat("Calibrated cens_adjust (rate):", cal_rate$cens_adjust, "\n")
# Calibrate to KM median censoring time instead
cal_km <- calibrate_cens_adjust(
dgm = dgm,
target = "km_median",
n = 1000,
analysis_time = 84,
max_entry = 24
)
cat("Calibrated cens_adjust (km_median):", cal_km$cens_adjust, "\n")
# Use calibrated value in simulation
sim <- simulate_from_dgm(
dgm = dgm,
n = 1000,
analysis_time = 84,
max_entry = 24,
cens_adjust = cal_rate$cens_adjust,
seed = 123
)
mean(sim$event_sim) # event rate
mean(sim$event_sim == 0) # censoring rate — should match ref
Calibrate k_inter for Target Subgroup Hazard Ratio
Description
Finds the interaction effect multiplier (k_inter) that achieves a target hazard ratio in the harm subgroup.
Usage
calibrate_k_inter(
target_hr_harm,
model = "alt",
k_treat = 1,
cens_type = "weibull",
k_inter_range = c(-100, 100),
tol = 1e-06,
use_ahr = FALSE,
verbose = FALSE,
...
)
Arguments
target_hr_harm |
Numeric. Target hazard ratio for the harm subgroup |
model |
Character. Model type ("alt" only). Default: "alt" |
k_treat |
Numeric. Treatment effect multiplier. Default: 1 |
cens_type |
Character. Censoring type. Default: "weibull" |
k_inter_range |
Numeric vector of length 2. Search range for k_inter. Default: c(-100, 100) |
tol |
Numeric. Tolerance for root finding. Default: 1e-6 |
use_ahr |
Logical. If TRUE, calibrate to AHR instead of Cox-based HR. Default: FALSE |
verbose |
Logical. Print diagnostic information. Default: FALSE |
... |
Additional arguments passed to |
Details
This function uses uniroot to find the k_inter value such that
the empirical HR (or AHR) in the harm subgroup equals target_hr_harm.
Value
Numeric value of k_inter that achieves the target HR
Examples
# Find k_inter for HR = 1.5 in harm subgroup
k <- calibrate_k_inter(target_hr_harm = 1.5, verbose = TRUE)
# Verify
dgm <- setup_gbsg_dgm(model = "alt", k_inter = k, verbose = FALSE)
print(dgm)
# Calibrate to AHR instead
k_ahr <- calibrate_k_inter(target_hr_harm = 1.5, use_ahr = TRUE, verbose = TRUE)
dgm_ahr <- setup_gbsg_dgm(model = "alt", k_inter = k_ahr, verbose = FALSE)
print(dgm_ahr)
Diagnose Censoring Consistency Between DGM Source Data and Simulated Data
Description
Compares the censoring distribution observed in the data used to build the
DGM against the censoring generated by simulate_from_dgm.
Reports censoring rates, time quantiles, KM-based median censoring times,
and flags substantial discrepancies.
Usage
check_censoring_dgm(
sim_data,
dgm,
treat_var = "treat_sim",
rate_tol = 0.1,
median_tol = 0.25,
verbose = TRUE
)
Arguments
sim_data |
A |
dgm |
An |
treat_var |
Character. Name of the treatment column in
|
rate_tol |
Numeric. Absolute tolerance (proportion scale) for
flagging a censoring-rate discrepancy. Default |
median_tol |
Numeric. Relative tolerance for flagging a KM median
censoring-time discrepancy. Default |
verbose |
Logical. If |
Details
The reference censoring distribution is derived from dgm$df_super,
sampled with replacement from the data passed to
generate_aft_dgm_flex(). Columns y (observed time) and
event (event indicator) in df_super reflect the original
observed censoring process on the DGM time scale.
The KM median censoring time is estimated by reversing the event indicator
(1 - event), treating events as censored and censored observations
as the event of interest. This gives a non-parametric estimate of the
censoring time distribution unconfounded by event occurrence.
Common causes of discrepancy: (1) time-scale mismatch (DGM built on days,
analysis_time in months); check exp(dgm$model_params$mu)
against your analysis_time. (2) Large cens_adjust shifting
censoring substantially from the fitted model. (3) Short
analysis_time or time_eos making administrative censoring
dominate the censoring process.
Value
Invisibly returns a named list. Elements are: rates (data
frame of censoring rates overall and by arm); quantiles (data
frame of censoring-time quantiles among censored subjects);
km_medians (data frame of KM-based median censoring times); and
flags (character vector of triggered warnings, empty if none).
See Also
simulate_from_dgm, generate_aft_dgm_flex
Examples
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200)
check_censoring_dgm(sim_data, dgm = dgm)
Confidence Interval for Estimate
Description
Calculates confidence interval for an estimate, optionally on log(HR) scale.
Usage
ci_est(x, sd, alpha = 0.025, scale = "hr", est.loghr = TRUE)
Arguments
x |
Numeric estimate. |
sd |
Numeric standard deviation. |
alpha |
Numeric significance level (default: 0.025). |
scale |
Character. "hr" or "1/hr". |
est.loghr |
Logical. Is estimate on log(HR) scale? |
Value
List with length, lower, upper, sd, and estimate.
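For the common case of a log(HR) estimate back-transformed to the HR scale, the calculation can be sketched as follows (`ci_loghr` is a hypothetical simplification of `ci_est`; the normal approximation on the log scale is the standard one for Cox estimates):

```r
# Normal-approximation CI on the log(HR) scale, exponentiated back to HRs.
ci_loghr <- function(loghr, sd, alpha = 0.025) {
  z <- qnorm(1 - alpha)
  exp(c(lower = loghr - z * sd, estimate = loghr, upper = loghr + z * sd))
}

ci_loghr(log(0.7), sd = 0.15)  # 95% CI around HR = 0.7
```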
Compare Detection Curves Across Sample Sizes
Description
Generates and compares detection probability curves for multiple subgroup sample sizes.
Usage
compare_detection_curves(
n_sg_values,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
theta_range = c(0.5, 3),
n_points = 40L,
verbose = TRUE
)
Arguments
n_sg_values |
Integer vector. Subgroup sample sizes to compare. |
prop_cens |
Numeric. Proportion censored. Default: 0.3 |
hr_threshold |
Numeric. HR threshold. Default: 1.25 |
hr_consistency |
Numeric. HR consistency threshold. Default: 1.0 |
theta_range |
Numeric vector of length 2. Range of HR values. Default: c(0.5, 3.0) |
n_points |
Integer. Number of points per curve. Default: 40 |
verbose |
Logical. Print progress. Default: TRUE |
Value
A data.frame with all curves combined, including n_sg as a factor.
Compare Multiple Survival Regression Models
Description
Performs comprehensive comparison of multiple survreg models including convergence checking, information criteria comparison, and model selection.
Usage
compare_multiple_survreg(
...,
model_names = NULL,
verbose = TRUE,
criteria = c("AIC", "BIC")
)
Arguments
... |
survreg model objects to compare |
model_names |
Optional character vector of model names |
verbose |
Logical, whether to print detailed output (default: TRUE) |
criteria |
Character vector of criteria to use ("AIC", "BIC", or both) |
Value
A list of class "multi_survreg_comparison" containing:
- models
Named list of input models
- convergence
Convergence status for each model
- comparison
Model comparison statistics
- rankings
Model rankings by different criteria
- best_model
Name of the best model
- recommendation
Text recommendation
Compute AHR from loghr_po
Description
Computes Average Hazard Ratio from individual log hazard ratios.
Usage
compute_ahr(df, subset_indicator = NULL)
Arguments
df |
Data frame with loghr_po column |
subset_indicator |
Optional logical/integer vector for subsetting |
Value
Numeric AHR value
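The AHR is presumably the geometric mean of the individual hazard ratios, i.e. exp(mean(loghr_po)); a minimal sketch on toy data:

```r
df  <- data.frame(loghr_po = log(c(0.5, 0.8, 1.25)))
ahr <- exp(mean(df$loghr_po))   # geometric mean of individual HRs
```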
Compute CDE from theta_0 and theta_1
Description
Computes Controlled Direct Effect as the ratio of average hazard
contributions on the natural scale:
CDE(S) = mean(exp(theta_1[S])) / mean(exp(theta_0[S])).
Usage
compute_cde(df, subset_indicator = NULL)
Arguments
df |
Data frame with theta_0 and theta_1 columns. |
subset_indicator |
Optional logical/integer vector for subsetting.
If provided, only rows where the indicator is TRUE (or nonzero) are used. |
Value
Numeric CDE value, or NA_real_ if columns are missing.
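A direct transcription of the CDE formula above, on toy data:

```r
df  <- data.frame(theta_0 = log(c(1, 2)), theta_1 = log(c(2, 2)))
# Ratio of average hazard contributions on the natural scale
cde <- mean(exp(df$theta_1)) / mean(exp(df$theta_0))
```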
Compute Probability of Detecting True Subgroup
Description
Calculates the probability that a true subgroup with given hazard ratio will be detected using the ForestSearch consistency-based criteria.
Usage
compute_detection_probability(
theta,
n_sg,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
method = c("cubature", "monte_carlo"),
n_mc = 100000L,
tol = 1e-04,
verbose = FALSE
)
Arguments
theta |
Numeric. True hazard ratio in the subgroup. Can be a vector for computing detection probability across multiple HR values. |
n_sg |
Integer. Subgroup sample size. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
hr_threshold |
Numeric. HR threshold for detection (e.g., 1.25). This is the threshold that the average HR across splits must exceed. |
hr_consistency |
Numeric. HR consistency threshold (e.g., 1.0). This is the threshold each individual split must exceed. Default: 1.0 |
method |
Character. Integration method: "cubature" (recommended for accuracy) or "monte_carlo" (faster for exploration). Default: "cubature" |
n_mc |
Integer. Number of Monte Carlo samples if method = "monte_carlo". Default: 100000 |
tol |
Numeric. Relative tolerance for cubature integration. Default: 1e-4 |
verbose |
Logical. Print progress for vector inputs. Default: FALSE |
Details
This function computes P(detect | theta) using the asymptotic normal approximation for the log hazard ratio estimator. The detection criterion is based on ForestSearch's split-sample consistency evaluation:
The subgroup HR estimate must exceed hr_threshold on average
Each split-half must individually exceed hr_consistency
The approximation assumes:
Large sample sizes (CLT applies)
Var(log(HR)) ~ 4/d per treatment arm
Independence between split-halves (conditional on true effect)
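Under those assumptions the detection criterion can be checked by Monte Carlo. The sketch below assumes the per-split variance is 4 divided by the expected event count in that split-half; this is an illustrative reading of the variance formula, not the package's exact integrand.

```r
set.seed(123)
theta <- 1.5; n_sg <- 60; prop_cens <- 0.2
d_half  <- (n_sg / 2) * (1 - prop_cens)   # expected events per split-half
se_half <- sqrt(4 / d_half)
# Two independent split-half log(HR) estimates around the true effect
b1 <- rnorm(1e5, log(theta), se_half)
b2 <- rnorm(1e5, log(theta), se_half)
# Detected if each half exceeds hr_consistency (1.0) and the
# average exceeds hr_threshold (1.25)
p_detect <- mean(b1 > log(1.0) & b2 > log(1.0) & (b1 + b2) / 2 > log(1.25))
```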
Value
If theta is scalar, returns a single probability. If theta is a vector, returns a data.frame with columns: theta, probability.
Examples
# Single HR value
prob <- compute_detection_probability(
theta = 1.5,
n_sg = 60,
prop_cens = 0.2,
hr_threshold = 1.25
)
# Vector of HR values for power curve
hr_values <- seq(1.0, 2.5, by = 0.1)
results <- compute_detection_probability(
theta = hr_values,
n_sg = 60,
prop_cens = 0.2,
hr_threshold = 1.25,
verbose = TRUE
)
# Plot detection probability curve
plot(results$theta, results$probability, type = "l",
xlab = "True HR", ylab = "P(detect)")
Compute Detection Probability for Single Theta (Internal)
Description
Compute Detection Probability for Single Theta (Internal)
Usage
compute_detection_probability_single(
theta,
n_sg,
prop_cens,
k_avg,
k_ind,
method,
n_mc,
tol
)
Arguments
theta |
Numeric. True hazard ratio in the subgroup. Can be a vector for computing detection probability across multiple HR values. |
n_sg |
Integer. Subgroup sample size. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
k_avg |
Log of hr_threshold |
k_ind |
Log of hr_consistency |
method |
Character. Integration method: "cubature" (recommended for accuracy) or "monte_carlo" (faster for exploration). Default: "cubature" |
n_mc |
Integer. Number of Monte Carlo samples if method = "monte_carlo". Default: 100000 |
tol |
Numeric. Relative tolerance for cubature integration. Default: 1e-4 |
Value
Numeric probability
Compute and Attach CDE Values to a DGM Object
Description
Calculates Controlled Direct Effect (CDE) hazard ratios from the
super-population potential outcomes (theta_0, theta_1)
and attaches them to the DGM's hazard_ratios list. This enables
automatic CDE detection by build_estimation_table.
Usage
compute_dgm_cde(dgm, harm_col = NULL)
Arguments
dgm |
A DGM object (e.g., from setup_gbsg_dgm()). |
harm_col |
Character. Name of the subgroup indicator column in
the super-population data frame. If NULL (default), the column is detected automatically. |
Details
The CDE for subgroup S is defined as:
CDE(S) = mean(exp(theta_1[S])) / mean(exp(theta_0[S]))
which is the ratio of average hazard contributions on the natural scale.
This differs from the AHR (exp(mean(loghr_po))) due to Jensen's
inequality. In the notation of Leon et al. (2024), CDE corresponds to
theta-ddagger.
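The Jensen's-inequality gap is easy to see on toy potential outcomes:

```r
theta_0 <- c(0, 0)               # control log-hazard contributions
theta_1 <- c(log(0.5), log(2))   # treated log-hazard contributions
ahr <- exp(mean(theta_1 - theta_0))              # exp(mean(loghr_po))
cde <- mean(exp(theta_1)) / mean(exp(theta_0))   # ratio of mean hazards
# ahr and cde differ unless loghr_po is constant across subjects
```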
The function detects the subgroup indicator column automatically,
checking for flag.harm, flag_harm, and H in
the super-population data frame.
Value
The DGM object with CDE values added to
dgm$hazard_ratios (CDE, CDE_harm,
CDE_no_harm) and to top-level fields (dgm$CDE,
dgm$cde_H, dgm$cde_Hc).
See Also
build_estimation_table, get_dgm_hr
Examples
dgm <- setup_gbsg_dgm(model = "alt", k_inter = 2.0, verbose = FALSE)
dgm <- compute_dgm_cde(dgm)
dgm$hazard_ratios$CDE_harm # theta-ddagger(H)
dgm$hazard_ratios$CDE # theta-ddagger overall
Compute node metrics for a policy tree
Description
Aggregates scores by leaf node and calculates treatment effect differences
Usage
compute_node_metrics(data, dr.scores, tree, X, n.min)
Arguments
data |
Data frame. Original data |
dr.scores |
Matrix. Doubly robust scores |
tree |
Policy tree object |
X |
Matrix. Covariate matrix |
n.min |
Integer. Minimum subgroup size |
Value
Data frame with node metrics
Compute Hazard Ratio for a Single Subgroup
Description
Internal helper function to compute HR and CI for a subgroup. Uses robust (sandwich) standard errors for consistency with cox_summary().
Usage
compute_sg_hr(
df,
sg_name,
outcome.name,
event.name,
treat.name,
E.name,
C.name,
z_alpha = qnorm(0.975),
conf.level = 0.95
)
Arguments
df |
Data frame for the subgroup. |
sg_name |
Character. Name of the subgroup. |
outcome.name |
Character. Name of survival time variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
E.name |
Character. Label for experimental arm. |
C.name |
Character. Label for control arm. |
z_alpha |
Numeric. Z-multiplier for CI (default: qnorm(0.975) for 95% CI). |
conf.level |
Numeric. Confidence level for intervals (default: 0.95). |
Value
Data frame with single row of HR estimates, or NULL if model fails.
Compute Hazard Ratio Estimates for Subgroups
Description
Internal function to compute Cox model hazard ratio estimates with confidence intervals for ITT, H, and Hc subgroups.
Usage
compute_sg_hr_estimates(
df,
df_H,
df_Hc,
outcome.name,
event.name,
treat.name,
conf.level = 0.95,
verbose = FALSE
)
Arguments
df |
Full analysis data frame |
df_H |
Data frame for H subgroup |
df_Hc |
Data frame for Hc subgroup |
outcome.name |
Character. Outcome variable name |
event.name |
Character. Event indicator name |
treat.name |
Character. Treatment variable name |
conf.level |
Numeric. Confidence level |
verbose |
Logical. Print messages |
Value
Data frame with HR estimates
Compute Summary Statistics for Subgroups
Description
Internal function to compute summary statistics for each subgroup.
Usage
compute_sg_summary(
df,
df_H,
df_Hc,
outcome.name,
event.name,
treat.name,
sg0_name,
sg1_name
)
Arguments
df |
Full analysis data frame |
df_H |
Data frame for H subgroup |
df_Hc |
Data frame for Hc subgroup |
outcome.name |
Character. Outcome variable name |
event.name |
Character. Event indicator name |
treat.name |
Character. Treatment variable name |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
Value
Data frame with summary statistics
Count ID Occurrences in Bootstrap Sample
Description
Counts the number of times an ID appears in a bootstrap sample.
Usage
count_boot_id(x, dfb)
Arguments
x |
ID value. |
dfb |
Data frame of bootstrap sample. |
Value
Integer count of occurrences.
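Conceptually this is a one-line count over the bootstrap sample; the id column name below is illustrative, not necessarily the package's:

```r
dfb   <- data.frame(id = c(1, 2, 2, 3, 2))   # toy bootstrap sample
count <- sum(dfb$id == 2)                    # times id 2 was resampled
```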
Comprehensive Wrapper for Cox Spline Analysis with AHR and CDE Plotting
Description
This wrapper function combines Cox spline fitting with comprehensive visualization of Average Hazard Ratios (AHRs) and Controlled Direct Effects (CDEs) as described in the MRCT subgroups analysis documentation.
Usage
cox_ahr_cde_analysis(
df,
tte_name = "os_time",
event_name = "os_event",
treat_name = "treat",
z_name = "biomarker",
loghr_po_name = "loghr_po",
theta1_name = "theta_1",
theta0_name = "theta_0",
spline_df = 3,
alpha = 0.2,
hr_threshold = 0.7,
plot_style = c("combined", "separate", "grid"),
plot_select = c("all", "profile_ahr", "ahr_only"),
save_plots = FALSE,
output_dir = tempdir(),
verbose = TRUE
)
Arguments
df |
Data frame containing survival data with potential outcomes. |
tte_name |
Character string specifying time-to-event variable name.
Default: "os_time". |
event_name |
Character string specifying event indicator variable
name. Default: "os_event". |
treat_name |
Character string specifying treatment variable name.
Default: "treat". |
z_name |
Character string specifying continuous covariate/biomarker
name. Default: "biomarker". |
loghr_po_name |
Character string specifying potential outcome log HR
variable. Default: "loghr_po". |
theta1_name |
Optional: variable name for theta_1 (treated potential
outcome). Default: "theta_1". |
theta0_name |
Optional: variable name for theta_0 (control potential
outcome). Default: "theta_0". |
spline_df |
Integer degrees of freedom for spline fitting. Default: 3. |
alpha |
Numeric significance level for confidence intervals. Default: 0.20. |
hr_threshold |
Numeric hazard ratio threshold for subgroup
identification, or NULL to skip cutpoint selection. Default: 0.7. |
plot_style |
Character: "combined", "separate", or "grid". Default: "combined". |
plot_select |
Character controlling which panels to display:
"all", "profile_ahr", or "ahr_only". Default: "all". |
save_plots |
Logical whether to save plots to file. Default: FALSE. |
output_dir |
Character directory for saving plots. Default:
tempdir(). |
verbose |
Logical for diagnostic output. Default: TRUE. |
Value
List of class "cox_ahr_cde" containing:
- cox_fit
Results from the cox_cs_fit function.
- ahr_results
AHR calculations for different subgroup definitions.
- cde_results
CDE calculations if theta variables available.
- optimal_cutpoint
Optimal biomarker cutpoint, or NULL when hr_threshold is NULL.
- subgroup_stats
Statistics for recommended and questionable subgroups, or overall-only when hr_threshold is NULL.
- data
List with z_values, loghr_po, and subgroup assignments.
Examples
# Build a small synthetic dataset with required columns
set.seed(42)
n <- 200
df_ex <- data.frame(
os_time = rexp(n, rate = 0.01),
os_event = rbinom(n, 1, 0.6),
treat = rep(0:1, each = n / 2),
biomarker = rnorm(n),
loghr_po = rnorm(n, mean = -0.3, sd = 0.5)
)
# With threshold - full subgroup analysis
results <- cox_ahr_cde_analysis(
df = df_ex, z_name = "biomarker",
hr_threshold = 1.25, plot_style = "grid",
verbose = FALSE
)
# Without threshold - pure AHR curves
results <- cox_ahr_cde_analysis(
df = df_ex, z_name = "biomarker",
hr_threshold = NULL, plot_select = "ahr_only",
verbose = FALSE
)
Fit Cox Model with Cubic Spline for Treatment Effect Heterogeneity
Description
Estimates treatment effects as a function of a continuous covariate using a Cox proportional hazards model with natural cubic splines. The function models treatment-by-covariate interactions to detect effect modification.
Usage
cox_cs_fit(
df,
tte_name = "os_time",
event_name = "os_event",
treat_name = "treat",
strata_name = NULL,
z_name = "bm",
alpha = 0.2,
spline_df = 3,
z_max = Inf,
z_by = 1,
z_window = 0,
z_quantile = 0.9,
show_plot = TRUE,
plot_params = NULL,
truebeta_name = NULL,
verbose = TRUE
)
Arguments
df |
Data frame containing survival data |
tte_name |
Character string specifying time-to-event variable name. Default: "os_time" |
event_name |
Character string specifying event indicator variable name (1=event, 0=censored). Default: "os_event" |
treat_name |
Character string specifying treatment variable name (1=treated, 0=control). Default: "treat" |
strata_name |
Character string specifying stratification variable name. If NULL, no stratification is used. Default: NULL |
z_name |
Character string specifying continuous covariate name for effect modification. Default: "bm" |
alpha |
Numeric value for confidence level (two-sided). Default: 0.20 (80% confidence intervals) |
spline_df |
Integer specifying degrees of freedom for natural spline. Default: 3 |
z_max |
Numeric maximum value for z in predictions. Values beyond this are truncated. Default: Inf (no truncation) |
z_by |
Numeric increment for z values in prediction grid. Default: 1 |
z_window |
Numeric half-width for counting observations near each z value. Default: 0.0 (exact matches only) |
z_quantile |
Numeric quantile (0-1) for upper limit of z profile. Default: 0.90 (90th percentile) |
show_plot |
Logical indicating whether to display plot. Default: TRUE |
plot_params |
List of plotting parameters (see Details). Default: NULL |
truebeta_name |
Character string specifying variable containing true log(HR) values for validation/simulation. Default: NULL |
verbose |
Logical indicating whether to print diagnostic information. Default: TRUE |
Details
Model Structure
The function fits:
h(t|Z,A) = h_0(t) \exp(\beta_0 A + f(Z) + g(Z) \cdot A)
Where:
A is treatment (0/1)
Z is the continuous effect modifier
f(Z) is modeled with natural splines (main effect)
g(Z) is modeled with natural splines (interaction)
The log hazard ratio is:
\beta(Z) = \beta_0 + g(Z)
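The model above can be sketched with survival::coxph and natural splines; this is a minimal reconstruction of the described structure, not the package's exact fitting code.

```r
library(survival)
library(splines)

set.seed(123)
df <- data.frame(
  os_time  = rexp(500, 0.01),
  os_event = rbinom(500, 1, 0.7),
  treat    = rbinom(500, 1, 0.5),
  bm       = rnorm(500, 50, 10)
)
# f(Z) as the ns(bm) main effect; g(Z) * A as the treat:ns(bm) interaction
fit <- coxph(Surv(os_time, os_event) ~ treat + ns(bm, df = 3) +
               treat:ns(bm, df = 3), data = df)
```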
Plot Parameters
The plot_params argument accepts a list with:
- xlab: x-axis label
- main_title: plot title
- ylimit: y-axis limits c(min, max)
- y_pad_zero: padding below zero line
- y_delta: extra space for count labels
- cex_legend: legend text size
- cex_count: count text size
- show_cox_primary: show standard Cox estimate line
- show_null: show null effect line (log(HR)=0)
- show_target: show target effect line (e.g., log(0.80))
Value
List containing:
- z_profile
Vector of z values where treatment effect is estimated
- loghr_est
Point estimates of log(HR) at each z value
- loghr_lower
Lower confidence bound
- loghr_upper
Upper confidence bound
- se_loghr
Standard errors of log(HR) estimates
- counts_profile
Number of observations near each z value
- cox_primary
Log(HR) from standard Cox model (no interaction)
- model_fit
The fitted coxph model object
- spline_basis
The natural spline basis object
Examples
# Simulate data
set.seed(123)
df <- data.frame(
os_time = rexp(500, 0.01),
os_event = rbinom(500, 1, 0.7),
treat = rbinom(500, 1, 0.5),
bm = rnorm(500, 50, 10)
)
# Fit model
result <- cox_cs_fit(df, z_name = "bm", alpha = 0.20)
# Custom plotting
result <- cox_cs_fit(
df,
z_name = "bm",
plot_params = list(
xlab = "Biomarker Level",
main_title = "Treatment Effect by Biomarker",
cex_legend = 1.2
)
)
Cox model summary for subgroup (OPTIMIZED)
Description
Internal helper called by analyze_subgroup() via SG_tab_estimates.
Usage
cox_summary(
Y,
E,
Treat,
Strata = NULL,
use_strata = !is.null(Strata),
return_format = c("formatted", "numeric")
)
Arguments
Y |
Numeric vector of outcome. |
E |
Numeric vector of event indicators. |
Treat |
Numeric vector of treatment indicators. |
Strata |
Vector of strata (optional). |
use_strata |
Logical. Whether to use strata in the model (default: TRUE if Strata provided). |
return_format |
Character. "formatted" (default) or "numeric" for downstream use. |
Details
Calculates hazard ratio and confidence interval for a subgroup using Cox regression. Optimized version with reduced overhead and better error handling.
Value
Character string with formatted HR and CI (or numeric vector if return_format="numeric").
Examples
library(survival)
cox_summary(
Y = gbsg$rfstime / 30.4375,
E = gbsg$status,
Treat = gbsg$hormon
)
Batch Cox summaries with caching
Description
For repeated calls with the same data structure but different subsets, this version pre-processes the data structure once.
Usage
cox_summary_batch(
Y,
E,
Treat,
Strata = NULL,
subset_indices,
return_format = c("formatted", "numeric")
)
Arguments
Y |
Numeric vector of outcome (full dataset). |
E |
Numeric vector of event indicators (full dataset). |
Treat |
Numeric vector of treatment indicators (full dataset). |
Strata |
Vector of strata (optional, full dataset). |
subset_indices |
List of integer vectors, each defining a subset to analyze. |
return_format |
Character. "formatted" or "numeric". |
Value
List of results, one per subset.
Cox model summary for subgroup - vectorized version
Description
Efficiently processes multiple subgroups at once. Useful when analyzing many subgroups (e.g., in cross-validation).
Usage
cox_summary_vectorized(
data,
outcome_col,
event_col,
treat_col,
strata_col = NULL,
subgroup_col = "subgroup",
return_format = c("formatted", "numeric")
)
Arguments
data |
Data frame with columns for Y, E, Treat, and optionally Strata. |
outcome_col |
Character. Name of outcome column. |
event_col |
Character. Name of event column. |
treat_col |
Character. Name of treatment column. |
strata_col |
Character. Name of strata column (optional). |
subgroup_col |
Character. Name of subgroup indicator column. |
return_format |
Character. "formatted" or "numeric". |
Value
Data frame with one row per subgroup and HR results.
Calculate Bootstrap Table Caption
Description
Generates an interpretive caption for bootstrap results table.
Usage
create_bootstrap_caption(est.scale, nb_boots, boot_success_rate)
Arguments
est.scale |
Character. "hr" or "1/hr" |
nb_boots |
Integer. Number of bootstrap iterations |
boot_success_rate |
Numeric. Proportion successful |
Value
Character string with caption
Create Bootstrap Diagnostic Plots
Description
Generates diagnostic visualization plots for bootstrap analysis.
Usage
create_bootstrap_diagnostic_plots(
results,
H_estimates,
Hc_estimates,
overall_timing = NULL
)
Arguments
results |
Data frame with bootstrap results |
H_estimates |
List with H subgroup estimates |
Hc_estimates |
List with Hc subgroup estimates |
overall_timing |
List with overall timing information (optional) |
Value
List of ggplot2 objects
Create Data Generating Mechanism for MRCT Simulations
Description
Wrapper function to create a data generating mechanism (DGM) for MRCT
simulation scenarios using generate_aft_dgm_flex.
Usage
create_dgm_for_mrct(
df_case,
model_type = c("alt", "null"),
log_hrs = NULL,
confounder_var = NULL,
confounder_effect = NULL,
include_regA = TRUE,
verbose = FALSE
)
Arguments
df_case |
Data frame containing case study data |
model_type |
Character. Either "alt" (alternative hypothesis with heterogeneous treatment effects) or "null" (uniform treatment effect) |
log_hrs |
Numeric vector. Log hazard ratios for spline specification. If NULL, defaults are used based on model_type |
confounder_var |
Character. Name of a confounder variable to include with a forced prognostic effect. Default: NULL (no forced effect) |
confounder_effect |
Numeric. Log hazard ratio for confounder_var effect. Only used if confounder_var is specified |
include_regA |
Logical. Include regA as a factor in the model. Default: TRUE |
verbose |
Logical. Print detailed output. Default: FALSE |
Details
Model Types
- alt
Alternative hypothesis: Treatment effect varies by biomarker level (heterogeneous treatment effect). Default log_hrs create HR ranging from 2.0 (harm) to 0.5 (benefit) across biomarker range
- null
Null hypothesis: Uniform treatment effect regardless of biomarker level. Default log_hrs = log(0.7) uniformly
Confounder Effects
By default, NO prognostic confounder effect is forced. The confounder_var and confounder_effect parameters allow optionally specifying ANY baseline covariate to have a fixed prognostic effect in the outcome model.
The regA variable (region indicator) is included as a factor by default but without a forced effect - its coefficient is estimated from data.
Value
An object of class "aft_dgm_flex" for use with
simulate_from_dgm and mrct_region_sims
See Also
generate_aft_dgm_flex for underlying DGM creation
mrct_region_sims for running simulations with the DGM
Create Factor Summary Tables from Bootstrap Results
Description
Generates formatted GT tables summarizing factor frequencies from bootstrap subgroup analysis. Creates two complementary tables: one showing factor selection frequencies within each position (M.1, M.2, etc.), and another showing overall factor frequencies across all positions.
Usage
create_factor_summary_tables(factor_freq, n_found, min_percent = 2)
Arguments
factor_freq |
Data.frame or data.table. Factor frequency table from
|
n_found |
Integer. Number of successful bootstrap iterations (where a subgroup was identified). Used to calculate overall percentages. |
min_percent |
Numeric. Minimum percentage threshold for including factors in the tables. Factors with selection frequencies below this threshold are excluded. Default is 2 (i.e., 2%). |
Value
A list with up to two GT table objects:
- by_position
GT table showing factor frequencies within each position. Percentages represent the conditional probability of factor selection given that the position was populated. Within each position, percentages sum to approximately 100% (they may not sum exactly to 100% after filtering).
- overall
GT table showing total factor frequencies across all positions. Includes additional columns indicating which positions each factor appeared in and how many unique positions used the factor. Percentages represent the proportion of successful iterations where the factor appeared in any position.
If no factors meet the minimum threshold, the corresponding table element will be NULL.
Note
This function requires the gt package for table creation. The overall table also requires dplyr for data aggregation. If dplyr is not available, only the position-specific table will be created and the overall element will be NULL.
Always check for NULL before using the returned tables:
if (!is.null(factor_tables$by_position)) {
print(factor_tables$by_position)
}
If all factors have percentages below min_percent, both table elements
will be NULL.
See Also
- summarize_bootstrap_subgroups for generating the factor_freq input
- format_subgroup_summary_tables for creating all subgroup summary tables
- summarize_bootstrap_results for complete bootstrap analysis workflow
- forestsearch_bootstrap_dofuture for running bootstrap analysis
Create Forest Plot Theme with Size Controls
Description
Creates a forestploter theme with parameters that control overall plot sizing and appearance. This is the primary way to control how large the forest plot renders.
Usage
create_forest_theme(
base_size = 10,
scale = 1,
row_padding = NULL,
ci_pch = 15,
ci_lwd = NULL,
ci_Theight = NULL,
ci_col = "black",
header_fontsize = NULL,
body_fontsize = NULL,
footnote_fontsize = NULL,
footnote_col = "darkcyan",
title_fontsize = NULL,
cv_fontsize = NULL,
cv_col = "gray30",
refline_lwd = NULL,
refline_lty = "dashed",
refline_col = "gray30",
vertline_lwd = NULL,
vertline_lty = "dashed",
vertline_col = "gray20",
arrow_type = "closed",
arrow_col = "black",
summary_fill = "black",
summary_col = "black"
)
Arguments
base_size |
Numeric. Base font size in points. This is the primary scaling parameter - increasing it will proportionally scale all fonts, row padding, and line widths. Default: 10. |
scale |
Numeric. Additional scaling multiplier applied on top of base_size. Use for quick overall scaling. Default: 1.0. |
row_padding |
Numeric vector of length 2. Padding around row content in mm as c(vertical, horizontal). If NULL, auto-calculated from base_size. Default: NULL. |
ci_pch |
Integer. Point character for CI. 15=square, 16=circle, 18=diamond. Default: 15. |
ci_lwd |
Numeric. Line width for CI lines. If NULL, auto-calculated from base_size. Default: NULL. |
ci_Theight |
Numeric. Height of T-bar ends on CI. If NULL, auto-calculated from base_size. Default: NULL. |
ci_col |
Character. Color for CI lines and points. Default: "black". |
header_fontsize |
Numeric. Font size for column headers. If NULL, auto-calculated as base_size * scale + 1. Default: NULL. |
body_fontsize |
Numeric. Font size for body text. If NULL, auto-calculated as base_size * scale. Default: NULL. |
footnote_fontsize |
Numeric. Font size for footnotes. If NULL, auto-calculated as base_size * scale - 1. Default: NULL. |
footnote_col |
Character. Color for footnote text. Default: "darkcyan". |
title_fontsize |
Numeric. Font size for title. If NULL, auto-calculated as base_size * scale + 4. Default: NULL. |
cv_fontsize |
Numeric. Font size for CV annotation text. If NULL, auto-calculated as base_size * scale. Default: NULL. |
cv_col |
Character. Color for CV annotation text. Default: "gray30". |
refline_lwd |
Numeric. Reference line width. If NULL, auto-calculated. Default: NULL. |
refline_lty |
Character. Reference line type. Default: "dashed". |
refline_col |
Character. Reference line color. Default: "gray30". |
vertline_lwd |
Numeric. Vertical line width. If NULL, auto-calculated. Default: NULL. |
vertline_lty |
Character. Vertical line type. Default: "dashed". |
vertline_col |
Character. Vertical line color. Default: "gray20". |
arrow_type |
Character. Arrow type: "open" or "closed". Default: "closed". |
arrow_col |
Character. Arrow color. Default: "black". |
summary_fill |
Character. Fill color for summary diamonds. Default: "black". |
summary_col |
Character. Border color for summary diamonds. Default: "black". |
Details
The base_size parameter is the primary way to control plot size.
When you change base_size, the following are automatically scaled:
All font sizes (body, header, footnote, CV, title)
Row padding (vertical and horizontal)
CI line width and T-bar height
Reference and vertical line widths
The scaling formula uses base_size = 10 as the reference point:
base_size = 10: Default sizing
base_size = 12: 20% larger
base_size = 14: 40% larger
base_size = 16: 60% larger
You can override any individual parameter by specifying it explicitly.
The theme does NOT set row background colors - those are determined
automatically by plot_subgroup_results_forestplot() based on
row types (ITT, reference, posthoc, etc.).
Value
A list of class "fs_forest_theme" containing all theme parameters.
See Also
plot_subgroup_results_forestplot, render_forestplot
Examples
# Simple: just increase base_size for larger plot
large_theme <- create_forest_theme(base_size = 14)
print(large_theme)
# Or use scale for quick adjustment
large_theme <- create_forest_theme(base_size = 10, scale = 1.4)
# Fine-tune specific elements
custom_theme <- create_forest_theme(
base_size = 14,
cv_fontsize = 12,
ci_lwd = 2.5
)
Create Subgroup Indicator Columns from ForestSearch
Description
Internal helper to create Qrecommend and Brecommend indicator columns.
Usage
create_fs_subgroup_indicators(
df,
fs.est,
col_names = c("Qrecommend", "Brecommend"),
verbose = FALSE
)
Arguments
df |
Data frame to modify. |
fs.est |
A forestsearch object. |
col_names |
Character vector of length 2. Names for the indicator columns: first for harm/questionable (treat.recommend == 0), second for benefit/recommend (treat.recommend == 1). Default: c("Qrecommend", "Brecommend") |
verbose |
Logical. Print diagnostic messages. |
Value
Modified data frame with indicator columns.
Create GBSG-Based AFT Data Generating Mechanism
Description
Creates a data generating mechanism (DGM) for survival simulations based on the German Breast Cancer Study Group (GBSG) dataset. Supports heterogeneous treatment effects via treatment-subgroup interactions.
Usage
create_gbsg_dgm(
model = c("alt", "null"),
k_treat = 1,
k_inter = 1,
k_z3 = 1,
z1_quantile = 0.25,
n_super = DEFAULT_N_SUPER,
cens_type = c("weibull", "uniform"),
use_rand_params = FALSE,
seed = SEED_BASE,
verbose = FALSE
)
Arguments
model |
Character. Either "alt" for alternative hypothesis with heterogeneous treatment effects, or "null" for uniform treatment effect. Default: "alt" |
k_treat |
Numeric. Treatment effect multiplier applied to the treatment coefficient from the fitted AFT model. Values > 1 strengthen the treatment effect. Default: 1 |
k_inter |
Numeric. Interaction effect multiplier for the treatment-subgroup interaction (z1 * z3). Only used when model = "alt". Higher values create more heterogeneity between HR(H) and HR(Hc). Default: 1 |
k_z3 |
Numeric. Effect multiplier for the z3 (menopausal status) coefficient. Default: 1 |
z1_quantile |
Numeric. Quantile threshold for z1 (estrogen receptor). Observations with ER <= quantile are coded as z1 = 1. Default: 0.25 |
n_super |
Integer. Size of super-population for empirical HR estimation. Default: 5000 |
cens_type |
Character. Censoring distribution type: "weibull" or "uniform". Default: "weibull" |
use_rand_params |
Logical. If TRUE, modifies confounder coefficients using estimates from randomized subset (meno == 0). Default: FALSE |
seed |
Integer. Random seed for super-population generation. Default: 8316951 |
verbose |
Logical. Print diagnostic information. Default: FALSE |
Details
This version is aligned with generate_aft_dgm_flex() and
calculate_hazard_ratios() methodology, computing individual-level
potential outcomes and average hazard ratios (AHR).
Subgroup Definition
The harm subgroup H is defined as: z1 = 1 AND z3 = 1, where:
z1: Low estrogen receptor (ER <= 25th percentile by default)
z3: Premenopausal status (meno == 0)
Model Specification
The AFT model uses covariates: treat, z1, z2, z3, z4, z5, and (for "alt") the interaction zh = treat * z1 * z3.
Interaction Effect (k_inter)
The k_inter parameter modifies the zh coefficient in the AFT model:
gamma[zh] <- k_inter * gamma[zh]
This affects the hazard ratio for the harm subgroup:
HR(H) = exp(-gamma[treat]/sigma - gamma[zh]/sigma)
HR(Hc) = exp(-gamma[treat]/sigma)
When k_inter = 0, HR(H) = HR(Hc) (no heterogeneity).
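A numeric sketch of this AFT-to-HR mapping (coefficient values are illustrative, not the fitted GBSG values):

```r
gamma_treat <- 0.25   # AFT treatment coefficient
gamma_zh    <- -0.40  # interaction coefficient
sigma       <- 0.80   # AFT scale parameter
k_inter     <- 2      # heterogeneity multiplier
hr_Hc <- exp(-gamma_treat / sigma)                               # benefit in Hc
hr_H  <- exp(-gamma_treat / sigma - k_inter * gamma_zh / sigma)  # harm in H
```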
Alignment with generate_aft_dgm_flex
This function now computes:
theta_0: Log-hazard contribution under control
theta_1: Log-hazard contribution under treatment
loghr_po: Individual causal log hazard ratio (theta_1 - theta_0)
AHR metrics: exp(mean(loghr_po)) for overall and subgroups
Value
A list of class "gbsg_dgm" containing:
- df_super_rand
Data frame with randomized super-population including potential outcomes (theta_0, theta_1, loghr_po)
- hr_H_true
Empirical hazard ratio in harm subgroup (Cox-based)
- hr_Hc_true
Empirical hazard ratio in complement subgroup (Cox-based)
- hr_causal
Overall causal (ITT) hazard ratio (Cox-based)
- AHR
Overall average hazard ratio (from loghr_po)
- AHR_H_true
Average hazard ratio in harm subgroup
- AHR_Hc_true
Average hazard ratio in complement subgroup
- hazard_ratios
List matching generate_aft_dgm_flex output format
- model_params
List with AFT model parameters (mu, sigma, gamma, etc.)
- cens_params
List with censoring model parameters
- subgroup_info
List with subgroup definitions and true factor names
- analysis_vars
Character vector of analysis variable names
- model_type
Character indicating "alt" or "null"
See Also
simulate_from_gbsg_dgm for generating data from the DGM
calibrate_k_inter for finding k_inter to achieve target HR
Helper Functions for GRF Subgroup Analysis
Description
This file contains helper functions used by grf.subg.harm.survival() to improve readability and modularity.
Create GRF configuration object.
Usage
create_grf_config(
frac.tau,
n.min,
dmin.grf,
RCT,
sg.criterion,
maxdepth,
seedit
)
Arguments
frac.tau |
Numeric. Fraction of tau for GRF horizon |
n.min |
Integer. Minimum subgroup size |
dmin.grf |
Numeric. Minimum difference in subgroup mean |
RCT |
Logical. Is the data from a randomized controlled trial? |
sg.criterion |
Character. Subgroup selection criterion |
maxdepth |
Integer. Maximum tree depth |
seedit |
Integer. Random seed |
Details
Creates a configuration object to organize GRF parameters
Value
List with configuration parameters
Create result object when no subgroup is found
Description
Builds result object for cases where no valid subgroup is identified
Usage
create_null_result(data, values, trees, config)
Arguments
data |
Data frame. Original data |
values |
Data frame. Node metrics (may be empty) |
trees |
List. Fitted policy trees |
config |
List. GRF configuration |
Value
List with limited GRF results
Create Reference Subgroup Indicator Columns
Description
Creates indicator columns (0/1) in the data frame for each reference subgroup based on the provided subset expressions.
Usage
create_reference_subgroup_columns(df, ref_subgroups, verbose = FALSE)
Arguments
df |
Data frame to modify. |
ref_subgroups |
Named list of reference subgroup definitions.
Each element should have |
verbose |
Logical. Print diagnostic messages. |
Value
List with modified df, cols, labels, and colors vectors.
Create Result Row
Description
Create Result Row
Usage
create_result_row(kk, covs.in, nx, event_counts, cox_result)
Create Sample Size Table for Multiple Scenarios
Description
Generates a table of required sample sizes for different combinations of true hazard ratios and censoring proportions.
Usage
create_sample_size_table(
theta_values,
prop_cens_values,
target_power = 0.8,
hr_threshold = 1.25,
verbose = TRUE
)
Arguments
theta_values |
Numeric vector. True hazard ratios to evaluate. |
prop_cens_values |
Numeric vector. Censoring proportions to evaluate. |
target_power |
Numeric. Target detection probability. Default: 0.80 |
hr_threshold |
Numeric. HR threshold. Default: 1.25 |
verbose |
Logical. Print progress. Default: TRUE |
Value
A data.frame with columns: theta, prop_cens, n_required, achieved_power
Create Spline Variables
Description
Create Spline Variables
Usage
create_spline_variables(df_work, spline_var, knot)
Create Subgroup Indicator from Factor Definitions
Description
Parses factor definitions (e.g., "v1.1", "grade3.1") and creates a binary indicator for subgroup membership.
Usage
create_subgroup_indicator(df, sg_factors)
Arguments
df |
Data frame containing the variables |
sg_factors |
Character vector of factor definitions |
Value
Integer vector (1 = in subgroup, 0 = not in subgroup)
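A minimal sketch of the parsing convention described above, assuming each definition follows the "variable.level" pattern (e.g. "grade3.1" means grade3 == 1); the exported function may accept richer formats:

```r
# Sketch (illustrative name): build a 0/1 membership indicator from
# "<variable>.<level>" factor definitions.
subgroup_indicator <- function(df, sg_factors) {
  in_sg <- rep(TRUE, nrow(df))
  for (f in sg_factors) {
    # split on the LAST "." so variable names may themselves contain digits
    var   <- sub("\\.[^.]*$", "", f)
    level <- sub("^.*\\.", "", f)
    in_sg <- in_sg & (as.character(df[[var]]) == level)
  }
  as.integer(in_sg)   # 1 = in subgroup, 0 = not in subgroup
}

df <- data.frame(v1 = c(1, 0, 1), grade3 = c(1, 1, 0))
subgroup_indicator(df, c("v1.1", "grade3.1"))  # 1 0 0
```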
Create Subgroup Summary Data Frame for Forest Plot
Description
Creates a data frame suitable for forestploter from multiple subgroup analyses. This is a more flexible alternative for complex subgroup configurations.
Usage
create_subgroup_summary_df(
df_analysis,
subgroups,
outcome.name,
event.name,
treat.name,
E.name = "E",
C.name = "C",
fs_bc_list = NULL,
fs_kfold_list = NULL,
conf.level = 0.95
)
Arguments
df_analysis |
Data frame. The analysis dataset. |
subgroups |
Named list of subgroup definitions. |
outcome.name |
Character. Name of survival time variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
E.name |
Character. Label for experimental arm. |
C.name |
Character. Label for control arm. |
fs_bc_list |
List. Named list of bootstrap results for each subgroup. |
fs_kfold_list |
List. Named list of k-fold results for each subgroup. |
conf.level |
Numeric. Confidence level for intervals (default: 0.95). |
Value
Data frame with HR estimates for all subgroups.
Create result object for successful subgroup identification
Description
Builds comprehensive result object when a subgroup is found
Usage
create_success_result(
data,
best_subgroup,
trees,
tree_cuts,
selected_tree,
sg_harm_id,
values,
config
)
Arguments
data |
Data frame. Original data with subgroup assignments |
best_subgroup |
Data frame row. Selected subgroup information |
trees |
List. All fitted policy trees |
tree_cuts |
List. Cut information from trees |
selected_tree |
Policy tree. The tree that identified the subgroup |
sg_harm_id |
Character. Expression defining the subgroup |
values |
Data frame. All node metrics |
config |
List. GRF configuration |
Value
List with complete GRF results
Create Enhanced Summary Table for Baseline Characteristics
Description
Generates a formatted summary table comparing baseline characteristics between treatment arms. Supports continuous, categorical, and binary variables with p-values, standardized mean differences (SMD), and missing data summaries.
Usage
create_summary_table(
data,
treat_var = "treat",
vars_continuous = NULL,
vars_categorical = NULL,
vars_binary = NULL,
var_labels = NULL,
digits = 1,
show_pvalue = TRUE,
show_smd = TRUE,
show_missing = TRUE,
table_title = "Baseline Characteristics by Treatment Arm",
table_subtitle = NULL,
source_note = NULL,
font_size = 12,
header_font_size = 14,
footnote_font_size = 10,
use_alternating_rows = TRUE,
stripe_color = "#f9f9f9",
indent_size = 20,
highlight_pval = 0.05,
highlight_smd = 0.2,
highlight_color = "#fff3cd",
compact_mode = FALSE,
column_width_var = 200,
column_width_stats = 120,
show_column_borders = FALSE,
custom_css = NULL
)
Arguments
data |
Data frame containing the analysis data |
treat_var |
Character. Name of treatment variable (must have 2 levels) |
vars_continuous |
Character vector. Names of continuous variables |
vars_categorical |
Character vector. Names of categorical variables |
vars_binary |
Character vector. Names of binary (0/1) variables |
var_labels |
Named list. Custom labels for variables (optional) |
digits |
Integer. Number of decimal places for continuous variables |
show_pvalue |
Logical. Include p-values column |
show_smd |
Logical. Include SMD (effect size) column |
show_missing |
Logical. Include missing data rows |
table_title |
Character. Main title for the table |
table_subtitle |
Character. Subtitle for the table (optional) |
source_note |
Character. Source note at bottom (optional) |
font_size |
Numeric. Base font size in pixels (default: 12) |
header_font_size |
Numeric. Header font size in pixels (default: 14) |
footnote_font_size |
Numeric. Footnote font size in pixels (default: 10) |
use_alternating_rows |
Logical. Apply zebra striping (default: TRUE) |
stripe_color |
Character. Color for alternating rows (default: "#f9f9f9") |
indent_size |
Numeric. Indentation for sub-levels in pixels (default: 20) |
highlight_pval |
Numeric. Highlight p-values below this threshold (default: 0.05) |
highlight_smd |
Numeric. Highlight SMD values above this threshold (default: 0.2) |
highlight_color |
Character. Color for highlighting (default: "#fff3cd") |
compact_mode |
Logical. Reduce spacing for compact display (default: FALSE) |
column_width_var |
Numeric. Width for Variable column in pixels (default: 200) |
column_width_stats |
Numeric. Width for stat columns in pixels (default: 120) |
show_column_borders |
Logical. Show vertical column borders (default: FALSE) |
custom_css |
Character. Additional custom CSS styling (optional) |
Details
Binary variables specified via vars_binary display a single row
showing the count and proportion for the "1" level. Categorical variables
specified via vars_categorical that happen to be binary-coded (i.e.,
have exactly two levels: 0 and 1) are automatically detected and displayed
in the same compact single-row format, showing only the "1" proportion.
Value
A gt table object (or data frame if gt not available)
Preset: Compact Table
Description
Preset: Compact Table
Usage
create_summary_table_compact(...)
Arguments
... |
Arguments passed to create_summary_table() |
Preset: Minimal Table (No Highlighting, No Alternating)
Description
Preset: Minimal Table (No Highlighting, No Alternating)
Usage
create_summary_table_minimal(...)
Arguments
... |
Arguments passed to create_summary_table() |
Preset: Presentation Table (Large Fonts)
Description
Preset: Presentation Table (Large Fonts)
Usage
create_summary_table_presentation(...)
Arguments
... |
Arguments passed to create_summary_table() |
Preset: Publication-Ready Table
Description
Preset: Publication-Ready Table
Usage
create_summary_table_publication(...)
Arguments
... |
Arguments passed to create_summary_table() |
Create Timing Summary Table
Description
Creates a data frame summarizing bootstrap timing information.
Usage
create_timing_summary_table(
overall_timing,
iteration_stats,
fs_stats,
overhead_stats,
nb_boots,
boot_success_rate
)
Arguments
overall_timing |
List. Overall timing statistics |
iteration_stats |
List. Per-iteration timing statistics |
fs_stats |
List. ForestSearch-specific timing statistics |
overhead_stats |
List. Overhead timing statistics |
nb_boots |
Integer. Number of bootstrap iterations |
boot_success_rate |
Numeric. Proportion of successful bootstraps |
Value
Data frame with timing summary
Discretize Continuous Variable into Quantile-Based Categories
Description
Discretize Continuous Variable into Quantile-Based Categories
Usage
cut_numeric(x, probs = c(0.25, 0.5, 0.75))
Arguments
x |
Numeric vector to discretize |
probs |
Numeric vector of probabilities for quantile breaks. Default: c(0.25, 0.5, 0.75) creates quartiles coded as 1, 2, 3, 4 |
Value
Integer vector with category codes (1 = lowest, max = highest)
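The quartile coding can be sketched with base R quantiles and findInterval() (an illustration, not the package implementation; tie handling may differ):

```r
# Sketch: quantile-based discretization with default quartile breaks.
cut_numeric_sketch <- function(x, probs = c(0.25, 0.5, 0.75)) {
  brks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  # left.open = TRUE puts values at a break into the lower category
  findInterval(x, brks, left.open = TRUE) + 1L   # 1 = lowest category
}

cut_numeric_sketch(1:8)  # 1 1 2 2 3 3 4 4
```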
Discretize Continuous Variable by Size Categories
Description
Discretize Continuous Variable by Size Categories
Usage
cut_size(x, breaks = c(20, 50))
Arguments
x |
Numeric vector (typically tumor size) |
breaks |
Numeric vector of breakpoints. Default: c(20, 50) |
Value
Integer vector with category codes
Generate cut expressions for a variable
Description
For a continuous variable, returns expressions for mean, median, qlow, and qhigh cuts.
Usage
cut_var(x)
Arguments
x |
Character. Variable name. |
Value
Character vector of cut expressions.
Compare Multiple CV Results
Description
Creates a comparison table from multiple cross-validation runs with different configurations.
Usage
cv_compare_results(
cv_list,
metrics = c("all", "finding", "agreement"),
show_percentages = TRUE,
digits = 1,
use_gt = TRUE
)
Arguments
cv_list |
Named list of cv_result objects from |
metrics |
Character vector. Which metrics to include. Options: "finding", "agreement", "all". Default: "all". |
show_percentages |
Logical. Display as percentages. Default: TRUE. |
digits |
Integer. Decimal places. Default: 1. |
use_gt |
Logical. Return gt table if TRUE. Default: TRUE. |
Value
A gt table or data.frame comparing CV results across configurations.
Create Metrics Tables for Cross-Validation Results
Description
Formats the find_summary and sens_summary outputs from
forestsearch_tenfold or forestsearch_Kfold
into publication-ready gt tables.
Usage
cv_metrics_tables(
cv_result,
sg_definition = NULL,
title = "Cross-Validation Metrics",
show_percentages = TRUE,
digits = 1,
include_raw = FALSE,
table_style = c("combined", "separate", "minimal"),
use_gt = TRUE
)
Arguments
cv_result |
List. Result from |
sg_definition |
Character vector. Subgroup factor definitions for
labeling (optional). If NULL, extracted from |
title |
Character. Main title for combined table. Default: "Cross-Validation Metrics". |
show_percentages |
Logical. Display metrics as percentages (0-100) instead of proportions (0-1). Default: TRUE. |
digits |
Integer. Decimal places for formatting. Default: 1. |
include_raw |
Logical. Include raw matrices ( |
table_style |
Character. One of "combined", "separate", or "minimal".
Default: "combined". |
use_gt |
Logical. Return gt table(s) if TRUE, data.frame(s) if FALSE. Default: TRUE. |
Value
Depending on table_style:
"combined": A single gt table (or data.frame)
"separate": A list with
agreement_tableandfinding_table"minimal": A single-row gt table (or data.frame)
If include_raw = TRUE, also includes sens_out and find_out
matrices in the returned list.
See Also
cv_summary_tables for formatting forestsearch_KfoldOut(outall=TRUE) results
Create Summary Tables from forestsearch_KfoldOut Results
Description
Formats the detailed output from forestsearch_KfoldOut(outall=TRUE)
into publication-ready gt tables. This includes ITT estimates, original subgroup
estimates, and K-fold subgroup estimates.
Usage
cv_summary_tables(
kfold_out,
title = "Cross-Validation Summary",
subtitle = NULL,
show_metrics = TRUE,
digits = 3,
font_size = 12,
use_gt = TRUE
)
Arguments
kfold_out |
List. Result from |
title |
Character. Main title for combined table. Default: "Cross-Validation Summary". |
subtitle |
Character. Subtitle for table. Default: NULL (auto-generated). |
show_metrics |
Logical. Include agreement and finding metrics in output. Default: TRUE. |
digits |
Integer. Decimal places for numeric formatting. Default: 3. |
font_size |
Integer. Font size in pixels. Default: 12. |
use_gt |
Logical. Return gt table if TRUE, data.frame if FALSE. Default: TRUE. |
Value
If use_gt = TRUE, returns a list with gt table objects:
- combined_table: Combined ITT and subgroup estimates
- itt_table: ITT estimates only
- original_table: Original full-data subgroup estimates
- kfold_table: K-fold subgroup estimates
- metrics_table: Agreement and finding metrics (if show_metrics = TRUE)
If use_gt = FALSE, returns equivalent data.frames.
See Also
cv_metrics_tables for formatting forestsearch_tenfold() results
Create Compact CV Summary Text
Description
Generates a compact text string summarizing CV results, suitable for annotations in plots or reports.
Usage
cv_summary_text(
cv_result,
est.scale = "hr",
include_finding = TRUE,
include_agreement = TRUE
)
Arguments
cv_result |
List. Result from |
est.scale |
Character. "hr" or "1/hr" to determine label orientation. Default: "hr". |
include_finding |
Logical. Include subgroup finding rate. Default: TRUE. |
include_agreement |
Logical. Include agreement rate. Default: TRUE. |
Value
Character string with formatted CV metrics.
Default ForestSearch Parameters for GBSG Simulations
Description
Returns a list of default parameters for ForestSearch analysis in GBSG-based simulations.
Usage
default_fs_params()
Details
Default parameters are optimized for GBSG simulation scenarios with moderate sample sizes (300-1000) and typical event rates.
Variable selection defaults:
use_lasso = TRUE: LASSO-based variable importance (default for FS)
use_grf = FALSE: GRF-based variable importance (enable for FSlg)
The use_twostage parameter is set to FALSE by default for backward
compatibility. Set to TRUE for faster exploratory analyses.
Value
List of default ForestSearch parameters
Default GRF Parameters for GBSG Simulations
Description
Returns a list of default parameters for GRF analysis
in GBSG-based simulations. Parameters align with
grf.subg.harm.survival() function signature.
Usage
default_grf_params()
Value
List of default GRF parameters
Default GRF parameters (general)
Description
Default GRF parameters (general)
Usage
default_grf_params_gen()
Default ForestSearch parameters (general)
Description
Default ForestSearch parameters (general)
Usage
default_sim_params()
Define Subgroups with Flexible Cutpoints
Description
Define Subgroups with Flexible Cutpoints
Usage
define_subgroups(
df_work,
data,
subgroup_vars,
subgroup_cuts,
continuous_vars,
model,
verbose
)
Bivariate Density for Split-Sample HR Threshold Detection
Description
Computes the joint density for the two-split detection criterion where both split-halves must exceed individual thresholds and their average must exceed a consistency threshold.
Usage
density_threshold_both(x, theta, prop_cens = 0.3, n_sg, k_avg, k_ind)
Arguments
x |
Numeric vector of length 2. Log hazard ratio estimates from the two split-halves. |
theta |
Numeric. True hazard ratio in the subgroup. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
n_sg |
Integer. Subgroup sample size. |
k_avg |
Numeric. Threshold for average log(HR) across splits. Typically log(hr.threshold). |
k_ind |
Numeric. Threshold for individual split log(HR). Typically log(hr.consistency). |
Details
The detection criterion requires:
Average of two splits: (x1 + x2)/2 >= k_avg
Individual splits: x1 >= k_ind AND x2 >= k_ind
Under the asymptotic approximation, each split-half log(HR) estimator follows N(log(theta), 8/d) where d = n_sg * (1 - prop_cens) / 2 is the expected number of events per split.
Value
Numeric. Joint density value at x, or 0 if thresholds not met.
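Under the stated N(log(theta), 8/d) approximation with independent split-halves, the density can be sketched as follows (illustrative name, not the package implementation):

```r
# Sketch: joint density of the two split-half log(HR) estimates,
# zeroed outside the detection region described in Details.
density_sketch <- function(x, theta, prop_cens = 0.3, n_sg, k_avg, k_ind) {
  d  <- n_sg * (1 - prop_cens) / 2      # expected events per split
  sd <- sqrt(8 / d)                     # asymptotic SD of each split estimate
  ok <- mean(x) >= k_avg && all(x >= k_ind)
  if (!ok) return(0)
  prod(dnorm(x, mean = log(theta), sd = sd))  # independence across splits
}

density_sketch(c(0.4, 0.3), theta = 1.5, n_sg = 200,
               k_avg = log(1.25), k_ind = log(1.2))
```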
Vectorized Density for Integration
Description
Wrapper around density_threshold_both for use with cubature integration.
Usage
density_threshold_integrand(x, theta, prop_cens, n_sg, k_avg, k_ind)
Arguments
x |
Numeric vector of length 2. Log hazard ratio estimates from the two split-halves. |
theta |
Numeric. True hazard ratio in the subgroup. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
n_sg |
Integer. Subgroup sample size. |
k_avg |
Numeric. Threshold for average log(HR) across splits. Typically log(hr.threshold). |
k_ind |
Numeric. Threshold for individual split log(HR). Typically log(hr.consistency). |
Value
Numeric density value.
Automatically Detect Variable Types in a Dataset
Description
Analyzes a data frame to automatically classify variables as continuous or categorical, and returns a subset of the data with specified variables excluded.
Usage
detect_variable_types(data, max_unique_for_cat = 10, exclude_vars = NULL)
Arguments
data |
A data frame to analyze |
max_unique_for_cat |
Integer. Maximum number of unique values for a numeric variable to be considered categorical. Default is 10. |
exclude_vars |
Character vector of variable names to exclude from both classification and the returned dataset (e.g., ID variables, timestamps). Default is NULL. |
Details
The function classifies variables using the following rules:
Numeric variables with more than max_unique_for_cat unique values are classified as continuous
Numeric variables with max_unique_for_cat or fewer unique values are classified as categorical
Factor, character, and logical variables are always classified as categorical
Variables listed in exclude_vars are omitted from classification and removed from the returned dataset
Value
A list containing:
continuous_vars |
Character vector of variable names classified as continuous |
cat_vars |
Character vector of variable names classified as categorical |
data_subset |
Data frame with exclude_vars columns removed |
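The classification rules can be sketched in a few lines of base R (illustrative; detect_types_sketch is a hypothetical name, not the exported function):

```r
# Sketch: classify columns as continuous vs categorical per the rules above.
detect_types_sketch <- function(data, max_unique_for_cat = 10,
                                exclude_vars = NULL) {
  data <- data[, setdiff(names(data), exclude_vars), drop = FALSE]
  is_cont <- vapply(data, function(v)
    is.numeric(v) && length(unique(v)) > max_unique_for_cat, logical(1))
  list(continuous_vars = names(data)[is_cont],
       cat_vars        = names(data)[!is_cont],   # factors/characters land here
       data_subset     = data)
}

d <- data.frame(age = rnorm(50), grade = sample(1:3, 50, TRUE),
                id = 1:50, sex = factor(rep(c("F", "M"), 25)))
detect_types_sketch(d, exclude_vars = "id")
# continuous: age; categorical: grade, sex; id dropped
```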
Dummy-code a data frame (numeric pass-through, factors expanded)
Description
Dummy-code a data frame (numeric pass-through, factors expanded)
Usage
dummy_encode(df)
Arguments
df |
Data frame with numeric and/or factor columns. |
Value
Data frame with numeric columns unchanged and factor columns
expanded via acm.disjctif.
Early Stopping Decision
Description
Evaluates whether enough evidence exists to stop early based on confidence interval for consistency proportion.
Usage
early_stop_decision(
n_success,
n_total,
threshold,
conf.level = 0.95,
min_samples = 20
)
Arguments
n_success |
Integer. Number of splits meeting consistency. |
n_total |
Integer. Total number of valid splits. |
threshold |
Numeric. Target consistency threshold. |
conf.level |
Numeric. Confidence level for decision (default 0.95). |
min_samples |
Integer. Minimum samples before allowing early stop. |
Value
Character. One of "continue", "pass", or "fail".
Evaluate a Single Factor Combination with Status Tracking
Description
Tests whether a specific combination meets all criteria and returns a status code indicating how far the evaluation progressed.
Usage
evaluate_combination_with_status(
covs.in,
yy,
dd,
tt,
zz,
n.min,
d0.min,
d1.min,
hr.threshold,
minp,
rmin,
kk
)
Arguments
covs.in |
Numeric vector. Factor selection indicators. |
yy |
Numeric vector. Outcome values. |
dd |
Numeric vector. Event indicators. |
tt |
Numeric vector. Treatment indicators. |
zz |
Matrix. Factor indicators. |
n.min |
Integer. Minimum sample size. |
d0.min |
Integer. Minimum control events. |
d1.min |
Integer. Minimum treatment events. |
hr.threshold |
Numeric. HR threshold. |
minp |
Numeric. Minimum prevalence. |
rmin |
Integer. Minimum size reduction. |
kk |
Integer. Combination index. |
Value
List with:
- status
Integer status code:
0 = failed variance check
1 = passed variance, failed prevalence
2 = passed prevalence, failed redundancy
3 = passed redundancy, failed events
4 = passed events, failed sample size
5 = passed sample size, failed Cox fit
6 = passed Cox fit, failed HR threshold
7 = passed all criteria (success)
- result
Result row if successful, NULL otherwise
Evaluate a Comparison Expression Without eval(parse())
Description
Parses a string of the form "var op value" and evaluates it
directly against a data frame column using operator dispatch. Falls back
to column-name lookup for bare names.
Usage
evaluate_comparison(expr, df)
Arguments
expr |
Character. An expression like |
df |
Data frame whose columns are referenced by |
Details
Supported operators (matched longest-first to avoid partial-match
ambiguity): <=, >=, !=, ==, <,
>.
If no operator is found, expr is treated as a column name and
the result is df[[expr]] == 1.
The value on the right-hand side is coerced to numeric when possible, otherwise kept as character for string comparisons.
Value
Logical vector of length nrow(df).
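The dispatch logic can be sketched as follows (illustrative; the exported evaluate_comparison may differ in detail):

```r
# Sketch: evaluate "var op value" against a data frame column without
# eval(parse()); operators are matched longest-first as described.
eval_comparison_sketch <- function(expr, df) {
  ops <- c("<=", ">=", "!=", "==", "<", ">")   # longest first
  for (op in ops) {
    if (grepl(op, expr, fixed = TRUE)) {
      parts <- strsplit(expr, op, fixed = TRUE)[[1]]
      lhs <- df[[trimws(parts[1])]]
      rhs <- trimws(parts[2])
      num <- suppressWarnings(as.numeric(rhs))
      if (!is.na(num)) rhs <- num              # numeric when possible
      return(do.call(op, list(lhs, rhs)))      # operator dispatch
    }
  }
  df[[expr]] == 1                              # bare column-name fallback
}

df <- data.frame(age = c(40, 60, 70))
eval_comparison_sketch("age >= 60", df)  # FALSE TRUE TRUE
```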
Evaluate Consistency (Two-Stage Algorithm)
Description
Evaluates a single subgroup for consistency using a two-stage approach: Stage 1 screens with fewer splits, Stage 2 uses sequential batched evaluation with early stopping for efficient evaluation.
Usage
evaluate_consistency_twostage(
m,
index.Z,
names.Z,
df,
found.hrs,
hr.consistency,
pconsistency.threshold,
pconsistency.digits = 2,
maxk,
confs_labels,
details = FALSE,
n.splits.screen = 30,
screen.threshold = NULL,
n.splits.max = 400,
batch.size = 20,
conf.level = 0.95,
min.valid.screen = 10
)
Arguments
m |
Integer. Index of subgroup to evaluate. |
index.Z |
data.table or matrix. Factor indicators for all subgroups. |
names.Z |
Character vector. Names of factor columns. |
df |
data.frame. Original data with Y, Event, Treat, id columns. |
found.hrs |
data.table. Subgroup hazard ratio results. |
hr.consistency |
Numeric. Minimum HR threshold for consistency. |
pconsistency.threshold |
Numeric. Final consistency threshold. |
pconsistency.digits |
Integer. Rounding digits for output. |
maxk |
Integer. Maximum number of factors in a subgroup. |
confs_labels |
Character vector. Labels for confounders. |
details |
Logical. Print progress details. |
n.splits.screen |
Integer. Number of splits for Stage 1 (default 30). |
screen.threshold |
Numeric. Screening threshold for Stage 1 (default auto-calculated). |
n.splits.max |
Integer. Maximum total splits (default 400). |
batch.size |
Integer. Splits per batch in Stage 2 (default 20). |
conf.level |
Numeric. Confidence level for early stopping (default 0.95). |
min.valid.screen |
Integer. Minimum valid splits in Stage 1 (default 10). |
Value
Named numeric vector with consistency results, or NULL if not met.
Cache and validate cut expressions efficiently
Description
Evaluates all cut expressions once and caches results to avoid redundant evaluation. Much faster than evaluating repeatedly.
Usage
evaluate_cuts_once(confs, df, details = FALSE)
Arguments
confs |
Character vector of cut expressions. |
df |
Data frame to evaluate expressions against. |
details |
Logical. Print details during execution. |
Details
This replaces multiple eval(parse()) calls scattered throughout get_FSdata. By caching results, we avoid:
Repeated parsing of expressions
Repeated evaluation on dataframe
Redundant uniqueness checks
Value
List with:
evaluations: List of evaluated vectors (logical TRUE/FALSE) for each cut
is_valid: Logical vector indicating which cuts produced >1 unique value
has_error: Logical vector indicating which cuts failed to evaluate
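The caching pattern can be sketched as follows (illustrative helper name; each expression is parsed and evaluated exactly once):

```r
# Sketch: evaluate all cut expressions once, recording validity
# (>1 unique value) and evaluation errors.
cache_cuts_sketch <- function(confs, df) {
  evaluations <- lapply(confs, function(e)
    tryCatch(eval(parse(text = e), envir = df), error = function(err) NULL))
  has_error <- vapply(evaluations, is.null, logical(1))
  is_valid  <- !has_error &
    vapply(evaluations, function(v) length(unique(v)) > 1, logical(1))
  list(evaluations = evaluations, is_valid = is_valid, has_error = has_error)
}

df  <- data.frame(age = c(40, 60, 70))
res <- cache_cuts_sketch(c("age > 50", "age > 100", "bad_var > 1"), df)
res$is_valid   # TRUE FALSE FALSE  (degenerate cut, failed cut)
res$has_error  # FALSE FALSE TRUE
```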
Evaluate Single Subgroup for Consistency (Fixed-Sample)
Description
Evaluates a single subgroup for consistency across random splits using a fixed number of splits.
Usage
evaluate_subgroup_consistency(
m,
index.Z,
names.Z,
df,
found.hrs,
n.splits,
hr.consistency,
pconsistency.threshold,
pconsistency.digits = 2,
maxk,
confs_labels,
details = FALSE
)
Arguments
m |
Integer. Index of the subgroup to evaluate. |
index.Z |
Data.table or matrix. Factor indicators for all subgroups. |
names.Z |
Character vector. Names of factor columns. |
df |
Data.frame. Original data with Y, Event, Treat, id columns. |
found.hrs |
Data.table. Subgroup hazard ratio results. |
n.splits |
Integer. Number of random splits for consistency evaluation. |
hr.consistency |
Numeric. Minimum HR threshold for consistency. |
pconsistency.threshold |
Numeric. Minimum proportion of splits meeting consistency. |
pconsistency.digits |
Integer. Rounding digits for consistency proportion. |
maxk |
Integer. Maximum number of factors in a subgroup. |
confs_labels |
Character vector. Labels for confounders. |
details |
Logical. Print details during execution. |
Value
Named numeric vector with consistency results, or NULL if criteria not met.
Extract all cuts from fitted trees
Description
Consolidates cut information from all fitted policy trees. This is the default behavior that returns cuts from all trees regardless of which tree identified the selected subgroup.
Usage
extract_all_tree_cuts(trees, maxdepth)
Arguments
trees |
List. Policy trees (indexed by depth) |
maxdepth |
Integer. Maximum tree depth |
Value
List with cuts and names for each tree and combined
Extract Estimates from ForestSearch Results
Description
Extracts operating characteristics (HRs, classification metrics, etc.) from ForestSearch analysis results. Aligned with new DGM output structure.
Usage
extract_fs_estimates(
df,
fs_res,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis = "FS",
fs_full = NULL,
verbose = FALSE
)
Arguments
df |
Simulated data frame |
fs_res |
ForestSearch result table (grp.consistency$out_sg$result, or NULL) |
dgm |
DGM object containing true HRs (supports both old and new formats) |
cox_formula |
Cox formula for estimation (optional) |
cox_formula_adj |
Adjusted Cox formula (optional) |
analysis |
Analysis label (e.g., "FS", "FSlg") |
fs_full |
Full forestsearch result object (for df.est access) |
verbose |
Logical. Print extraction details. Default: FALSE |
Value
data.table with extracted estimates including AHR metrics
Extract Subgroup Definition from ForestSearch Object
Description
Internal helper to extract human-readable subgroup definition.
Usage
extract_fs_subgroup_definition(fs.est, verbose = FALSE)
Arguments
fs.est |
A forestsearch object. |
verbose |
Logical. Print diagnostic messages. |
Value
Character string describing the subgroup definition.
Extract Estimates from GRF Results
Description
Aligned with new DGM output structure including AHR metrics. Correctly handles grf.subg.harm.survival() output structure:
sg.harm.id: subgroup definition string
data: data frame with treat.recommend column (0 = harm, 1 = complement)
Usage
extract_grf_estimates(
df,
grf_est,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis = "GRF",
frac_tau = 1,
verbose = FALSE,
debug = FALSE
)
Arguments
df |
Simulated data frame |
grf_est |
GRF estimation result from grf.subg.harm.survival() |
dgm |
DGM object |
cox_formula |
Cox formula |
cox_formula_adj |
Adjusted Cox formula |
analysis |
Analysis label |
frac_tau |
Fraction of tau used |
verbose |
Print extraction details |
debug |
Print detailed debugging information about GRF result structure |
Value
data.table with extracted estimates
Extract redundancy flag for subgroup combinations
Description
Checks if adding each factor to a subgroup reduces the sample size by at least rmin.
Usage
extract_idx_flagredundancy(x, rmin)
Arguments
x |
Matrix of subgroup factor indicators. |
rmin |
Integer. Minimum required reduction in sample size. |
Value
List with id.x (membership vector) and flag.redundant (logical).
Extract cuts from selected tree only
Description
Extracts cut information only from the tree at the specified selected depth.
This provides a focused set of cuts from the tree that identified the
subgroup meeting the dmin.grf criterion, rather than cuts from all trees.
Usage
extract_selected_tree_cuts(trees, selected_depth, maxdepth)
Arguments
trees |
List. Policy trees (indexed by depth) |
selected_depth |
Integer. Depth of the selected tree (from best_subgroup$depth) |
maxdepth |
Integer. Maximum tree depth (for populating tree-specific slots) |
Details
This function is used when return_selected_cuts_only = TRUE in
grf.subg.harm.survival(). It returns:
- tree1, tree2, tree3: Individual tree cuts (still populated for reference)
- names1, names2, names3: Individual tree variable names
- all: Cuts from the SELECTED tree only (not union of all trees)
- all_names: Variable names from the SELECTED tree only
- selected_depth: The depth that was selected
Value
List with cuts and names, structured similarly to extract_all_tree_cuts
but with only the selected tree's cuts in the all field
Extract Subgroup Information
Description
Extracts subgroup definition and membership from results.
Usage
extract_subgroup(df, top_result, index.Z, names.Z, confs_labels)
Arguments
df |
Data.frame. Original analysis data. |
top_result |
Data.table row. Top subgroup result. |
index.Z |
Matrix. Factor indicators for all subgroups. |
names.Z |
Character vector. Factor column names. |
confs_labels |
Character vector. Human-readable labels. |
Value
List with sg.harm, sg.harm_label, df_flag, sg.harm.id.
Extract cut information from a policy tree
Description
Extracts all split points and variables from a policy tree
Usage
extract_tree_cuts(tree)
Arguments
tree |
Policy tree object |
Value
List with cuts (expressions) and names (unique variables)
Generate Figure Note for Quarto/RMarkdown
Description
Formats the figure note from plot_sg_weighted_km() output for use in Quarto or RMarkdown documents.
Usage
figure_note(
x,
prefix = "*Note*: ",
include_definition = TRUE,
include_hr_explanation = TRUE,
custom_text = NULL
)
Arguments
x |
Output from plot_sg_weighted_km() |
prefix |
Character. Prefix for the note. Default uses italic Note. |
include_definition |
Logical. Include subgroup definition. Default: TRUE |
include_hr_explanation |
Logical. Include HR(bc) explanation. Default: TRUE |
custom_text |
Character. Additional custom text to append. Default: NULL |
Value
Character string formatted as a figure note, or NULL if no content
Filter a vector by LASSO-selected variables
Description
Returns elements of x that are in lassokeep.
Usage
filter_by_lassokeep(x, lassokeep)
Arguments
x |
Character vector. |
lassokeep |
Character vector of selected variables. |
Value
Filtered character vector or NULL.
Filter and merge arguments for function calls
Description
Simplifies the common pattern of filtering arguments from a source list to match a target function's formal parameters, then adding/overriding specific arguments.
Usage
filter_call_args(source_args, target_func, override_args = NULL)
Arguments
source_args |
List of all arguments (typically from |
target_func |
Function whose formals define the filter criteria. |
override_args |
List of arguments to add or override (optional). |
Details
This function:
- Extracts formal parameter names from target_func
- Keeps only arguments from source_args that match those names
- Adds or overrides with any override_args provided
Reduces boilerplate and improves readability across the codebase.
Value
List of filtered arguments ready for do.call().
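A minimal sketch of this pattern using base R formals() (illustrative; the package implementation may differ in detail):

```r
# Keep only the source_args that match the target function's formals,
# then apply any overrides (sketch of the documented behavior)
filter_call_args_sketch <- function(source_args, target_func, override_args = NULL) {
  keep <- intersect(names(source_args), names(formals(target_func)))
  args <- source_args[keep]
  if (!is.null(override_args)) args[names(override_args)] <- override_args
  args
}

# Example with a hypothetical target function:
f <- function(x, y = 1) x + y
do.call(f, filter_call_args_sketch(list(x = 2, z = 99), f, list(y = 3)))  # 5
```

The extraneous `z` argument is dropped before do.call(), which would otherwise error on an unused argument.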
Find Covariate Any Match
Description
Helper function to determine if any CV fold found a subgroup involving the same covariate (not necessarily same cut).
Usage
find_covariate_any_match(sg_target, sg1, sg2, confs)
Arguments
sg_target |
Character. Target subgroup definition to match. |
sg1 |
Character vector. Subgroup 1 labels for each fold. |
sg2 |
Character vector. Subgroup 2 labels for each fold. |
confs |
Character vector. Confounder names. |
Value
Numeric vector (0/1) indicating match for each fold.
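The covariate-level matching might be sketched as follows (assumed logic based on the description above; the actual implementation may differ):

```r
# For each fold, flag 1 if either fold subgroup mentions any covariate
# that also appears in the target subgroup definition (cut values ignored)
find_covariate_any_match_sketch <- function(sg_target, sg1, sg2, confs) {
  in_def <- function(def) confs[vapply(confs, grepl, logical(1), x = def, fixed = TRUE)]
  target_vars <- in_def(sg_target)
  hit <- function(def) length(intersect(in_def(def), target_vars)) > 0
  as.numeric(mapply(function(a, b) hit(a) || hit(b), sg1, sg2))
}
```

For example, a fold finding `{age>=60}` matches a target of `{age>=50}` (same covariate, different cut), while a fold finding only `{nodes>=3}` does not.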
Find k_inter Value to Achieve Target Harm Subgroup Hazard Ratio
Description
Uses numerical root-finding to determine the interaction parameter (k_inter) that achieves a specified target hazard ratio in the harm subgroup. This is the most efficient method for single target calibration.
Usage
find_k_inter_for_target_hr(
target_hr_harm,
data,
continuous_vars,
factor_vars,
outcome_var,
event_var,
treatment_var,
subgroup_vars,
subgroup_cuts,
k_treat = 1,
k_inter_range = c(-10, 10),
tol = 0.001,
n_super = 5000,
verbose = TRUE
)
Arguments
target_hr_harm |
Numeric value specifying the target hazard ratio for the harm subgroup. Must be positive. |
data |
A data.frame containing the dataset to use for model fitting. |
continuous_vars |
Character vector of continuous variable names to be standardized and included as covariates. |
factor_vars |
Character vector of factor/categorical variable names to be converted to dummy variables. |
outcome_var |
Character string specifying the name of the outcome/time variable. |
event_var |
Character string specifying the name of the event/status variable (1 = event, 0 = censored). |
treatment_var |
Character string specifying the name of the treatment variable. |
subgroup_vars |
Character vector of variable names defining the subgroup. |
subgroup_cuts |
Named list of cutpoint specifications for subgroup variables.
See |
k_treat |
Numeric value for treatment effect modifier. Default is 1 (no modification). |
k_inter_range |
Numeric vector of length 2 specifying the search range for k_inter. Default is c(-10, 10). |
tol |
Numeric value specifying tolerance for root finding convergence. Default is 0.001. |
n_super |
Integer specifying size of super population for hazard ratio calculation. Default is 5000. |
verbose |
Logical indicating whether to print progress information. Default is TRUE. |
Details
This function uses the uniroot algorithm to solve the equation:
HR_{harm}(k_{inter}) - HR_{target} = 0
The algorithm typically converges within 5-10 iterations and achieves high precision (within the specified tolerance). If the root-finding fails, the function evaluates the boundaries and provides diagnostic information.
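The root-finding step can be illustrated with a toy monotone stand-in for the harm-subgroup HR as a function of k_inter (illustrative only; the package evaluates the HR by simulating from the DGM):

```r
# Toy stand-in: suppose the harm-subgroup HR grows monotonically in k_inter
hr_harm <- function(k) exp(0.5 * k)
target  <- 1.8

# Solve HR_harm(k_inter) - HR_target = 0 over the default search range
fit <- uniroot(function(k) hr_harm(k) - target, interval = c(-10, 10), tol = 1e-3)
fit$root  # approximately 2 * log(1.8)
```

Because each evaluation of the objective is expensive in the real setting (it requires simulating a super population), the small iteration count of uniroot is what makes this the most efficient single-target calibration method.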
Value
A list of class "k_inter_result" containing:
- k_inter
Numeric value of optimal k_inter parameter
- achieved_hr_harm
Numeric value of achieved hazard ratio in harm subgroup
- target_hr_harm
Numeric value of target hazard ratio (for reference)
- error
Numeric value of absolute error between achieved and target HR
- dgm
Object of class "aft_dgm_flex" containing the final DGM
- convergence
Integer number of iterations to convergence
- method
Character string "root-finding" indicating method used
See Also
sensitivity_analysis_k_inter for sensitivity analysis
generate_aft_dgm_flex for DGM generation
Find the split that leads to a specific leaf node
Description
Identifies the split point that creates a given leaf node
Usage
find_leaf_split(tree, leaf_node)
Arguments
tree |
Policy tree object |
leaf_node |
Integer. Leaf node identifier |
Value
Character string with split expression or NULL
Find Quantile for Target Subgroup Proportion
Description
Determines the quantile cutpoint that achieves a target proportion of observations in a subgroup. Useful for calibrating subgroup sizes.
Usage
find_quantile_for_proportion(
data,
var_name,
target_prop,
direction = "less",
tol = 1e-04
)
Arguments
data |
A data.frame containing the variable of interest |
var_name |
Character string specifying the variable name to analyze |
target_prop |
Numeric value between 0 and 1 specifying the target proportion of observations to be included in the subgroup |
direction |
Character string: "less" for values <= cutpoint (default), "greater" for values > cutpoint |
tol |
Numeric tolerance for root finding algorithm. Default is 0.0001 |
Details
This function uses root finding (uniroot) to determine the quantile
that results in exactly the target proportion of observations being classified
into the subgroup. This is particularly useful when you want to ensure a
specific subgroup size regardless of the data distribution.
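For a continuous variable without heavy ties, the empirical quantile gives the same answer directly (a simplified sketch of the calibration idea, not the function's uniroot implementation):

```r
set.seed(1)
x <- rexp(1000)
target_prop <- 0.30

# direction = "less": subgroup is {x <= cutpoint}
cutpoint <- quantile(x, probs = target_prop, type = 1)
mean(x <= cutpoint)  # achieved proportion (0.30 here, since values are distinct)
```

With heavy ties or discrete variables, no cutpoint may achieve the target proportion exactly, which is where the tolerance-based root finding becomes relevant.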
Value
A list containing:
- quantile
The quantile value (between 0 and 1) that achieves the target proportion
- cutpoint
The actual data value corresponding to this quantile
- actual_proportion
The achieved proportion (should equal target_prop within tolerance)
Find Minimum Sample Size for Target Detection Power
Description
Determines the minimum subgroup sample size needed to achieve a target detection probability for a given true hazard ratio.
Usage
find_required_sample_size(
theta,
target_power = 0.8,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
n_range = c(20L, 500L),
tol = 1,
verbose = TRUE
)
Arguments
theta |
Numeric. True hazard ratio in subgroup. |
target_power |
Numeric. Target detection probability (0-1). Default: 0.80 |
prop_cens |
Numeric. Proportion censored. Default: 0.3 |
hr_threshold |
Numeric. HR threshold. Default: 1.25 |
hr_consistency |
Numeric. HR consistency threshold. Default: 1.0 |
n_range |
Integer vector of length 2. Range of sample sizes to search. Default: c(20, 500) |
tol |
Numeric. Tolerance for bisection search. Default: 1 |
verbose |
Logical. Print progress. Default: TRUE |
Value
A list with:
n_sg_required |
Minimum sample size (rounded up) |
achieved_power |
Actual detection probability at n_sg_required |
theta |
Input hazard ratio |
target_power |
Input target power |
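The bisection idea can be sketched with a toy monotone power curve standing in for the simulation-based detection probability (hypothetical; the package derives power from theta, prop_cens, and the HR thresholds):

```r
# Toy power curve: detection probability increasing in subgroup size n
power_fn <- function(n) pnorm(0.1 * sqrt(n) - qnorm(0.975))

# Bisection for the smallest n with power_fn(n) >= target
find_n_sketch <- function(target = 0.80, lo = 20, hi = 2000) {
  while (hi - lo > 1) {
    mid <- floor((lo + hi) / 2)
    if (power_fn(mid) >= target) hi <- mid else lo <- mid
  }
  hi
}
```

Monotonicity of power in n is what justifies the bisection search over n_range.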
Fit AFT Model with Optional Spline Treatment Effect
Description
Fit AFT Model with Optional Spline Treatment Effect
Usage
fit_aft_model(
df_work,
interaction_term,
k_treat,
k_inter,
verbose,
spline_spec = NULL,
set_var = NULL,
beta_var = NULL
)
Fit AFT Model with Spline Treatment Effect
Description
Fit AFT Model with Spline Treatment Effect
Usage
fit_aft_model_spline(
df_work,
covariate_cols,
interaction_term,
k_treat,
k_inter,
spline_spec,
verbose,
set_var = NULL,
beta_var = NULL
)
Fit Standard AFT Model (Non-Spline)
Description
Fit Standard AFT Model (Non-Spline)
Usage
fit_aft_model_standard(
df_work,
covariate_cols,
interaction_term,
k_treat,
k_inter,
verbose,
set_var = NULL,
beta_var = NULL
)
Fit causal survival forest
Description
Wrapper function to fit GRF causal survival forest with appropriate settings
Usage
fit_causal_forest(X, Y, W, D, tau.rmst, RCT, seedit)
Arguments
X |
Matrix. Covariate matrix |
Y |
Numeric vector. Outcome variable |
W |
Numeric vector. Treatment indicator |
D |
Numeric vector. Event indicator |
tau.rmst |
Numeric. Time horizon for RMST |
RCT |
Logical. Is this RCT data? |
seedit |
Integer. Random seed |
Value
Causal survival forest object
Fit Cox Model for Subgroup
Description
Fit Cox Model for Subgroup
Usage
fit_cox_for_subgroup(yy, dd, tt, id.x)
Fit Cox Models for Subgroups
Description
Fits Cox models for two subgroups defined by treatment recommendation.
Usage
fit_cox_models(df, formula)
Arguments
df |
Data frame. |
formula |
Cox model formula. |
Value
List with HR and SE for each subgroup.
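A sketch of the per-subgroup fitting pattern using survival::coxph (assumed structure; the package's internal flag handling and formula construction may differ):

```r
library(survival)

# Fit the same Cox formula within each level of a recommendation flag,
# returning the HR and SE per subgroup (illustrative sketch)
fit_cox_by_flag <- function(df, formula, flag) {
  lapply(split(df, df[[flag]]), function(d) {
    fit <- coxph(formula, data = d)
    c(hr = unname(exp(coef(fit)[1])), se = sqrt(vcov(fit)[1, 1]))
  })
}

# Example on the survival package's lung data, splitting on sex:
fits <- fit_cox_by_flag(lung, Surv(time, status) ~ ph.ecog, "sex")
```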
Fit policy trees up to specified depth
Description
Fits policy trees of depths 1 through maxdepth and computes metrics
Usage
fit_policy_trees(X, data, dr.scores, maxdepth, n.min)
Arguments
X |
Matrix. Covariate matrix |
data |
Data frame. Original data |
dr.scores |
Matrix. Doubly robust scores |
maxdepth |
Integer. Maximum tree depth (1-3) |
n.min |
Integer. Minimum subgroup size |
Value
List with trees and combined values
ForestSearch: Exploratory Subgroup Identification
Description
Identifies subgroups with differential treatment effects in clinical trials using a combination of Generalized Random Forests (GRF), LASSO variable selection, and exhaustive combinatorial search with split-sample validation.
Usage
forestsearch(
df.analysis,
outcome.name = "tte",
event.name = "event",
treat.name = "treat",
id.name = "id",
potentialOutcome.name = NULL,
flag_harm.name = NULL,
confounders.name = NULL,
parallel_args = list(plan = "callr", workers = 6, show_message = TRUE),
df.predict = NULL,
df.test = NULL,
is.RCT = TRUE,
seedit = 8316951,
est.scale = "hr",
use_lasso = TRUE,
use_grf = TRUE,
grf_res = NULL,
grf_cuts = NULL,
max_n_confounders = 1000,
grf_depth = 2,
dmin.grf = 12,
frac.tau = 0.6,
return_selected_cuts_only = TRUE,
conf_force = NULL,
defaultcut_names = NULL,
cut_type = "default",
exclude_cuts = NULL,
replace_med_grf = FALSE,
cont.cutoff = 4,
conf.cont_medians = NULL,
conf.cont_medians_force = NULL,
n.min = 60,
hr.threshold = 1.25,
hr.consistency = 1,
sg_focus = "hr",
fs.splits = 1000,
m1.threshold = Inf,
pconsistency.threshold = 0.9,
stop_threshold = 0.95,
showten_subgroups = FALSE,
d0.min = 12,
d1.min = 12,
max.minutes = 3,
minp = 0.025,
details = FALSE,
maxk = 2,
by.risk = 12,
plot.sg = FALSE,
plot.grf = FALSE,
max_subgroups_search = 10,
vi.grf.min = -0.2,
use_twostage = TRUE,
twostage_args = list()
)
Arguments
df.analysis |
Data frame. Analysis dataset with required columns. |
outcome.name |
Character. Name of time-to-event outcome variable. Default "tte". |
event.name |
Character. Name of event indicator (1=event, 0=censored). Default "event". |
treat.name |
Character. Name of treatment variable (1=treatment, 0=control). Default "treat". |
id.name |
Character. Name of subject ID variable. Default "id". |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
flag_harm.name |
Character. Name of true harm flag for simulations (optional). |
confounders.name |
Character vector. Names of candidate subgroup-defining variables. |
parallel_args |
List. Parallel processing configuration:
|
df.predict |
Data frame. Prediction dataset (optional). |
df.test |
Data frame. Test dataset (optional). |
is.RCT |
Logical. Is this a randomized controlled trial? Default TRUE. |
seedit |
Integer. Random seed. Default 8316951. |
est.scale |
Character. Estimation scale ("hr" or "rmst"). Default "hr". |
use_lasso |
Logical. Use LASSO for variable selection. Default TRUE. |
use_grf |
Logical. Use GRF for variable importance. Default TRUE. |
grf_res |
GRF results object (optional, for reuse). |
grf_cuts |
List. Custom GRF cut points (optional). |
max_n_confounders |
Integer. Maximum confounders to consider. Default 1000. |
grf_depth |
Integer. GRF tree depth. Default 2. |
dmin.grf |
Integer. Minimum events for GRF. Default 12. |
frac.tau |
Numeric. Fraction of tau for RMST. Default 0.6. |
return_selected_cuts_only |
Logical. If TRUE (default), GRF returns only cuts from the
tree depth that identified the selected subgroup meeting |
conf_force |
Character vector. Variables to force include (optional). |
defaultcut_names |
Character vector. Default cut variable names (optional). |
cut_type |
Character. Cut type ("default" or "custom"). Default "default". |
exclude_cuts |
Character vector. Variables to exclude from cutting (optional). |
replace_med_grf |
Logical. Replace median with GRF cuts. Default FALSE. |
cont.cutoff |
Integer. Cutoff for continuous vs categorical. Default 4. |
conf.cont_medians |
Named numeric vector. Median values for continuous variables (optional). |
conf.cont_medians_force |
Named numeric vector. Forced median values (optional). |
n.min |
Integer. Minimum subgroup size. Default 60. |
hr.threshold |
Numeric. Minimum HR for candidate subgroups. Default 1.25. |
hr.consistency |
Numeric. Minimum HR for consistency validation. Default 1.0. |
sg_focus |
Character. Subgroup selection focus. One of "hr", "hrMaxSG", "maxSG", "hrMinSG", "minSG". Default "hr". |
fs.splits |
Integer. Number of splits for consistency evaluation (or maximum
splits when |
m1.threshold |
Numeric. Maximum median survival threshold. Default Inf. |
pconsistency.threshold |
Numeric. Minimum consistency proportion. Default 0.90. |
stop_threshold |
Numeric. Early-stopping threshold for the consistency proportion; values > 1.0 are not permitted and are automatically reset. Default 0.95. |
showten_subgroups |
Logical. Show top 10 subgroups. Default FALSE. |
d0.min |
Integer. Minimum control arm events. Default 12. |
d1.min |
Integer. Minimum treatment arm events. Default 12. |
max.minutes |
Numeric. Maximum search time in minutes. Default 3. |
minp |
Numeric. Minimum prevalence threshold. Default 0.025. |
details |
Logical. Print progress details. Default FALSE. |
maxk |
Integer. Maximum number of factors per subgroup. Default 2. |
by.risk |
Integer. Risk table interval. Default 12. |
plot.sg |
Logical. Plot subgroup survival curves. Default FALSE. |
plot.grf |
Logical. Plot GRF results. Default FALSE. |
max_subgroups_search |
Integer. Maximum subgroups to evaluate. Default 10. |
vi.grf.min |
Numeric. Minimum GRF variable importance. Default -0.2. |
use_twostage |
Logical. Use two-stage sequential consistency algorithm for improved performance. Default TRUE (per the Usage signature); set FALSE for the original fixed-split algorithm. |
twostage_args |
List. Parameters for the two-stage algorithm (only used when use_twostage = TRUE). |
Details
Algorithm Overview:
- Variable Selection: GRF identifies variables with treatment effect heterogeneity; LASSO selects the most predictive.
- Subgroup Discovery: Exhaustive search over factor combinations up to maxk.
- Consistency Validation: Split-sample validation ensures reproducibility.
- Selection: Choose subgroup based on the sg_focus criterion.
Two-Stage Consistency Algorithm:
When use_twostage = TRUE, the consistency evaluation uses an optimized
algorithm that can provide 3-10x speedup:
- Stage 1: Quick screening with n.splits.screen splits eliminates clearly non-viable candidates.
- Stage 2: Sequential batched evaluation with early stopping for candidates passing Stage 1.
The two-stage algorithm is recommended for:
- Exploratory analyses with many candidate subgroups
- Large fs.splits values (> 200)
- Iterative model development
For final regulatory submissions, use_twostage = FALSE may be preferred
for exact reproducibility.
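The screening-then-confirmation flow described above can be sketched abstractly (hypothetical interface: eval_split() returns TRUE when one random split supports consistency; the package's internal batching and early stopping are more involved):

```r
two_stage_sketch <- function(candidates, eval_split,
                             n_screen = 20, n_full = 200,
                             screen_cut = 0.5, pass_cut = 0.9) {
  # Stage 1: cheap screen eliminates clearly non-viable candidates
  p_screen <- vapply(candidates, function(cand)
    mean(replicate(n_screen, eval_split(cand))), numeric(1))
  survivors <- candidates[p_screen >= screen_cut]
  # Stage 2: full evaluation only for candidates passing the screen
  p_full <- vapply(survivors, function(cand)
    mean(replicate(n_full, eval_split(cand))), numeric(1))
  survivors[p_full >= pass_cut]
}
```

The speedup comes from spending the full split budget only on candidates that survive the cheap screen.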
Value
A list of class "forestsearch" containing:
- grp.consistency
Consistency evaluation results including:
out_sg: Selected subgroup based on sg_focus
sg_focus: Focus criterion used
df_flag: Treatment recommendations
algorithm: "twostage" or "fixed"
n_candidates_evaluated: Number evaluated
n_passed: Number passing threshold
- find.grps
Subgroup search results
- confounders.candidate
Candidate confounders considered
- confounders.evaluated
Confounders after variable selection
- df.est
Analysis data with treatment recommendations
- df.predict
Prediction data with recommendations (if provided)
- df.test
Test data with recommendations (if provided)
- minutes_all
Total computation time
- grf_res
GRF results object
- sg_focus
Subgroup focus criterion used
- sg.harm
Selected subgroup definition
- grf_cuts
GRF cut points used
- prop_maxk
Proportion of max combinations searched
- max_sg_est
Maximum subgroup HR estimate
- grf_plot
GRF plot object (if plot.grf = TRUE)
- args_call_all
All arguments for reproducibility
References
FDA Guidance for Industry: Enrichment Strategies for Clinical Trials
Athey & Imbens (2016). Recursive partitioning for heterogeneous causal effects. PNAS.
Wager & Athey (2018). Estimation and inference of heterogeneous treatment effects using random forests. JASA.
See Also
subgroup.consistency for consistency evaluation details
forestsearch_bootstrap_dofuture for bootstrap inference
forestsearch_Kfold for cross-validation
Package website: https://larry-leon.github.io/forestsearch/
Source code: https://github.com/larry-leon/forestsearch
ForestSearch K-Fold Cross-Validation
Description
This function assesses the stability and reproducibility of ForestSearch subgroup identification through cross-validation. For each fold:
1. Train ForestSearch on (K-1) folds
2. Apply the identified subgroup to the held-out fold
3. Compare predictions to the original full-data analysis
Usage
forestsearch_Kfold(
fs.est,
Kfolds = nrow(fs.est$df.est),
seedit = 8316951L,
parallel_args = list(plan = "multisession", workers = 6, show_message = TRUE),
sg0.name = "Not recommend",
sg1.name = "Recommend",
details = FALSE
)
Arguments
fs.est |
List. ForestSearch results object from |
Kfolds |
Integer. Number of folds (default: |
seedit |
Integer. Random seed for fold assignment (default: 8316951). |
parallel_args |
List. Parallelization configuration with elements:
|
sg0.name |
Character. Label for subgroup 0 (default: "Not recommend"). |
sg1.name |
Character. Label for subgroup 1 (default: "Recommend"). |
details |
Logical. Print progress details (default: FALSE). |
Details
Performs K-fold cross-validation for ForestSearch, evaluating subgroup identification and agreement between training and test sets.
Value
List with components:
- resCV
Data frame with CV predictions for each observation
- cv_args
Arguments used for CV ForestSearch calls
- timing_minutes
Execution time in minutes
- prop_SG_found
Percentage of folds where a subgroup was found
- sg_analysis
Original subgroup definition from full-data analysis
- sg0.name, sg1.name
Subgroup labels
- Kfolds
Number of folds used
- sens_summary
Named vector of sensitivity metrics (sens_H, sens_Hc, ppv_H, ppv_Hc)
- find_summary
Named vector of subgroup-finding metrics (Any, Exact, etc.)
Cross-Validation Types
- Leave-One-Out (LOO): When Kfolds = nrow(df), each observation is held out once. Most thorough but computationally intensive.
- K-Fold: When Kfolds < nrow(df), data is split into K roughly equal folds. Good balance of bias-variance tradeoff.
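Fold assignment for the K-fold case can be sketched as a seeded shuffle of balanced fold labels (an assumption about the mechanics; the function's internal assignment may differ):

```r
set.seed(8316951)                               # the package's default seed
n <- 200; Kfolds <- 10
cvindex <- sample(rep_len(seq_len(Kfolds), n))  # K roughly equal folds
table(cvindex)                                  # 20 observations per fold here
```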
Output Metrics
The returned resCV data frame contains:
- treat.recommend: Prediction from CV model
- treat.recommend.original: Prediction from full-data model
- cvindex: Fold assignment
- sg1, sg2: Subgroup definitions found in each fold
See Also
forestsearch for initial subgroup identification
forestsearch_KfoldOut for summarizing CV results
forestsearch_tenfold for repeated K-fold simulations
ForestSearch K-Fold Cross-Validation Output Summary
Description
Summarizes cross-validation results for ForestSearch, including subgroup agreement and performance metrics.
Usage
forestsearch_KfoldOut(res, details = FALSE, outall = FALSE)
Arguments
res |
List. Result object from ForestSearch cross-validation, must contain
elements: |
details |
Logical. Print details during execution (default: FALSE). |
outall |
Logical. If TRUE, returns all summary tables; if FALSE, returns only metrics (default: FALSE). |
Value
If outall=FALSE, a list with sens_metrics_original and
find_metrics. If outall=TRUE, a list with summary tables and metrics.
ForestSearch Bootstrap with doFuture Parallelization
Description
Orchestrates bootstrap analysis for ForestSearch using doFuture parallelization. Implements bias correction methods to adjust for optimism in subgroup selection.
Usage
forestsearch_bootstrap_dofuture(
fs.est,
nb_boots,
seed = 8316951L,
details = FALSE,
show_three = FALSE,
parallel_args = list()
)
Arguments
fs.est |
List. ForestSearch results object from |
nb_boots |
Integer. Number of bootstrap samples (recommend 500-1000). |
seed |
Integer. Random seed for reproducibility of bootstrap sample
generation. Default |
details |
Logical. If |
show_three |
Logical. If |
parallel_args |
List. Parallelization configuration with elements:
If empty list, inherits settings from original forestsearch call. |
Value
List with the following components:
- results
Data.table with bias-corrected estimates for each bootstrap iteration
- SG_CIs
List of confidence intervals for H and Hc (raw and bias-corrected)
- FSsg_tab
Formatted table of subgroup estimates
- Ystar_mat
Matrix (nb_boots x n) of bootstrap sample indicators
- H_estimates
Detailed estimates for subgroup H
- Hc_estimates
Detailed estimates for subgroup Hc
- summary
(If create_summary=TRUE) Enhanced summary with tables and diagnostics
Bias Correction Methods
Two bias correction approaches are implemented:
- Method 1 (Simple Optimism): H_{adj1} = H_{obs} - (H^*_{*} - H^*_{obs}), where H^*_{*} is the new subgroup HR on bootstrap data and H^*_{obs} is the new subgroup HR on original data.
- Method 2 (Double Bootstrap): H_{adj2} = 2 \times H_{obs} - (H_{*} + H^*_{*} - H^*_{obs}), where H_{*} is the original subgroup HR on bootstrap data.
Variable Naming Convention
- H: Original subgroup (harm/questionable, treat.recommend == 0)
- Hc: Complement subgroup (recommend, treat.recommend == 1)
- _obs: Estimate from original data
- _star: Estimate from bootstrap data
- _biasadj_1: Bias correction method 1
- _biasadj_2: Bias correction method 2
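On the log-HR scale the two corrections are simple arithmetic; with toy numbers (not package output):

```r
H_obs   <- log(1.60)  # original subgroup H, original data
H_star  <- log(1.70)  # original subgroup H, bootstrap data
Hs_star <- log(1.90)  # bootstrap-selected subgroup, bootstrap data
Hs_obs  <- log(1.55)  # bootstrap-selected subgroup, original data

adj1 <- H_obs - (Hs_star - Hs_obs)               # Method 1: simple optimism
adj2 <- 2 * H_obs - (H_star + Hs_star - Hs_obs)  # Method 2: double bootstrap
exp(c(H_adj1 = adj1, H_adj2 = adj2))             # back to the HR scale
```

In the actual procedure these terms are averaged over the bootstrap iterations rather than computed from a single replicate.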
Performance
Typical runtime: 1-5 seconds per bootstrap iteration. For 1000 bootstraps with 6 workers, expect 3-10 minutes total. Memory usage scales with dataset size and number of workers.
Requirements
- Original fs.est must have identified a valid subgroup
- Requires packages: data.table, foreach, doFuture, survival
- For plots: requires ggplot2
See Also
forestsearch for initial subgroup identification
bootstrap_results for the core bootstrap worker function
build_cox_formula for Cox formula construction
fit_cox_models for Cox model fitting
ForestSearch Repeated K-Fold Cross-Validation
Description
This function performs multiple independent K-fold cross-validations to assess the variability in subgroup identification. Each simulation:
1. Randomly shuffles the data
2. Performs K-fold CV
3. Records sensitivity and agreement metrics
Results are summarized across all simulations.
Usage
forestsearch_tenfold(
fs.est,
sims,
Kfolds = 10,
details = TRUE,
seed = 8316951L,
parallel_args = list(plan = "multisession", workers = 6, show_message = TRUE)
)
Arguments
fs.est |
List. ForestSearch results object from |
sims |
Integer. Number of simulation repetitions. |
Kfolds |
Integer. Number of folds per simulation (default: 10). |
details |
Logical. Print progress details (default: TRUE). |
seed |
Integer. Base random seed for fold shuffling. Default 8316951L. Each simulation uses seed + 1000 * ksim for reproducibility. |
parallel_args |
List. Parallelization configuration. |
Details
Runs repeated K-fold cross-validation simulations for ForestSearch and summarizes subgroup identification stability across repetitions.
Value
List with components:
- sens_summary
Named vector of median sensitivity metrics across simulations
- find_summary
Named vector of median subgroup-finding metrics
- sens_out
Matrix of sensitivity metrics (sims x metrics)
- find_out
Matrix of finding metrics (sims x metrics)
- timing_minutes
Total execution time
- sims
Number of simulations run
- Kfolds
Number of folds per simulation
Parallelization Strategy
Unlike the single K-fold function which parallelizes across folds, this function parallelizes across simulations for better efficiency when running many repetitions. Each simulation runs its K-fold CV sequentially.
See Also
forestsearch_Kfold for single K-fold CV
forestsearch_KfoldOut for summarizing CV results
Format Confidence Interval for Estimates
Description
Formats confidence interval for estimates.
Usage
format_CI(estimates, col_names)
Arguments
estimates |
Data frame or data.table of estimates. |
col_names |
Character vector of column names for estimate, lower, upper. |
Value
Character string formatted as "estimate (lower, upper)".
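A minimal sketch of such a formatter (hypothetical signature; the package function takes a data frame of estimates and column names):

```r
# Format "estimate (lower, upper)" at a fixed number of decimals
format_ci_sketch <- function(est, lower, upper, digits = 2) {
  sprintf("%.*f (%.*f, %.*f)", digits, est, digits, lower, digits, upper)
}

format_ci_sketch(1.31, 1.02, 1.68)  # "1.31 (1.02, 1.68)"
```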
Format Bootstrap Diagnostics Table with gt
Description
Creates a publication-ready diagnostics table from bootstrap results.
Usage
format_bootstrap_diagnostics_table(
diagnostics,
nb_boots,
results,
H_estimates = NULL,
Hc_estimates = NULL
)
Arguments
diagnostics |
List. Diagnostics information from summarize_bootstrap_results() |
nb_boots |
Integer. Number of bootstrap iterations |
results |
Data.table. Bootstrap results with bias-corrected estimates |
H_estimates |
List. H subgroup estimates |
Hc_estimates |
List. Hc subgroup estimates |
Value
A gt table object
Format Bootstrap Results Table with gt
Description
Creates a publication-ready table from ForestSearch bootstrap results, with bias-corrected confidence intervals, informative formatting, and optional subgroup definition footnote.
Usage
format_bootstrap_table(
FSsg_tab,
nb_boots,
est.scale = "hr",
boot_success_rate = NULL,
sg_definition = NULL,
title = NULL,
subtitle = NULL
)
Arguments
FSsg_tab |
Data frame or matrix from forestsearch_bootstrap_dofuture()$FSsg_tab |
nb_boots |
Integer. Number of bootstrap iterations performed |
est.scale |
Character. "hr" or "1/hr" for effect scale |
boot_success_rate |
Numeric. Proportion of bootstraps that found subgroups |
sg_definition |
Character. Subgroup definition string to display as footnote (e.g., "{age>=50} & {nodes>=3}"). If NULL, no subgroup footnote is added. |
title |
Character. Custom title (optional) |
subtitle |
Character. Custom subtitle (optional) |
Value
A gt table object
Format Bootstrap Timing Table with gt
Description
Creates a publication-ready timing summary table from bootstrap results.
Usage
format_bootstrap_timing_table(timing_list, nb_boots, boot_success_rate)
Arguments
timing_list |
List. Timing information from summarize_bootstrap_results()$timing |
nb_boots |
Integer. Number of bootstrap iterations |
boot_success_rate |
Numeric. Proportion of successful bootstraps |
Value
A gt table object
Format Continuous Variable Definition for Display
Description
Format Continuous Variable Definition for Display
Usage
format_continuous_definition(var_data, cut_spec, var_name)
Format ForestSearch Details Output for Beamer Two-Column Display
Description
Captures forestsearch(details = TRUE) console output and splits it
into two columns for readable beamer slides. Left column shows variable
selection (GRF, LASSO, candidate factors); right column shows subgroup
search, consistency evaluation, and results.
Usage
format_fs_details(
fs_output,
split_after = "Candidate factors",
fontsize = "scriptsize",
col_widths = c(0.48, 0.52),
max_width = 48
)
Arguments
fs_output |
Character vector of captured output lines from
|
split_after |
Character string (regex). The output is split after the
block matching this pattern. Default: |
fontsize |
Character. LaTeX font size for the output text.
One of |
col_widths |
Numeric vector of length 2. Column widths as fractions
of |
max_width |
Integer. Maximum character width per line before wrapping. Long lines are wrapped at comma or space boundaries with a 4-space continuation indent. Default: 48 (suitable for half-slide columns at scriptsize). |
Value
Invisibly returns a list with left and right
character vectors. Side effect: emits LaTeX via cat() for use
in a chunk with results='asis'.
Quarto Setup
No special LaTeX packages required. Works in any beamer frame
without the fragile option.
Usage
In a Quarto beamer chunk with results='asis' and
echo=FALSE, first capture the forestsearch output with
capture.output(), then call format_fs_details(fs_output).
Format Operating Characteristics Results as GT Table
Description
Creates a formatted gt table from simulation operating characteristics results.
Usage
format_oc_results(
results,
analyses = NULL,
metrics = "all",
digits = 3,
digits_hr = 3,
title = "Operating Characteristics Summary",
subtitle = NULL,
use_gt = TRUE
)
Arguments
results |
data.table or data.frame. Simulation results from
|
analyses |
Character vector. Analysis methods to include. Default: NULL (all analyses in results) |
metrics |
Character vector. Metrics to display. Options include: "detection", "classification", "hr_estimates", "ahr_estimates", "cde_estimates", "subgroup_size", "all". Default: "all" |
digits |
Integer. Decimal places for proportions. Default: 3 |
digits_hr |
Integer. Decimal places for hazard ratios. Default: 3 |
title |
Character. Table title. Default: "Operating Characteristics Summary" |
subtitle |
Character. Table subtitle. Default: NULL |
use_gt |
Logical. Return gt table if TRUE, data.frame if FALSE. Default: TRUE |
Details
The function summarizes simulation results across multiple metrics:
- Found: Proportion of simulations finding a subgroup (any.H)
- Classification: Sensitivity, specificity, PPV, NPV
- HR Estimates: Mean Cox hazard ratios in true (H) and identified (H-hat) subgroups and their complements
- AHR Estimates: Mean average hazard ratios (from loghr_po) in true and identified subgroups
- CDE Estimates: Controlled direct effects (from theta_0/theta_1) in true and identified subgroups
- Subgroup Size: Average, minimum, and maximum sizes
Column notation aligns with build_estimation_table and
Leon et al. (2024): H = true (oracle) subgroup, H-hat =
identified subgroup. The asterisk (*) is reserved for bootstrap
bias-corrected estimates and is not used in this summary table.
Value
A gt table object (if use_gt = TRUE and gt package available) or data.frame
Format results for subgroup summary
Description
Formats results for subgroup summary table.
Usage
format_results(
subgroup_name,
n,
n_treat,
d,
m1,
m0,
drmst,
hr,
hr_a = NA,
hr_po = NA,
return_medians = TRUE
)
Arguments
subgroup_name |
Character. Subgroup name. |
n |
Character. Sample size. |
n_treat |
Character. Treated count. |
d |
Character. Event count. |
m1 |
Numeric. Median or RMST for treatment. |
m0 |
Numeric. Median or RMST for control. |
drmst |
Numeric. RMST difference. |
hr |
Character. Hazard ratio (formatted). |
hr_a |
Character. Adjusted hazard ratio (optional). |
hr_po |
Numeric. Potential outcome hazard ratio (optional). |
return_medians |
Logical. Use medians or RMST. |
Value
Character vector of results.
Format Search Results
Description
Format Search Results
Usage
format_search_results(
results_list,
Z,
details,
t.sofar,
L,
max_count,
filter_counts = NULL
)
Arguments
results_list |
List of result rows |
Z |
Matrix of factor indicators |
details |
Logical. Print details |
t.sofar |
Numeric. Time elapsed |
L |
Integer. Number of factors |
max_count |
Integer. Maximum combinations |
filter_counts |
List. Counts at each filtering stage (optional) |
Format Subgroup Summary Tables with gt
Description
Creates publication-ready gt tables for bootstrap subgroup analysis
Usage
format_subgroup_summary_tables(subgroup_summary, nb_boots)
Arguments
subgroup_summary |
List from summarize_bootstrap_subgroups() |
nb_boots |
Integer. Number of bootstrap iterations |
Value
List of gt table objects
Generate Synthetic Survival Data using AFT Model with Flexible Subgroups
Description
Creates a data generating mechanism (DGM) for survival data using an Accelerated Failure Time (AFT) model with Weibull distribution. Supports flexible subgroup definitions and treatment-subgroup interactions.
Usage
generate_aft_dgm_flex(
data,
continuous_vars,
factor_vars,
continuous_vars_cens = NULL,
factor_vars_cens = NULL,
set_beta_spec = list(set_var = NULL, beta_var = NULL),
outcome_var,
event_var,
treatment_var = NULL,
subgroup_vars = NULL,
subgroup_cuts = NULL,
draw_treatment = FALSE,
model = "alt",
k_treat = 1,
k_inter = 1,
n_super = 5000,
select_censoring = TRUE,
cens_type = "weibull",
cens_params = list(),
seed = 8316951,
verbose = TRUE,
standardize = FALSE,
spline_spec = NULL
)
Arguments
data |
A data.frame containing the input dataset to base the simulation on |
continuous_vars |
Character vector of continuous variable names to be standardized and included as covariates |
factor_vars |
Character vector of factor/categorical variable names to be converted to dummy variables (largest value as reference) |
continuous_vars_cens |
Character vector of continuous variable names to be used for censoring model. If NULL, uses same as continuous_vars. Default NULL |
factor_vars_cens |
Character vector of factor variable names to be used for censoring model. If NULL, uses same as factor_vars. Default NULL |
set_beta_spec |
List with elements 'set_var' and 'beta_var' for manually setting specific beta coefficients. Default list(set_var = NULL, beta_var = NULL) |
outcome_var |
Character string specifying the name of the outcome/time variable |
event_var |
Character string specifying the name of the event/status variable (1 = event, 0 = censored) |
treatment_var |
Character string specifying the name of the treatment variable. If NULL, treatment will be randomly simulated with 50/50 allocation |
subgroup_vars |
Character vector of variable names defining the subgroup. Default is NULL (no subgroups) |
subgroup_cuts |
Named list of cutpoint specifications for subgroup variables. See Details section for flexible specification options |
draw_treatment |
Logical indicating whether to redraw treatment assignment in simulation. Default is FALSE (use original assignments) |
model |
Character string: "alt" for alternative model with subgroup effects, "null" for null model without subgroup effects. Default is "alt" |
k_treat |
Numeric treatment effect modifier. Values >1 increase treatment effect, <1 decrease it. Default is 1 (no modification) |
k_inter |
Numeric interaction effect modifier for treatment-subgroup interaction. Default is 1 (no modification) |
n_super |
Integer specifying size of super population to generate. Default is 5000 |
select_censoring |
Logical. Default is TRUE |
cens_type |
Character string specifying the censoring distribution type. Default is "weibull" |
cens_params |
Named list of censoring distribution parameters; interpretation depends on cens_type. Default is list() |
seed |
Integer random seed for reproducibility. Default is 8316951 |
verbose |
Logical indicating whether to print diagnostic information during execution. Default is TRUE |
standardize |
Logical indicating whether to standardize continuous variables. Default is FALSE |
spline_spec |
List specifying spline configuration for treatment effect. Must include 'var' (variable name), 'knot', 'zeta', and 'log_hrs' (vector of length 3). Default NULL (no spline) |
Details
Subgroup Cutpoint Specifications
The subgroup_cuts parameter accepts multiple flexible specifications:
Fixed Value
subgroup_cuts = list(er = 20) # er <= 20
Quantile-based
subgroup_cuts = list(er = list(type = "quantile", value = 0.25)) # er <= 25th percentile
Function-based
subgroup_cuts = list(er = list(type = "function", fun = median)) # er <= median
Range
subgroup_cuts = list(age = list(type = "range", min = 40, max = 60)) # 40 <= age <= 60
Greater than
subgroup_cuts = list(nodes = list(type = "greater", quantile = 0.75)) # nodes > 75th percentile
Multiple values (for categorical)
subgroup_cuts = list(grade = list(type = "multiple", values = c(2, 3))) # grade in (2, 3)
Custom function
subgroup_cuts = list(
  er = list(
    type = "custom",
    fun = function(x) x <= quantile(x, 0.3) | x >= quantile(x, 0.9)
  )
)
Model Structure
The AFT model with Weibull distribution is specified as:
\log(T) = \mu + \gamma' X + \sigma \epsilon
Where:
- T is the survival time
- \mu is the intercept
- \gamma contains the covariate effects
- X includes treatment, covariates, and the treatment x subgroup interaction
- \sigma is the scale parameter
- \epsilon follows an extreme value distribution
Interaction Term
The model creates a SINGLE interaction term representing the treatment effect modification when ALL subgroup conditions are simultaneously satisfied. This is not multiple separate interactions but one combined indicator.
Value
A named list of class aft_dgm containing:
data |
Simulated trial data frame with outcome, event, and treatment columns. |
model_params |
Model parameters used for data generation (coefficients, dispersion, spline info if applicable). |
subgroup_info |
Subgroup definition and membership indicators, if a heterogeneous treatment effect was specified. |
censoring_info |
Censoring model parameters and observed censoring rate. |
call_args |
Arguments used in the call, for reproducibility. |
Author(s)
Larry Leon
References
León, L.F., et al. (2024). Statistics in Medicine. doi:10.1002/sim.10163.
Kalbfleisch, J.D. and Prentice, R.L. (2002). The Statistical Analysis of Failure Time Data (2nd ed.). Wiley.
Examples
df <- survival::gbsg
dgm <- generate_aft_dgm_flex(
data = df,
outcome_var = "rfstime",
event_var = "status",
treatment_var = "hormon",
continuous_vars = c("age", "size", "nodes", "pgr", "er"),
factor_vars = "meno",
model = "null",
verbose = FALSE
)
str(dgm)
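The same generator extends to a heterogeneous-effect scenario via subgroup_cuts. A hedged sketch: the quantile cutpoint and k_inter value below are illustrative assumptions, not recommendations.

```r
# Alternative model: treatment effect modified when er <= 25th percentile.
# Cutpoint and effect modifiers are illustrative only.
dgm_alt <- generate_aft_dgm_flex(
  data = survival::gbsg,
  outcome_var = "rfstime",
  event_var = "status",
  treatment_var = "hormon",
  continuous_vars = c("age", "size", "nodes", "pgr", "er"),
  factor_vars = "meno",
  subgroup_vars = "er",
  subgroup_cuts = list(er = list(type = "quantile", value = 0.25)),
  model = "alt",
  k_inter = 1.5,
  verbose = FALSE
)
str(dgm_alt$subgroup_info)
```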
Generate Synthetic Data using Bootstrap with Perturbation
Description
Generate Synthetic Data using Bootstrap with Perturbation
Usage
generate_bootstrap_synthetic(
data,
continuous_vars,
cat_vars,
n = NULL,
seed = 123,
noise_level = 0.1,
id_var = NULL,
cat_flip_prob = NULL,
preserve_bounds = TRUE,
ordinal_vars = NULL
)
Arguments
data |
Original dataset to bootstrap from |
continuous_vars |
Character vector of continuous variable names |
cat_vars |
Character vector of categorical variable names |
n |
Number of synthetic observations to generate (default: same as original) |
seed |
Random seed for reproducibility |
noise_level |
Noise level for perturbation (0 to 1, default 0.1) |
id_var |
Optional name of ID variable to regenerate (will be numbered 1:n) |
cat_flip_prob |
Probability of flipping categorical values (default: noise_level/2) |
preserve_bounds |
Logical: should continuous variables stay within original bounds? (default: TRUE) |
ordinal_vars |
Optional character vector of ordinal categorical variables (these will be perturbed to adjacent values rather than randomly flipped) |
Value
A data frame with synthetic data
Generate Bootstrap Sample with Added Noise
Description
Creates a bootstrap sample from a dataset with controlled noise added to both continuous and categorical variables. This function is useful for generating synthetic datasets that maintain the general structure of the original data while introducing controlled variation.
Usage
generate_bootstrap_with_noise(
data,
n = NULL,
continuous_vars = NULL,
cat_vars = NULL,
id_var = "pid",
seed = 123,
noise_level = 0.1
)
Arguments
data |
A data frame containing the original dataset to bootstrap from. |
n |
Integer. Number of observations in the output dataset. If NULL (default), uses the same number of rows as the input data. |
continuous_vars |
Character vector of column names to treat as continuous variables. If NULL (default), automatically detects numeric columns. |
cat_vars |
Character vector of column names to treat as categorical variables. If NULL (default), automatically detects factors, logical columns, and numeric columns with 10 or fewer unique values. |
id_var |
Character string specifying the name of the ID variable column. This column will be reset to sequential values (1:n) in the output. Default is "pid". |
seed |
Integer. Random seed for reproducibility. Default is 123. |
noise_level |
Numeric between 0 and 1. Controls the amount of noise added. For continuous variables, this is multiplied by the standard deviation to determine noise magnitude. For categorical variables, this is divided by 2 to determine the probability of value changes. Default is 0.1. |
Details
The function performs the following operations:
Bootstrap Sampling
Samples n observations with replacement from the original dataset.
Continuous Variable Noise
Adds Gaussian noise with standard deviation = original SD × noise_level
Constrains values to remain within original variable bounds
Preserves integer type for variables that appear to be integers
Categorical Variable Perturbation
Changes values with probability = noise_level / 2
Binary variables: flips to opposite value
Multi-level unordered: randomly selects from other levels
Ordered factors: weights selection toward adjacent levels
Preserves factor levels and ordering from original data
Value
A data frame with the same structure as the input data, containing bootstrap sampled observations with added noise.
Note
The function assumes that categorical variables with numeric encoding should maintain their numeric type unless they are factors in the input
Missing values (NA) are handled appropriately in calculations but are not imputed
For ordered factors or variables named "grade", the perturbation favors transitions to adjacent levels over distant levels
See Also
sample for bootstrap sampling,
rnorm for noise generation
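The operations above can be sketched with survival::gbsg (chosen for illustration; gbsg includes the default "pid" ID column):

```r
# Bootstrap GBSG with 10% noise; pid is reset to 1:n in the output.
syn <- generate_bootstrap_with_noise(
  data = survival::gbsg,
  continuous_vars = c("age", "size", "nodes", "pgr", "er"),
  cat_vars = c("meno", "grade", "hormon"),
  id_var = "pid",
  seed = 123,
  noise_level = 0.1
)
range(syn$age)  # constrained to the original age bounds
```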
Generate Combination Indices
Description
Creates indices for all factor combinations up to maxk
Usage
generate_combination_indices(L, maxk)
Generate Complement Expression
Description
Creates the logical complement of a subgroup expression. Handles common patterns like "var <= x" -> "var > x".
Usage
generate_complement_expression(expr)
Arguments
expr |
Character vector of expressions to negate. |
Value
Character string with negated expression.
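A brief sketch of the documented pattern handling (the second call assumes ">" cuts are negated symmetrically, which is not stated explicitly above):

```r
generate_complement_expression("er <= 20")  # documented pattern: "er > 20"
generate_complement_expression("age > 60")  # complement of a ">" cut
```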
Generate Detection Probability Curve
Description
Computes detection probability across a range of hazard ratios to create a power-like curve for subgroup detection.
Usage
generate_detection_curve(
theta_range = c(0.5, 3),
n_points = 50L,
n_sg,
prop_cens = 0.3,
hr_threshold = 1.25,
hr_consistency = 1,
include_reference = TRUE,
method = "cubature",
verbose = TRUE
)
Arguments
theta_range |
Numeric vector of length 2. Range of HR values to evaluate. Default: c(0.5, 3.0) |
n_points |
Integer. Number of points to evaluate. Default: 50 |
n_sg |
Integer. Subgroup sample size. |
prop_cens |
Numeric. Proportion censored (0-1). Default: 0.3 |
hr_threshold |
Numeric. HR threshold for detection. Default: 1.25 |
hr_consistency |
Numeric. HR consistency threshold. Default: 1.0 |
include_reference |
Logical. Include reference HR values (0.5, 0.75, 1.0). Default: TRUE |
method |
Character. Integration method. Default: "cubature" |
verbose |
Logical. Print progress. Default: TRUE |
Value
A data.frame with columns:
theta |
Hazard ratio values |
probability |
Detection probability |
n_sg |
Subgroup size (repeated) |
prop_cens |
Censoring proportion (repeated) |
hr_threshold |
Detection threshold (repeated) |
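A hedged usage sketch; n_sg = 120 and the reduced n_points are illustrative choices:

```r
curve_df <- generate_detection_curve(
  theta_range = c(0.5, 3),
  n_points = 25L,
  n_sg = 120,
  prop_cens = 0.3,
  hr_threshold = 1.25,
  verbose = FALSE
)
head(curve_df)  # theta, probability, n_sg, prop_cens, hr_threshold
```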
Generate Synthetic GBSG Data using Generalized Bootstrap
Description
Generate Synthetic GBSG Data using Generalized Bootstrap
Usage
generate_gbsg_bootstrap_general(n = 686, seed = 123, noise_level = 0.1)
Arguments
n |
Number of observations |
seed |
Random seed |
noise_level |
Noise level for perturbation |
Value
Synthetic GBSG dataset
Generate Readable Subgroup Labels from ForestSearch Object
Description
Extracts human-readable subgroup labels that are also valid R expressions
for use with plotKM.band_subgroups(). Attempts to extract the actual
subgroup definition (e.g., "er <= 0") rather than column references.
Usage
generate_readable_sg_labels(fs.est, verbose = FALSE)
Arguments
fs.est |
A forestsearch object. |
verbose |
Logical. Print diagnostic messages. |
Value
Character vector of length 2: c(harm_label, benefit_label)
Generate Super Population and Calculate Linear Predictors
Description
Generate Super Population and Calculate Linear Predictors
Usage
generate_super_population(
df_work,
n_super,
draw_treatment,
gamma,
b0,
mu,
tau,
verbose,
spline_info = NULL
)
Fit Cox Model for Subgroup
Description
Fits a Cox model for a subgroup and returns estimate and standard error.
Usage
get_Cox_sg(df_sg, cox.formula, est.loghr = TRUE, cox_initial = log(1))
Arguments
df_sg |
Data frame for subgroup. |
cox.formula |
Cox model formula. |
est.loghr |
Logical. Is estimate on log(HR) scale? |
cox_initial |
Numeric. Initial value for the Cox coefficient. Default is log(1) (i.e., 0) |
Details
This function is used throughout the codebase.
Value
List with estimate and standard error.
ForestSearch Data Preparation and Feature Selection
Description
Prepares a dataset for ForestSearch, including options for LASSO-based dimension reduction, GRF cuts, forced cuts, and flexible cut strategies. Returns a list with the processed data, subgroup factor names, cut expressions, and LASSO selection results.
Usage
get_FSdata(
df.analysis,
use_lasso = FALSE,
use_grf = FALSE,
grf_cuts = NULL,
confounders.name,
cont.cutoff = 4,
conf_force = NULL,
conf.cont_medians = NULL,
conf.cont_medians_force = NULL,
replace_med_grf = TRUE,
defaultcut_names = NULL,
cut_type = "default",
exclude_cuts = NULL,
outcome.name = "tte",
event.name = "event",
details = TRUE
)
Arguments
df.analysis |
Data frame containing the data. |
use_lasso |
Logical. Whether to use LASSO for dimension reduction. |
use_grf |
Logical. Whether to use GRF cuts. |
grf_cuts |
Character vector of GRF cut expressions. |
confounders.name |
Character vector of confounder variable names. |
cont.cutoff |
Integer. Cutoff for continuous variable determination. |
conf_force |
Character vector of forced cut expressions. |
conf.cont_medians |
Character vector of continuous confounders to cut at median. |
conf.cont_medians_force |
Character vector of additional continuous confounders to force median cut. |
replace_med_grf |
Logical. If TRUE, removes median cuts that overlap with GRF cuts. |
defaultcut_names |
Character vector of confounders to force default cuts. |
cut_type |
Character. "default" or "median" for cut strategy. |
exclude_cuts |
Character vector of cut expressions to exclude. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
details |
Logical. If TRUE, prints details during execution. |
Value
A named list containing:
df |
Data frame with binary cut-point indicator columns |
confs_names |
Character vector of the internal column names of the indicator columns |
confs |
Character vector of candidate factor specifications (continuous cut expressions and categorical variable names) |
lassokeep |
Character vector of factors retained by LASSO (if use_lasso = TRUE) |
lassoomit |
Character vector of factors omitted by LASSO (if use_lasso = TRUE) |
Get Best Model from Comparison
Description
Extracts the best fitting model object from a comparison result. If no single best model can be determined, returns the Weibull model if selected by either AIC or BIC. Defaults to Weibull0 model if no model can be determined.
Usage
get_best_survreg(comparison_result)
Arguments
comparison_result |
Output from compare_survreg_models or compare_multiple_survreg |
Value
A survreg model object (defaults to Weibull0 model)
Get all exported functions from ForestSearch namespace
Description
Get all exported functions from ForestSearch namespace
Usage
get_bootstrap_exports()
Get all combinations of subgroup factors up to maxk
Description
Generates all possible combinations of subgroup factors up to a specified maximum size.
Usage
get_combinations_info(L, maxk)
Arguments
L |
Integer. Number of subgroup factors. |
maxk |
Integer. Maximum number of factors in a combination. |
Value
List with max_count (total combinations) and indices_list (indices for each k).
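The returned quantities can be reproduced in base R; this sketch mirrors the documented output using utils::combn:

```r
# All factor combinations of size 1..maxk among L factors.
L <- 5; maxk <- 2
indices_list <- lapply(seq_len(maxk), function(k) utils::combn(L, k))
max_count <- sum(vapply(indices_list, ncol, integer(1)))
max_count  # choose(5, 1) + choose(5, 2) = 15
```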
Get forced cut expressions for variables
Description
For each variable in conf.force.names, returns cut expressions if continuous.
Usage
get_conf_force(df, conf.force.names, cont.cutoff = 4)
Arguments
df |
Data frame. |
conf.force.names |
Character vector of variable names. |
cont.cutoff |
Integer. Cutoff for continuous. |
Value
Character vector of cut expressions.
Get indicator vector for selected subgroup factors
Description
Returns a vector indicating which factors are included in a subgroup combination.
Usage
get_covs_in(
kk,
maxk,
L,
counts_1factor,
index_1factor,
counts_2factor = NULL,
index_2factor = NULL,
counts_3factor = NULL,
index_3factor = NULL
)
Arguments
kk |
Integer. Index of the combination. |
maxk |
Integer. Maximum number of factors in a combination. |
L |
Integer. Number of subgroup factors. |
counts_1factor |
Integer. Number of single-factor combinations. |
index_1factor |
Matrix of indices for single-factor combinations. |
counts_2factor |
Integer. Number of two-factor combinations. |
index_2factor |
Matrix of indices for two-factor combinations. |
counts_3factor |
Integer. Number of three-factor combinations. |
index_3factor |
Matrix of indices for three-factor combinations. |
Value
Numeric vector indicating selected factors (1 = included, 0 = not included).
Get variable name from cut expression
Description
Extracts the variable name from a cut expression.
Usage
get_cut_name(thiscut, confounders.name)
Arguments
thiscut |
Character string of the cut expression. |
confounders.name |
Character vector of confounder names. |
Value
Character vector of variable names.
Bootstrap Confidence Interval and Bias Correction Results
Description
Calculates confidence intervals and bias-corrected estimates for bootstrap results.
Usage
get_dfRes(
Hobs,
seHobs,
H1_adj,
H2_adj = NULL,
ystar,
cov_method = "standard",
cov_trim = 0,
est.scale = "hr",
est.loghr = TRUE
)
Arguments
Hobs |
Numeric. Observed estimate. |
seHobs |
Numeric. Standard error of observed estimate. |
H1_adj |
Numeric. Bias-corrected estimate 1. |
H2_adj |
Numeric. Bias-corrected estimate 2 (optional). |
ystar |
Matrix of bootstrap samples. |
cov_method |
Character. Covariance method ("standard" or "nocorrect"). |
cov_trim |
Numeric. Trimming proportion for covariance (default: 0.0). |
est.scale |
Character. "hr" or "1/hr". |
est.loghr |
Logical. Is estimate on log(HR) scale? |
Value
Data.table with confidence intervals and estimates.
Generate Prediction Dataset with Subgroup Treatment Recommendation
Description
Creates a prediction dataset with a treatment recommendation flag based
on the subgroup definition. Supports both label expressions
(e.g., "{er <= 0}") and bare column names (e.g., "q3.1").
Usage
get_dfpred(df.predict, sg.harm, version = 1)
Arguments
df.predict |
Data frame for prediction (test or validation set). |
sg.harm |
Character vector of subgroup-defining labels. Values may
be wrapped in braces and optionally negated, e.g. "{er <= 0}" or "!{er <= 0}" |
version |
Integer; encoding version (maintained for backward compatibility). Default: 1. |
Details
Each element of sg.harm is processed as follows:
1. Outer braces and a leading ! are stripped.
2. If the result matches "var op value" (where op is one of <=, <, >=, >, ==, !=), the comparison is evaluated directly on df.predict[[var]].
3. Otherwise the expression is treated as a column name and membership is df.predict[[name]] == 1.
Value
Data frame with treatment recommendation flag
(treat.recommend): 0 for harm subgroup, 1 for complement.
See Also
evaluate_comparison for the operator-dispatch
logic, forestsearch for the main analysis function.
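A short sketch of the label-expression dispatch path, using survival::gbsg for illustration:

```r
# Label expression: the comparison is evaluated directly on df.predict[["er"]].
pred <- get_dfpred(df.predict = survival::gbsg, sg.harm = "{er <= 0}")
table(pred$treat.recommend)  # 0 = harm subgroup, 1 = complement
```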
Extract HR from DGM (Backward Compatible)
Description
Extracts hazard ratios from DGM object, supporting both old and new formats. Also supports CDE (controlled direct effect) extraction for Table 5 of Leon et al. (2024) alignment (theta-ddagger).
Usage
get_dgm_hr(dgm, which = "hr_H")
Arguments
dgm |
DGM object (gbsg_dgm or aft_dgm_flex) |
which |
Character. Which HR to extract. Default is "hr_H" |
Value
Numeric hazard ratio value
Create DGM with Output File Path
Description
Wrapper function that creates a GBSG DGM and generates a standardized output file path for saving results.
Usage
get_dgm_with_output(
model_harm,
n,
k_treat = 1,
target_hr_harm = NULL,
cens_type = "weibull",
out_dir = NULL,
file_prefix = "sim",
file_suffix = "",
include_hr_in_name = FALSE,
verbose = FALSE,
...
)
Arguments
model_harm |
Character. Model type ("alt" or "null") |
n |
Integer. Planned sample size (for filename) |
k_treat |
Numeric. Treatment effect multiplier |
target_hr_harm |
Numeric. Target HR for harm subgroup (used for calibration when model = "alt") |
cens_type |
Character. Censoring type |
out_dir |
Character. Output directory path. If NULL, no file path is generated |
file_prefix |
Character. Prefix for output filename |
file_suffix |
Character. Suffix for output filename |
include_hr_in_name |
Logical. Include achieved HR in filename. Default: FALSE |
verbose |
Logical. Print diagnostic information. Default: FALSE |
... |
Additional arguments passed to the underlying DGM generator |
Value
List with components:
dgm |
The gbsg_dgm object |
out_file |
Character path to the output file (NULL if out_dir is NULL) |
k_inter |
The k_inter value used (either calibrated or default) |
Get Parameter with Default Fallback
Description
Safely retrieves a named element from a list, returning a default value
if the element is missing or NULL.
Usage
get_param(args_list, param_name, default_value)
Arguments
args_list |
List to extract from. |
param_name |
Character. Name of the element to retrieve. |
default_value |
Default value to return if the element is missing or NULL |
Value
The value of args_list[[param_name]] if present and
non-NULL, otherwise default_value.
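Per the documented behavior, a quick sketch:

```r
opts <- list(n_boots = 500, cov_trim = NULL)
get_param(opts, "n_boots", 300)   # present and non-NULL: returns 500
get_param(opts, "cov_trim", 0.1)  # NULL element: falls back to 0.1
get_param(opts, "seed", 123)      # missing element: falls back to 123
```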
Fast Cox Model HR Estimation
Description
Fits a minimal Cox model to estimate hazard ratio with reduced overhead. Disables robust variance, model matrix storage, and other extras for speed.
Usage
get_split_hr_fast(df, cox_init = 0)
Arguments
df |
data.frame or data.table with Y, Event, Treat columns. |
cox_init |
Numeric. Initial value for coefficient (default 0). |
Value
Numeric. Estimated hazard ratio, or NA if model fails.
Get subgroup membership vector
Description
Returns a vector indicating subgroup membership (1 if all selected factors are present, 0 otherwise).
Usage
get_subgroup_membership(zz, covs.in)
Arguments
zz |
Matrix or data frame of subgroup factor indicators. |
covs.in |
Numeric vector indicating which factors are selected (1 = included). |
Value
Numeric vector of subgroup membership (1/0).
Target Estimate and Standard Error for Bootstrap
Description
Calculates target estimate and standard error for bootstrap samples.
Usage
get_targetEst(x, ystar, cov_method = "standard", cov_trim = 0)
Arguments
x |
Numeric vector of estimates. |
ystar |
Matrix of bootstrap samples. |
cov_method |
Character. Covariance method ("standard" or "nocorrect"). |
cov_trim |
Numeric. Trimming proportion for covariance (default: 0.0). |
Value
List with target estimate, standard errors, and correction term.
ggplot2 / patchwork forest plot
Description
Creates a publication-quality forest plot using ggplot2 for the CI panel
and patchwork to assemble label and annotation columns alongside it.
Unlike forestploter, fig.height maps directly to row density:
row_height = fig.height / n_rows with no hidden scaling.
Usage
gg_forest(
subgroups,
est,
lo,
hi,
cat_vec = NULL,
cat_colours = NULL,
annot = NULL,
ref_line = 1,
vert_lines = NULL,
ref_col = "firebrick",
ref_lty = "dashed",
vert_col = "grey50",
vert_lty = "dotted",
xlim = NULL,
ticks_at = NULL,
tick_labels = NULL,
xlog = TRUE,
xlab = "Hazard Ratio",
title = NULL,
subtitle = NULL,
footnote = NULL,
point_size = 2.5,
line_size = 0.8,
point_shape = 21,
base_size = 11,
widths = NULL,
row_expand = 0.6
)
Arguments
subgroups |
Character vector of subgroup names (displayed top to bottom). |
est |
Numeric vector of point estimates (median HR or similar). |
lo |
Numeric vector of lower bounds (e.g. 1st percentile ECI). |
hi |
Numeric vector of upper bounds (e.g. 99th percentile ECI). |
cat_vec |
Optional character vector of category labels (one per row). Used to colour CI lines and label text. |
cat_colours |
Optional named character vector mapping category labels to colours. Defaults to grey for all rows. |
annot |
Optional named list of character vectors, one per annotation
column. Names become column headers. Each vector must match the length of subgroups |
ref_line |
Numeric. X position of the primary reference line (default 1). Drawn as a dashed red line. |
vert_lines |
Numeric vector. X positions of secondary vertical lines (default NULL). Drawn as dotted grey lines. |
ref_col |
Colour of the primary reference line (default "firebrick"). |
ref_lty |
Line type of the primary reference line (default "dashed"). |
vert_col |
Colour of secondary vertical lines (default "grey50"). |
vert_lty |
Line type of secondary vertical lines (default "dotted"). |
xlim |
Numeric vector length 2. X-axis limits for the CI panel. |
ticks_at |
Numeric vector. X-axis tick positions. |
tick_labels |
Character vector. Custom tick labels (default: as.character(ticks_at)). |
xlog |
Logical. If TRUE (default), x-axis on log scale. |
xlab |
Character. X-axis label (default "Hazard Ratio"). |
title |
Character. Overall plot title (default NULL). |
subtitle |
Character. Plot subtitle (default NULL). |
footnote |
Character. Footnote appended below the CI panel (default NULL). |
point_size |
Numeric. Size of point estimate symbol (default 2.5). |
line_size |
Numeric. Line width of CI segments (default 0.8). |
point_shape |
Integer. pch for point estimates (default 21, filled circle). |
base_size |
Numeric. ggplot2 base font size in pt (default 11). Controls all text; increase this to make the plot larger, no other knob needed. |
widths |
Numeric vector. Relative patchwork column widths: c(label, ci, annot_1, annot_2, …). Default: c(3.5, 5, rep(1, n_annot)). |
row_expand |
Numeric. Extra space above and below row range on y-axis, in row units (default 0.6). |
Value
A patchwork object. Render with print() or plot().
Control dimensions entirely via knitr chunk options fig.width /
fig.height: row height = fig.height / n_rows.
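A hedged sketch; the subgroup names, estimates, and annotation values below are invented for illustration:

```r
p <- gg_forest(
  subgroups = c("Overall", "er <= 0", "er > 0"),
  est = c(0.80, 1.30, 0.70),  # illustrative HRs
  lo  = c(0.65, 0.90, 0.55),
  hi  = c(0.98, 1.90, 0.89),
  annot = list("Events/N" = c("299/686", "60/120", "239/566")),
  ref_line = 1,
  title = "Illustrative forest plot"
)
print(p)
```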
GRF Subgroup Evaluation and Performance Metrics
Description
Evaluates the performance of GRF-identified subgroups, including hazard ratios, bias, and predictive values. This function is typically used in simulation studies to assess the performance of the GRF subgroup identification method.
Usage
grf.subg.eval(
df,
grf.est,
dgm,
cox.formula.sim,
cox.formula.adj.sim,
analysis = "GRF",
frac.tau = 1
)
Arguments
df |
Data frame containing the analysis data. |
grf.est |
List. Output from grf.subg.harm.survival |
dgm |
List. Data-generating mechanism (truth) for simulation. |
cox.formula.sim |
Formula for unadjusted Cox model. |
cox.formula.adj.sim |
Formula for adjusted Cox model. |
analysis |
Character. Analysis label (default: "GRF"). |
frac.tau |
Numeric. Fraction of tau for GRF horizon (default: 1.0). |
Value
A data frame with evaluation metrics.
GRF Subgroup Identification for Survival Data
Description
Identifies subgroups with differential treatment effect using generalized random forests (GRF) and policy trees. This function uses causal survival forests to identify heterogeneous treatment effects and policy trees to create interpretable subgroup definitions.
Usage
grf.subg.harm.survival(
data,
confounders.name,
outcome.name,
event.name,
id.name,
treat.name,
frac.tau = 1,
n.min = 60,
dmin.grf = 0,
RCT = TRUE,
details = FALSE,
sg.criterion = "mDiff",
maxdepth = 2,
seedit = 8316951,
return_selected_cuts_only = FALSE
)
Arguments
data |
Data frame containing the analysis data. |
confounders.name |
Character vector of confounder variable names. |
outcome.name |
Character. Name of outcome variable (e.g., time-to-event). |
event.name |
Character. Name of event indicator variable (0/1). |
id.name |
Character. Name of ID variable. |
treat.name |
Character. Name of treatment group variable (0/1). |
frac.tau |
Numeric. Fraction of tau for GRF horizon (default: 1.0). |
n.min |
Integer. Minimum subgroup size (default: 60). |
dmin.grf |
Numeric. Minimum difference in subgroup mean (default: 0.0). |
RCT |
Logical. Is the data from a randomized controlled trial? (default: TRUE) |
details |
Logical. Print details during execution (default: FALSE). |
sg.criterion |
Character. Subgroup selection criterion ("mDiff" or "Nsg"). |
maxdepth |
Integer. Maximum tree depth (1, 2, or 3; default: 2). |
seedit |
Integer. Random seed (default: 8316951). |
return_selected_cuts_only |
Logical. If TRUE, returns only cuts from the tree
depth that identified the selected subgroup meeting the dmin.grf criterion. Default: FALSE |
Details
The return_selected_cuts_only parameter controls which cuts are returned:
- FALSE (default): Returns all cuts from all fitted trees (depths 1 to maxdepth). This provides the full set of candidate splits for downstream exploration and is the original behavior for backward compatibility.
- TRUE: Returns only cuts from the tree at the depth that identified the "winning" subgroup meeting the dmin.grf criterion. This is useful when you want a focused set of cuts associated with the selected subgroup, reducing noise from non-selected trees.
When return_selected_cuts_only = TRUE and no subgroup meets the criteria,
tree.cuts will be empty (character(0)).
Value
A list with GRF results, including:
data |
Original data with added treatment recommendation flags |
grf.gsub |
Selected subgroup information |
sg.harm.id |
Expression defining the identified subgroup |
tree.cuts |
Cut expressions: either all cuts from all trees (if
return_selected_cuts_only = FALSE) or only cuts from the selected depth (if TRUE) |
tree.names |
Unique variable names used in cuts |
tree |
Selected policy tree object |
tau.rmst |
Time horizon used for RMST |
harm.any |
All subgroups with positive treatment effect difference |
selected_depth |
Depth of the tree that identified the subgroup (when found) |
return_selected_cuts_only |
Logical indicating which cut extraction mode was used |
Additional tree-specific cuts and objects (tree1, tree2, tree3) based on maxdepth
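A usage sketch on survival::gbsg; the added id column and the confounder list are assumptions for illustration:

```r
df <- survival::gbsg
df$id <- seq_len(nrow(df))
grf_fit <- grf.subg.harm.survival(
  data = df,
  confounders.name = c("age", "size", "nodes", "pgr", "er", "meno", "grade"),
  outcome.name = "rfstime",
  event.name = "status",
  id.name = "id",
  treat.name = "hormon",
  n.min = 60,
  maxdepth = 2
)
grf_fit$sg.harm.id  # expression defining the identified subgroup
```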
Check if Matrix Has Positive Variance
Description
Check if Matrix Has Positive Variance
Usage
has_positive_variance(x)
Format Hazard Ratio and Confidence Interval
Description
Formats a hazard ratio and confidence interval for display.
Usage
hrCI_format(hrest)
Arguments
hrest |
Numeric vector with HR, lower, and upper confidence limits. |
Value
Character string formatted as "HR (lower, upper)".
Generate Narrative Interpretation of Estimation Properties
Description
Produces a templated text summary of the estimation properties table, automatically populating numerical results from the simulation output. Useful for reproducible vignettes where interpretation paragraphs should update when simulations are re-run.
Usage
interpret_estimation_table(
results,
dgm,
analysis_method = "FSlg",
n_sims = NULL,
n_boots = 300,
digits = 2,
scenario = NULL,
cat = TRUE
)
Arguments
results |
Data frame of simulation results (the same input used to build the
estimation table) |
dgm |
DGM object with true parameter values. |
analysis_method |
Character. Which analysis method to summarise.
Default: "FSlg" |
n_sims |
Integer. Total number of simulations (used for the detection rate).
Default: NULL |
n_boots |
Integer. Number of bootstraps (for narrative). Default: 300. |
digits |
Integer. Decimal places for reported values. Default: 2. |
scenario |
Character. Scenario label used in the narrative.
Default: NULL |
cat |
Logical. If TRUE, prints the interpretation to the console. Default: TRUE |
Value
Invisibly returns the interpretation as a character string.
See Also
build_estimation_table,
format_oc_results, get_dgm_hr
Check if a variable is continuous
Description
Determines if a variable is continuous based on the number of unique values.
Usage
is.continuous(x, cutoff = 4)
Arguments
x |
A vector. |
cutoff |
Integer. Minimum number of unique values to be considered continuous. |
Value
1 if continuous, 2 if not.
Check if cut expression is for a continuous variable (OPTIMIZED)
Description
Determines if a cut expression refers to a continuous variable. This optimized version avoids redundant lookups by using word boundary matching instead of partial string matching.
Usage
is_flag_continuous(thiscut, confounders.name, df, cont.cutoff)
Arguments
thiscut |
Character string of the cut expression. |
confounders.name |
Character vector of confounder names. |
df |
Data frame. |
cont.cutoff |
Integer. Cutoff for continuous. |
Value
Logical; TRUE if continuous, FALSE otherwise.
Check if cut expression should be dropped
Description
Determines if a cut expression should be dropped (e.g., variable has <=1 unique value).
Usage
is_flag_drop(thiscut, confounders.name, df)
Arguments
thiscut |
Character string of the cut expression. |
confounders.name |
Character vector of confounder names. |
df |
Data frame. |
Value
Logical; TRUE if should be dropped, FALSE otherwise.
KM median summary for subgroup
Description
Calculates median survival for each treatment group using Kaplan-Meier.
Usage
km_summary(Y, E, Treat)
Arguments
Y |
Numeric vector of outcome. |
E |
Numeric vector of event indicators. |
Treat |
Numeric vector of treatment indicators. |
Value
Numeric vector of medians.
LASSO selection for Cox model
Description
Performs LASSO variable selection using Cox regression.
Usage
lasso_selection(
df,
confounders.name,
outcome.name,
event.name,
seedit = 8316951
)
Arguments
df |
Data frame. |
confounders.name |
Character vector of confounder names. |
outcome.name |
Character. Name of outcome variable. |
event.name |
Character. Name of event indicator variable. |
seedit |
Integer. Random seed. |
Value
List with selected, omitted variables, coefficients, lambda, and fits.
Check Event Count Criteria
Description
Check Event Count Criteria
Usage
meets_event_criteria(event_counts, d0.min, d1.min)
Check Prevalence Threshold
Description
Check Prevalence Threshold
Usage
meets_prevalence_threshold(x, minp)
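The arguments are not documented here; a plausible reading is that x is a 0/1 subgroup indicator and minp a minimum proportion. A minimal sketch under that assumption (meets_prevalence_sketch is hypothetical, not the package function):

```r
# Hypothetical sketch: does the prevalence of a 0/1 subgroup indicator x
# meet the minimum proportion `minp`? (Actual implementation not documented.)
meets_prevalence_sketch <- function(x, minp) {
  mean(x) >= minp
}

meets_prevalence_sketch(c(1, 0, 0, 1), minp = 0.25)  # prevalence 0.5 -> TRUE
```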
MRCT Regional Subgroup Simulation
Description
Simulates multi-regional clinical trials and evaluates ForestSearch subgroup identification. Splits data by region into training and testing populations, identifies subgroups using ForestSearch on training data, and evaluates performance on the testing region.
Usage
mrct_region_sims(
dgm,
n_sims,
n_sample = NULL,
region_var = "z_regA",
sg_focus = "minSG",
maxk = 1,
hr.threshold = 0.9,
hr.consistency = 0.8,
pconsistency.threshold = 0.9,
confounders.name = NULL,
conf_force = NULL,
fs_args = list(),
sim_args = list(rand_ratio = 1, draw_treatment = TRUE),
analysis_time = 60,
cens_adjust = 0,
parallel_args = list(plan = "multisession", workers = NULL, show_message = TRUE),
details = FALSE,
verbose_n_sims = 2L,
seed = NULL
)
Arguments
dgm |
Data generating mechanism object from |
n_sims |
Integer. Number of simulations to run |
n_sample |
Integer. Sample size per simulation. If NULL (default), uses the entire super-population from dgm |
region_var |
Character. Name of the region indicator variable used to split data into training (region_var == 0) and testing (region_var == 1) populations. Default: "z_regA" |
sg_focus |
Character. Subgroup selection criterion passed to
|
maxk |
Integer. Maximum number of factors in subgroup combinations (1 or 2). Default: 1 |
hr.threshold |
Numeric. Hazard ratio threshold for subgroup identification. Default: 0.90 |
hr.consistency |
Numeric. Consistency threshold for hazard ratio. Default: 0.80 |
pconsistency.threshold |
Numeric. Probability threshold for consistency. Default: 0.90 |
confounders.name |
Character vector. Confounder variable names for ForestSearch. If NULL, automatically extracted from dgm |
conf_force |
Character vector. Forced cuts to consider in ForestSearch. Default: c("z_age <= 65", "z_bm <= 0", "z_bm <= 1", "z_bm <= 2", "z_bm <= 5") |
fs_args |
Named list. Additional arguments passed directly to
|
sim_args |
Named list. Additional arguments passed to
|
analysis_time |
Numeric. Time of analysis for administrative censoring. Default: 60 |
cens_adjust |
Numeric. Adjustment factor for censoring rate on log scale. Default: 0 |
parallel_args |
List. Parallel processing configuration with components:
|
details |
Logical. Print detailed progress information. Default: FALSE |
verbose_n_sims |
Integer. When |
seed |
Integer. Base random seed for reproducibility. Default: NULL |
Details
Simulation Process
For each simulation:
1. Sample from the super-population using simulate_from_dgm
2. Split by region_var into training and testing populations
3. Estimate HRs in the ITT, training, and testing populations
4. Run forestsearch on the training population
5. Apply the identified subgroup to the testing population
6. Calculate subgroup-specific estimates
Region Variable
The region_var parameter is used ONLY for splitting data into training/testing
populations. It does not imply any prognostic effect. To include prognostic
confounder effects, specify them when creating the DGM using
create_dgm_for_mrct or generate_aft_dgm_flex.
Value
A data.table with simulation results containing:
- sim
Simulation index
- n_itt
ITT sample size
- hr_itt
ITT hazard ratio (stratified if strat variable present)
- hr_ittX
ITT hazard ratio stratified by region
- n_train
Training (non-region A) sample size
- hr_train
Training population hazard ratio
- n_test
Testing (region A) sample size
- hr_test
Testing population hazard ratio
- any_found
Indicator: 1 if subgroup identified, 0 otherwise
- sg_found
Character description of identified subgroup
- n_sg
Subgroup sample size
- hr_sg
Subgroup hazard ratio in testing population
- POhr_sg
Potential outcome hazard ratio in subgroup (testing)
- prev_sg
Subgroup prevalence (proportion of testing population)
- n_sg_train
Subgroup sample size in training population
- hr_sg_train
Subgroup hazard ratio in training population
- POhr_sg_train
Potential outcome hazard ratio in subgroup (training)
- hr_sg_null
Subgroup HR when found, NA otherwise
See Also
forestsearch for subgroup identification algorithm
generate_aft_dgm_flex for DGM creation
simulate_from_dgm for data simulation
create_dgm_for_mrct for MRCT-specific DGM wrapper
summaryout_mrct for summarizing simulation results
Calculate n and percent
Description
Returns count and percent for a vector relative to a denominator.
Usage
n_pcnt(x, denom)
Arguments
x |
Vector of values. |
denom |
Denominator for percent calculation. |
Value
Character string formatted as "n (percent%)".
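A minimal sketch of the documented formatting, assuming x is a 0/1 indicator counted via sum() and one-decimal rounding (both assumptions; n_pcnt_sketch is a hypothetical name):

```r
# Hypothetical sketch of n_pcnt(): count plus percent of a denominator,
# formatted as "n (percent%)". Counting rule and rounding are assumptions.
n_pcnt_sketch <- function(x, denom) {
  n <- sum(x)
  sprintf("%d (%.1f%%)", n, 100 * n / denom)
}

n_pcnt_sketch(c(1, 1, 0, 0), denom = 4)  # "2 (50.0%)"
```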
Parse sg.harm Factor Names to Expression
Description
Converts ForestSearch factor names (e.g., "er.0", "grade3.1") into human-readable R expressions (e.g., "er <= 0", "grade3 == 1").
Usage
parse_sg_harm_to_expression(sg_harm, fs.est = NULL)
Arguments
sg_harm |
Character vector of factor names from fs.est$sg.harm. |
fs.est |
ForestSearch object (for accessing confs_labels if available). |
Value
Character string expression or NULL if parsing fails.
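The mapping illustrated above ("er.0" to "er <= 0", "grade3.1" to "grade3 == 1") amounts to splitting the factor name on its last "." and rebuilding an expression. Whether the operator is "<=" (continuous cut) or "==" (factor level) depends on metadata not shown here, so this sketch takes the operator as an argument; parse_factor_sketch is hypothetical, not the package function.

```r
# Hypothetical sketch of the factor-name parsing: split "var.value" on the
# last "." and rebuild "var <op> value".
parse_factor_sketch <- function(name, op = "==") {
  parts <- regmatches(name, regexec("^(.*)\\.([^.]+)$", name))[[1]]
  paste(parts[2], op, parts[3])
}

parse_factor_sketch("grade3.1")          # "grade3 == 1"
parse_factor_sketch("er.0", op = "<=")   # "er <= 0"
```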
Plot ForestSearch Results
Description
Dispatches to plot_sg_results for Kaplan-Meier curves,
hazard-ratio forest plots, or combined panels.
Usage
## S3 method for class 'forestsearch'
plot(
x,
type = c("combined", "km", "forest", "summary"),
outcome.name = "Y",
event.name = "Event",
treat.name = "Treat",
...
)
Arguments
x |
A |
type |
Character. Type of plot:
|
outcome.name |
Character. Name of time-to-event column.
Default: |
event.name |
Character. Name of event indicator column.
Default: |
treat.name |
Character. Name of treatment column.
Default: |
... |
Additional arguments passed to |
Value
Invisibly returns the plot result from
plot_sg_results.
See Also
plot_sg_results for full control over appearance,
plot_sg_weighted_km for weighted KM curves,
plot_subgroup_results_forestplot for publication-ready
forest plots.
Plot Method for ForestSearch Forest Plot
Description
Plot Method for ForestSearch Forest Plot
Usage
## S3 method for class 'fs_forestplot'
plot(x, ...)
Arguments
x |
An fs_forestplot object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Plot Method for fs_sg_plot Objects
Description
Plot Method for fs_sg_plot Objects
Usage
## S3 method for class 'fs_sg_plot'
plot(x, which = 1, ...)
Arguments
x |
An fs_sg_plot object |
which |
Character or integer. Which plot to display. Default: 1 (first available) |
... |
Additional arguments passed to plot functions |
Value
Invisibly returns x.
Plot Detection Probability Curve
Description
Creates a visualization of the detection probability curve.
Usage
plot_detection_curve(
curve_data,
add_reference_lines = TRUE,
add_threshold_line = TRUE,
title = NULL,
...
)
Arguments
curve_data |
A data.frame from |
add_reference_lines |
Logical. Add horizontal reference lines at 0.05, 0.10, 0.80. Default: TRUE |
add_threshold_line |
Logical. Add vertical line at hr_threshold. Default: TRUE |
title |
Character. Plot title. Default: auto-generated |
... |
Additional arguments passed to plot() |
Value
Invisibly returns the input data.
Plot Kaplan-Meier Survival Difference Bands for ForestSearch Subgroups
Description
Creates Kaplan-Meier survival difference band plots comparing the identified
ForestSearch subgroup (sg.harm) and its complement against the ITT population.
This function wraps plotKM.band_subgroups() from the weightedsurv
package, automatically extracting subgroup definitions from ForestSearch
results.
Usage
plot_km_band_forestsearch(
df,
fs.est = NULL,
sg_cols = NULL,
sg_labels = NULL,
sg_colors = NULL,
itt_color = "azure3",
outcome.name = "tte",
event.name = "event",
treat.name = "treat",
xlabel = "Time",
ylabel = "Survival differences",
yseq_length = 5,
draws_band = 1000,
tau_add = NULL,
by_risk = 6,
risk_cex = 0.75,
risk_delta = 0.035,
risk_pad = 0.015,
ymax_pad = 0.11,
show_legend = TRUE,
legend_pos = "topleft",
legend_cex = 0.75,
ref_subgroups = NULL,
verbose = FALSE
)
Arguments
df |
Data frame. The analysis dataset containing all required variables including subgroup indicator columns. |
fs.est |
A forestsearch object containing the identified subgroup,
or |
sg_cols |
Character vector. Names of columns in |
sg_labels |
Character vector. Subsetting expressions for each subgroup,
corresponding to |
sg_colors |
Character vector. Colors for each subgroup curve,
corresponding to |
itt_color |
Character. Color for ITT population band.
Default: |
outcome.name |
Character. Name of time-to-event column.
Default: |
event.name |
Character. Name of event indicator column.
Default: |
treat.name |
Character. Name of treatment column.
Default: |
xlabel |
Character. X-axis label. Default: |
ylabel |
Character. Y-axis label. Default: |
yseq_length |
Integer. Number of y-axis tick marks.
Default: |
draws_band |
Integer. Number of bootstrap draws for confidence band.
Default: |
tau_add |
Numeric. Time horizon for the plot. If |
by_risk |
Numeric. Interval for risk table. Default: |
risk_cex |
Numeric. Character expansion for risk table text.
Default: |
risk_delta |
Numeric. Vertical spacing for risk table.
Default: |
risk_pad |
Numeric. Padding for risk table. Default: |
ymax_pad |
Numeric. Y-axis maximum padding. Default: |
show_legend |
Logical. Whether to display the legend.
Default: |
legend_pos |
Character. Legend position (e.g., "topleft", "bottomright").
Default: |
legend_cex |
Numeric. Character expansion for legend text.
Default: |
ref_subgroups |
Named list. Optional additional reference subgroups to include. Each element should be a list with:
The function automatically creates indicator columns from the expressions.
Default: |
verbose |
Logical. Print diagnostic messages. Default: |
Details
This function simplifies the workflow of creating KM survival difference band plots for ForestSearch-identified subgroups. It can work in two modes:
Mode 1: With ForestSearch result (fs.est provided)
Extracts the subgroup definition from the ForestSearch result
Creates binary indicator columns (Qrecommend, Brecommend) in df
Generates appropriate labels from the subgroup definition
Calls plotKM.band_subgroups() with configured parameters
Mode 2: With pre-defined columns (sg_cols provided)
Uses existing indicator columns in df
Requires sg_labels and sg_colors to match sg_cols
The sg.harm subgroup (Qrecommend) represents patients with questionable
treatment benefit (where treat.recommend == 0 in ForestSearch output).
The complement (Brecommend) represents patients recommended for treatment.
Value
Invisibly returns a list containing:
- df
The modified data frame with subgroup indicators
- sg_cols
Character vector of subgroup column names used
- sg_labels
Character vector of subgroup labels used
- sg_colors
Character vector of colors used
- sg_harm_definition
The subgroup definition extracted from fs.est
- ref_subgroups
The reference subgroups list (if provided)
Subgroup Extraction
When fs.est is provided, the subgroup definition is extracted from:
- fs.est$grp.consistency$out_sg$sg.harm_label - Human-readable labels
- fs.est$sg.harm - Technical factor names (fallback)
- fs.est$df.est$treat.recommend - Subgroup membership indicator
Note
This function requires the weightedsurv package, which can be
installed from GitHub: devtools::install_github("larry-leon/weightedsurv")
See Also
forestsearch for running the subgroup analysis
plot_sg_weighted_km for weighted KM plots
plot_sg_results for comprehensive subgroup visualization
Plot Distribution of Identified Subgroups
Description
Bar chart of subgroups identified across simulations, filtered to
those appearing in at least min_pct of the found simulations.
Usage
plot_sg_distribution(
results,
min_pct = 5,
title = "Distribution of Identified Subgroups",
wrap_width = 25
)
Arguments
results |
data.table from |
min_pct |
Numeric. Minimum percentage threshold for display (default: 5) |
title |
Character. Plot title. Default: "Distribution of Identified Subgroups" |
wrap_width |
Integer. Character width for wrapping long subgroup labels. Default: 25 |
Value
A ggplot2 object
Plot Forest Plot of Hazard Ratios
Description
Creates a forest plot showing hazard ratios with confidence intervals.
Usage
plot_sg_forest(hr_estimates, sg0_name, sg1_name, colors, title = NULL, ...)
Arguments
hr_estimates |
Data frame with HR estimates |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
colors |
List. Color specifications |
title |
Character. Plot title |
... |
Additional arguments |
Value
Invisible NULL (creates plot as side effect)
Plot Kaplan-Meier Survival Curves for Subgroups
Description
Creates side-by-side Kaplan-Meier survival curves for the H and Hc subgroups.
Usage
plot_sg_km(
df_H,
df_Hc,
outcome.name,
event.name,
treat.name,
by.risk,
sg0_name,
sg1_name,
treat_labels,
colors,
show_ci = TRUE,
show_logrank = TRUE,
show_hr = TRUE,
hr_estimates = NULL,
conf.level = 0.95,
title = NULL,
...
)
Arguments
df_H |
Data frame for H subgroup |
df_Hc |
Data frame for Hc subgroup |
outcome.name |
Character. Outcome variable name |
event.name |
Character. Event indicator name |
treat.name |
Character. Treatment variable name |
by.risk |
Numeric. Risk table interval |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
treat_labels |
Named character vector. Treatment labels |
colors |
List. Color specifications |
show_ci |
Logical. Show confidence intervals |
show_logrank |
Logical. Show log-rank p-value |
show_hr |
Logical. Show HR annotation |
hr_estimates |
Data frame with HR estimates |
conf.level |
Numeric. Confidence level |
title |
Character. Plot title |
... |
Additional arguments |
Value
Invisible NULL (creates plot as side effect)
Plot ForestSearch Subgroup Results
Description
Creates comprehensive visualizations of subgroup results from ForestSearch,
including Kaplan-Meier survival curves, hazard ratio comparisons, and
summary statistics. This function is designed to work with the output
from forestsearch, specifically the df.est component.
Usage
plot_sg_results(
fs.est,
outcome.name = "Y",
event.name = "Event",
treat.name = "Treat",
plot_type = c("combined", "km", "forest", "summary"),
by.risk = NULL,
conf.level = 0.95,
est.scale = c("hr", "1/hr"),
sg0_name = "Questionable (H)",
sg1_name = "Recommend (H^c)",
treat_labels = c(`0` = "Control", `1` = "Treatment"),
colors = NULL,
title = NULL,
show_events = TRUE,
show_ci = TRUE,
show_logrank = TRUE,
show_hr = TRUE,
verbose = FALSE,
...
)
Arguments
fs.est |
A forestsearch object or list containing at minimum:
|
outcome.name |
Character. Name of time-to-event outcome column. Default: "Y" |
event.name |
Character. Name of event indicator column (1=event, 0=censored). Default: "Event" |
treat.name |
Character. Name of treatment column (1=treatment, 0=control). Default: "Treat" |
plot_type |
Character. Type of plot to create. One of:
|
by.risk |
Numeric. Risk interval for KM survival curves. Default: NULL (auto-calculated) |
conf.level |
Numeric. Confidence level for intervals. Default: 0.95 |
est.scale |
Character. Effect scale: "hr" (hazard ratio) or "1/hr" (inverse). Default: "hr" |
sg0_name |
Character. Label for subgroup 0 (harm/questionable). Default: "Questionable (H)" |
sg1_name |
Character. Label for subgroup 1 (recommend/complement). Default: "Recommend (H^c)" |
treat_labels |
Named character vector. Labels for treatment arms. Default: c("0" = "Control", "1" = "Treatment") |
colors |
Named character vector. Colors for plot elements. Default: uses package defaults |
title |
Character. Main plot title. Default: auto-generated |
show_events |
Logical. Show event counts on KM curves. Default: TRUE |
show_ci |
Logical. Show confidence intervals. Default: TRUE |
show_logrank |
Logical. Show log-rank p-value. Default: TRUE |
show_hr |
Logical. Show hazard ratio annotation. Default: TRUE |
verbose |
Logical. Print diagnostic messages. Default: FALSE |
... |
Additional arguments passed to plotting functions. |
Details
The function extracts subgroup membership from fs.est$df.est$treat.recommend:
- treat.recommend == 0: Harm/questionable subgroup (H)
- treat.recommend == 1: Recommend/complement subgroup (H^c)
For est.scale = "1/hr", treatment labels and subgroup interpretation
are reversed to maintain clinical interpretability.
Value
An object of class fs_sg_plot containing:
- plots
List of ggplot2 or base R plot objects
- summary
Data frame of subgroup summary statistics
- hr_estimates
Data frame of hazard ratio estimates
- call
The matched call
Kaplan-Meier Plots
When plot_type = "km", creates side-by-side survival curves for:
The identified subgroup (H) with treatment vs control
The complement subgroup (H^c) with treatment vs control
Forest Plot
When plot_type = "forest", creates a forest plot showing hazard
ratios with confidence intervals for: ITT population, H subgroup,
and H^c complement.
See Also
forestsearch for running the subgroup analysis
sg_consistency_out for consistency evaluation
plot_subgroup_results_forestplot for publication-ready forest plots
Plot Summary Statistics Panel
Description
Creates a summary panel with subgroup characteristics.
Usage
plot_sg_summary_panel(
summary_stats,
hr_estimates,
sg0_name,
sg1_name,
colors,
...
)
Arguments
summary_stats |
Data frame with summary statistics |
hr_estimates |
Data frame with HR estimates |
sg0_name |
Character. Label for H subgroup |
sg1_name |
Character. Label for Hc subgroup |
colors |
List. Color specifications |
... |
Additional arguments |
Value
Invisible NULL (creates plot as side effect)
Plot Weighted Kaplan-Meier Curves for ForestSearch Subgroups
Description
Creates weighted Kaplan-Meier survival curves for the identified subgroups
(H and Hc) using the weightedsurv package, matching the pattern used in
sg_consistency_out().
Usage
plot_sg_weighted_km(
fs.est,
fs_bc = NULL,
outcome.name = "Y",
event.name = "Event",
treat.name = "Treat",
by.risk = NULL,
sg0_name = NULL,
sg1_name = NULL,
conf.int = TRUE,
show.logrank = TRUE,
show.cox = TRUE,
show.cox.bc = TRUE,
put.legend.lr = "topleft",
ymax = 1.05,
xmed.fraction = 0.65,
hr_bc_position = "bottomright",
hr_bc_cex = 0.725,
title = NULL,
verbose = FALSE
)
Arguments
fs.est |
A forestsearch object containing |
fs_bc |
Optional. Bootstrap results from |
outcome.name |
Character. Name of time-to-event column.
Default: |
event.name |
Character. Name of event indicator column.
Default: |
treat.name |
Character. Name of treatment column.
Default: |
by.risk |
Numeric. Risk interval for plotting. Default: |
sg0_name |
Character. Label for H subgroup (treat.recommend == 0).
Default: |
sg1_name |
Character. Label for Hc subgroup (treat.recommend == 1).
Default: |
conf.int |
Logical. Show confidence intervals. Default: |
show.logrank |
Logical. Show log-rank test. Default: |
show.cox |
Logical. Show unadjusted Cox HR from weightedsurv.
Default: |
show.cox.bc |
Logical. Show bootstrap bias-corrected HR annotation
(requires |
put.legend.lr |
Character. Legend position. Default: "topleft" |
ymax |
Numeric. Max y-axis value. Default: 1.05 |
xmed.fraction |
Numeric. Fraction for median lines. Default: 0.65 |
hr_bc_position |
Character. Position for bias-corrected HR annotation. One of "bottomright", "bottomleft", "topright", "topleft". Default: "bottomright" |
hr_bc_cex |
Numeric. Character expansion factor for bias-corrected HR annotation text. Default: 0.725 (matches weightedsurv cox.cex default) |
title |
Character. Overall plot title. Default: |
verbose |
Logical. Print diagnostic messages. Default: |
Details
This function uses the exact same calling pattern as plot_subgroup()
in the ForestSearch package. Column names are mapped internally to the
standard names (Y, Event, Treat) expected by weightedsurv.
Subgroup definitions are automatically extracted from the forestsearch object if available:
- fs$grp.consistency$out_sg$sg.harm_label - Human-readable labels
- fs$sg.harm - Technical factor names (fallback)
HR display options controlled by show.cox and show.cox.bc:
- Both TRUE (default): Shows unadjusted HR from weightedsurv AND bias-corrected HR annotation
- show.cox = TRUE, show.cox.bc = FALSE: Shows only unadjusted HR
- show.cox = FALSE, show.cox.bc = TRUE: Shows only bias-corrected HR
- Both FALSE: Shows neither HR estimate
Value
Invisibly returns a list with subgroup data frames and counting data
Plot Spline Treatment Effect Function
Description
Plot Spline Treatment Effect Function
Usage
plot_spline_treatment_effect(dgm_result, add_points = TRUE)
Arguments
dgm_result |
Result object from generate_aft_dgm_flex with spline |
add_points |
Logical; add observed data points. Default TRUE |
Value
No return value, called for side effects (produces a plot).
Plot Subgroup Survival Curves
Description
Plots weighted Kaplan-Meier survival curves for a specified subgroup and its complement using the weightedsurv package.
Usage
plot_subgroup(df.sub, df.subC, by.risk, confs_labels, this.1_label, top_result)
Arguments
df.sub |
A data frame containing data for the subgroup of interest. |
df.subC |
A data frame containing data for the complement subgroup. |
by.risk |
Numeric. The risk interval for plotting (passed to |
confs_labels |
Named character vector. Covariate label mapping (not used directly in this function, but may be used for labeling). |
this.1_label |
Character. Label for the subgroup being plotted. |
top_result |
Data frame row. The top subgroup result row, expected to contain a |
Plot Subgroup Analysis Results
Description
Creates diagnostic plots for subgroup treatment effects from df_super object
Usage
plot_subgroup_effects(
df_super,
z,
hrz_crit = 0,
log.hrs = NULL,
ahr_empirical = NULL,
plot_type = c("both", "profile", "ahr"),
add_rug = TRUE,
zpoints_by = 1,
...
)
Arguments
df_super |
A data frame containing subgroup analysis results with columns: loghr_po (log hazard ratios), and optionally theta_1 and theta_0 (treatment-specific parameters) |
z |
Character string specifying the column name to use as the subgroup score (e.g., "z_age", "z_size", "subgroup"). Required. |
hrz_crit |
Critical threshold on the log hazard ratio scale for defining the optimal subgroup. Default is 0 (i.e., HR = 1 on the log scale). |
log.hrs |
Optional vector of reference log hazard ratios to display as horizontal lines. Default is NULL. |
ahr_empirical |
Optional empirical average hazard ratio to display. If NULL, calculated from data. Default is NULL. |
plot_type |
Character string specifying plot type: "both" (default), "profile", or "ahr". |
add_rug |
Logical indicating whether to add rug plot of z values. Default is TRUE. |
zpoints_by |
Step size for z-axis grid when calculating AHR curves. Default is 1. |
... |
Additional graphical parameters passed to plot() |
Details
The function creates up to two plots:
Treatment effect profile: Shows log hazard ratio as function of z
Average hazard ratio curve: Shows AHR for subgroups z >= threshold
The "optimal" subgroup is defined as patients with z >= cut.zero, where cut.zero is the minimum z value with favorable treatment effect (loghr < hrz_crit).
Value
A list containing:
cut.zero |
The minimum z value where loghr_po < hrz_crit |
AHR_opt |
Average hazard ratio for optimal subgroup (z >= cut.zero) |
zpoints |
Grid of z values used for AHR calculations |
HR.zpoints |
AHR for population with z >= zpoints |
HRminus.zpoints |
AHR for population with z <= zpoints |
HR2.zpoints |
Alternative AHR calculation for z >= zpoints |
HRminus2.zpoints |
Alternative AHR calculation for z <= zpoints |
Plot Subgroup Results Forest Plot
Description
Generates a comprehensive forest plot showing:
ITT (Intent-to-Treat) population estimate
Reference subgroups (e.g., by biomarker levels)
Post-hoc identified subgroups with bias-corrected estimates
Cross-validation agreement metrics as annotations
Usage
plot_subgroup_results_forestplot(
fs_results,
df_analysis,
subgroup_list = NULL,
outcome.name,
event.name,
treat.name,
E.name = "Experimental",
C.name = "Control",
est.scale = "hr",
xlog = TRUE,
title_text = NULL,
arrow_text = c("Favors Experimental", "Favors Control"),
footnote_text = c("Eg 80% of training found SG: 70% of B (+) also B in CV testing"),
xlim = c(0.25, 1.5),
ticks_at = c(0.25, 0.7, 1, 1.5),
show_cv_metrics = TRUE,
cv_source = c("auto", "kfold", "oob", "both"),
posthoc_colors = c("powderblue", "beige"),
reference_colors = c("yellow", "powderblue"),
ci_column_spaces = 20,
conf.level = 0.95,
theme = NULL
)
Arguments
fs_results |
List. A list containing ForestSearch analysis results with elements:
|
df_analysis |
Data frame. The analysis dataset with outcome, event, and treatment variables. |
subgroup_list |
List. Named list of subgroup definitions to include in the plot. Each element should be a list with:
|
outcome.name |
Character. Name of the survival time variable. |
event.name |
Character. Name of the event indicator variable. |
treat.name |
Character. Name of the treatment variable. |
E.name |
Character. Label for experimental arm (default: "Experimental"). |
C.name |
Character. Label for control arm (default: "Control"). |
est.scale |
Character. Estimate scale: "hr" or "1/hr" (default: "hr"). |
xlog |
Logical. If TRUE (default), the x-axis is plotted on a logarithmic scale. This is standard for hazard ratio forest plots where equal distances represent equal relative effects. |
title_text |
Character. Plot title (default: NULL). |
arrow_text |
Character vector of length 2. Arrow labels for forest plot (default: c("Favors Experimental", "Favors Control")). |
footnote_text |
Character vector. Footnote text for the plot explaining CV metrics (default provides CV interpretation guidance; set to NULL to omit). |
xlim |
Numeric vector of length 2. X-axis limits (default: c(0.25, 1.5)). |
ticks_at |
Numeric vector. X-axis tick positions (default: c(0.25, 0.70, 1.0, 1.5)). |
show_cv_metrics |
Logical. Whether to show cross-validation metrics (default: TRUE if fs_kfold or fs_OOB available). |
cv_source |
Character. Source for CV metrics: "auto" (default, uses both if available, otherwise whichever is present), "kfold" (use fs_kfold only), "oob" (use fs_OOB only), or "both" (explicitly use both fs_kfold and fs_OOB, with K-fold first then OOB). |
posthoc_colors |
Character vector. Colors for post-hoc subgroup rows (default: c("powderblue", "beige")). |
reference_colors |
Character vector. Colors for reference subgroup rows (default: c("yellow", "powderblue")). |
ci_column_spaces |
Integer. Number of spaces for the CI plot column width. More spaces = wider CI column (default: 20). |
conf.level |
Numeric. Confidence level for intervals (default: 0.95 for 95% CI). Used to calculate the z-multiplier as qnorm(1 - (1 - conf.level)/2). |
theme |
An fs_forest_theme object from |
Details
Creates a publication-ready forest plot displaying identified subgroups with hazard ratios, bias-corrected estimates, and cross-validation metrics. This wrapper integrates ForestSearch results with the forestploter package.
ForestSearch Labeling Convention
ForestSearch identifies subgroups based on hazard ratio thresholds:
- sg.harm: Contains the definition of the "harm" or "questionable" subgroup (H)
- treat.recommend == 0: Patient is IN the harm subgroup (H)
- treat.recommend == 1: Patient is in the COMPLEMENT subgroup (Hc, typically benefit)
For est.scale = "hr" (searching for harm):
H (treat.recommend=0): Subgroup defined by sg.harm with elevated HR (harm/questionable)
Hc (treat.recommend=1): Complement of sg.harm (potential benefit)
For est.scale = "1/hr" (searching for benefit):
Roles are reversed: H becomes the benefit group
Value
A list containing:
- plot
The forestploter grob object (can be rendered with plot())
- data
The data frame used for the forest plot
- row_types
Character vector of row types for styling reference
- cv_metrics
Cross-validation metrics text (if available)
See Also
forestsearch for running the subgroup analysis
forestsearch_bootstrap_dofuture for bootstrap bias correction
forestsearch_Kfold for cross-validation
create_forest_theme for customizing plot appearance
render_forestplot for rendering the plot
Prepare Censoring Model Parameters
Description
Constructs the censoring model object and appends per-subject counterfactual
censoring linear predictors (lin_pred_cens_0, lin_pred_cens_1)
to the super-population data frame.
Usage
prepare_censoring_model(
df_work,
cens_type,
cens_params,
df_super,
select_censoring = TRUE,
verbose = TRUE
)
Arguments
df_work |
Working data frame (output of |
cens_type |
Character. |
cens_params |
Named list of user-supplied censoring parameters. |
df_super |
Super-population data frame; receives
|
select_censoring |
Logical. If |
verbose |
Logical. If |
Details
Linear predictor convention
lin_pred_cens_0 and lin_pred_cens_1 store the
covariate contribution only — i.e. \gamma_c' X, with the
intercept \mu_c excluded. This matches the convention used for the
outcome model (lin_pred_0, lin_pred_1 = \gamma' X,
no intercept) computed in calculate_linear_predictors().
simulate_from_dgm() reconstructs the full log-censoring time as:
\log C = \mu_c + \delta + \tau_c \epsilon + \gamma_c' X
where \mu_c = params$censoring$mu,
\delta = cens_adjust,
\tau_c = params$censoring$tau, and
\gamma_c' X = lin_pred_cens_{0|1}.
When select_censoring = TRUE, predict(survreg, type = "linear")
returns the full linear predictor \mu_c + \gamma_c' X. The stored
intercept \mu_c is therefore subtracted before writing
lin_pred_cens_*, so that simulate_from_dgm() can add
params$censoring$mu exactly once. Omitting this subtraction causes
\mu_c to be counted twice, producing astronomically large censoring
times and universal censoring.
When select_censoring = FALSE with a Weibull/lognormal
cens_type, the intercept-only model has zero covariate contribution,
so lin_pred_cens_0 = lin_pred_cens_1 = 0. Storing mu instead
of 0 causes the same double-counting.
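The intercept convention above can be checked with a small numeric sketch (all values illustrative, not taken from the package): subtracting the stored intercept from the full linear predictor leaves only the covariate contribution, so the reconstruction adds \mu_c exactly once.

```r
# Numeric sketch of the intercept convention described above.
mu_c    <- 2.5              # stored intercept, params$censoring$mu (illustrative)
gamma_c <- c(0.3, -0.1)     # censoring model coefficients (illustrative)
x       <- c(1.0, 2.0)      # one subject's covariates

lp_full   <- mu_c + sum(gamma_c * x)  # what predict(survreg, type = "linear") returns
lp_stored <- lp_full - mu_c           # covariate part only, as stored in lin_pred_cens_*

# Reconstruction: log C = mu_c + delta + tau_c * eps + lp_stored
delta <- 0; tau_c <- 1; eps <- 0.4
logC_correct <- mu_c + delta + tau_c * eps + lp_stored
logC_double  <- mu_c + delta + tau_c * eps + lp_full   # mu_c counted twice

logC_double - logC_correct  # equals mu_c: the double-counting error
```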
Value
A named list:
- cens_model
List of censoring distribution parameters stored in dgm$model_params$censoring.
- df_super
Updated super-population data frame with lin_pred_cens_0 and lin_pred_cens_1 appended. These hold covariate contributions only (\gamma_c' X); the intercept is excluded.
Prepare Data for Subgroup Search
Description
Cleans data by removing missing values and extracting components
Usage
prepare_search_data(Y, Event, Treat, Z)
Prepare subgroup data for analysis
Description
Splits a data frame into two subgroups based on a flag and treatment scale.
Usage
prepare_subgroup_data(df, SG_flag, est.scale, treat.name)
Arguments
df |
Data frame. |
SG_flag |
Character. Name of subgroup flag variable. |
est.scale |
Character. Effect scale ("hr" or "1/hr"). |
treat.name |
Character. Name of treatment variable. |
Value
List with subgroup data frames and treatment variable name.
Prepare Working Dataset with Processed Covariates
Description
Prepare Working Dataset with Processed Covariates
Usage
prepare_working_dataset(
data,
outcome_var,
event_var,
treatment_var,
continuous_vars,
factor_vars,
standardize,
continuous_vars_cens,
factor_vars_cens,
verbose
)
Print method for cox_ahr_cde objects
Description
Print method for cox_ahr_cde objects
Usage
## S3 method for class 'cox_ahr_cde'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (not used). |
Value
Invisibly returns the input object.
Print Method for forestsearch Objects
Description
Displays a concise summary of ForestSearch results including the identified subgroup definition, consistency metrics, algorithm details, and computation time.
Usage
## S3 method for class 'forestsearch'
print(x, ...)
Arguments
x |
A forestsearch object. |
... |
Additional arguments (currently unused). |
Value
Invisibly returns x.
See Also
summary.forestsearch for detailed output,
plot.forestsearch for visualization.
Print Method for ForestSearch Forest Theme
Description
Print Method for ForestSearch Forest Theme
Usage
## S3 method for class 'fs_forest_theme'
print(x, ...)
Arguments
x |
An fs_forest_theme object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for ForestSearch Forest Plot
Description
Print Method for ForestSearch Forest Plot
Usage
## S3 method for class 'fs_forestplot'
print(x, ...)
Arguments
x |
An fs_forestplot object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for K-Fold CV Results
Description
Print Method for K-Fold CV Results
Usage
## S3 method for class 'fs_kfold'
print(x, ...)
Arguments
x |
An fs_kfold object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for fs_sg_plot Objects
Description
Print Method for fs_sg_plot Objects
Usage
## S3 method for class 'fs_sg_plot'
print(x, ...)
Arguments
x |
An fs_sg_plot object |
... |
Additional arguments (unused) |
Value
Invisibly returns x.
Print Method for Repeated K-Fold CV Results
Description
Print Method for Repeated K-Fold CV Results
Usage
## S3 method for class 'fs_tenfold'
print(x, ...)
Arguments
x |
An fs_tenfold object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Method for fs_weighted_km Objects
Description
Print Method for fs_weighted_km Objects
Usage
## S3 method for class 'fs_weighted_km'
print(x, ...)
Arguments
x |
An fs_weighted_km object from plot_sg_weighted_km() |
... |
Additional arguments (unused) |
Value
Invisibly returns x.
Print Method for gbsg_dgm Objects
Description
Print Method for gbsg_dgm Objects
Usage
## S3 method for class 'gbsg_dgm'
print(x, ...)
Arguments
x |
A gbsg_dgm object |
... |
Additional arguments (unused) |
Value
Invisibly returns x.
Examples
dgm <- setup_gbsg_dgm(model = "alt", verbose = FALSE)
print(dgm)
Print method for survreg_comparison objects
Description
Print method for survreg_comparison objects
Usage
## S3 method for class 'multi_survreg_comparison'
print(x, ...)
Arguments
x |
A survreg_comparison object |
... |
Additional arguments (not used) |
Value
Invisibly returns the input object
Print CV ForestSearch Parameters
Description
Print CV ForestSearch Parameters
Usage
print_cv_params(cv_args)
Print detailed output for debugging
Description
Displays detailed information about the GRF analysis
Usage
print_grf_details(config, values, best_subgroup, sg_harm_id, tree_cuts = NULL)
Arguments
config |
List. GRF configuration |
values |
Data frame. Node metrics |
best_subgroup |
Data frame row. Selected subgroup (or NULL) |
sg_harm_id |
Character. Subgroup definition (or NULL) |
tree_cuts |
List. Cut information |
Value
No return value, called for side effects (prints GRF diagnostic information to the console).
Process forced cut expression for a variable
Description
Evaluates a cut expression (e.g., "age <= mean(age)") and returns the expression with the value.
Usage
process_conf_force_expr(expr, df)
Arguments
expr |
Character string of the cut expression. |
df |
Data frame. |
Value
Character string with evaluated value.
Process Continuous Variable for Subgroup Definition
Description
Process Continuous Variable for Subgroup Definition
Usage
process_continuous_subgroup(var_data, cut_spec, var_name, verbose)
Process Continuous Variables
Description
Process Continuous Variables
Usage
process_continuous_vars(
df_work,
data,
continuous_vars,
standardize,
marker = "z_"
)
Process Cutpoint Specification for Subgroup Definition
Description
Process Cutpoint Specification for Subgroup Definition
Usage
process_cutpoint(var_data, cut_spec, var_name = "", verbose = FALSE)
Process Factor Variable for Subgroup Definition
Description
Process Factor Variable for Subgroup Definition
Usage
process_factor_subgroup(var_data, cut_spec, var_name, verbose)
Process Factor Variables with Largest Value as Reference
Description
Process Factor Variables with Largest Value as Reference
Usage
process_factor_vars(df_work, data, factor_vars, marker = "z_")
75th Percentile (Quantile High)
Description
Returns the 75th percentile of a numeric vector.
Usage
qhigh(x)
Arguments
x |
A numeric vector. |
Value
Numeric value of the 75th percentile.
25th Percentile (Quantile Low)
Description
Returns the 25th percentile of a numeric vector.
Usage
qlow(x)
Arguments
x |
A numeric vector. |
Value
Numeric value of the 25th percentile.
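Examples
qlow() and qhigh() are shorthand for the 25th and 75th percentiles (equivalent to quantile(x, 0.25) and quantile(x, 0.75); the interpolation type is an implementation detail):
```r
x <- c(1, 2, 3, 4, 100)
qlow(x)   # 25th percentile of x
qhigh(x)  # 75th percentile of x
```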
Quick Plot KM Bands from ForestSearch
Description
Convenience wrapper with sensible defaults for quick visualization.
Usage
quick_km_band_plot(df, fs.est, outcome.name, event.name, treat.name, ...)
Arguments
df |
Data frame with analysis data. |
fs.est |
ForestSearch result object. |
outcome.name |
Character. Time-to-event column name. |
event.name |
Character. Event indicator column name. |
treat.name |
Character. Treatment column name. |
... |
Additional arguments passed to |
Value
Invisibly returns the plot result.
Remove Near-Duplicate Subgroups
Description
Removes subgroups with nearly identical statistics (HR, n, E, etc.) to reduce redundancy in the candidate list.
Usage
remove_near_duplicate_subgroups(
hr_subgroups,
tolerance = 0.001,
details = FALSE
)
Arguments
hr_subgroups |
Data.table of subgroup results. |
tolerance |
Numeric. Tolerance for numeric comparison (default 0.001). |
details |
Logical. Print details about removed duplicates. |
Value
Data.table with near-duplicate rows removed.
Remove Redundant Subgroups
Description
Removes redundant subgroups by checking for exact ties in key columns.
Usage
remove_redundant_subgroups(found.hrs)
Arguments
found.hrs |
Data.table of found subgroups. |
Value
Data.table of non-redundant subgroups.
Render ForestSearch Forest Plot
Description
Renders a forest plot from plot_subgroup_results_forestplot().
Usage
render_forestplot(x, newpage = TRUE)
Arguments
x |
An fs_forestplot object from plot_subgroup_results_forestplot(). |
newpage |
Logical. Call grid.newpage() before drawing. Default: TRUE. |
Details
To control plot sizing, create a custom theme using create_forest_theme()
and pass it to plot_subgroup_results_forestplot():
my_theme <- create_forest_theme(base_size = 14, row_padding = c(6, 4))
result <- plot_subgroup_results_forestplot(..., theme = my_theme)
render_forestplot(result)
Value
Invisibly returns the grob object.
Render Reference Simulation Table as gt
Description
Converts a data frame of pre-computed reference simulation results (e.g.,
digitized from a published LaTeX table) into a styled gt table. This
is useful for displaying published benchmark results alongside new
simulation output within vignettes or reports.
Usage
render_reference_table(
ref_df,
title = "Reference Simulation Results",
subtitle = NULL,
bold_threshold = 0.05
)
Arguments
ref_df |
Data frame of pre-computed reference simulation results. |
title |
Character. Table title. |
subtitle |
Character. Table subtitle. Default: NULL. |
bold_threshold |
Numeric. Values in ref_df at or below this threshold are rendered in bold. Default: 0.05. |
Value
A gt table object.
Examples
ref <- data.frame(
Scenario = "M1 Null: N=700",
Metric = "any(H)",
FS = 0.02,
FSlg = 0.03,
GRF = 0.25
)
render_reference_table(ref, title = "Reference Results")
Resolve parallel processing arguments for bootstrap
Description
If parallel_args is not provided, falls back to the parallel configuration from the original forestsearch call. Always reports the resolved configuration to the user.
Usage
resolve_bootstrap_parallel_args(parallel_args, forestsearch_call_args)
Arguments
parallel_args |
List or empty list |
forestsearch_call_args |
List from original forestsearch call |
Value
List with plan, workers, show_message
Resolve Parallel Arguments for Cross-Validation
Description
Helper function to resolve and validate parallel processing arguments,
similar to bootstrap's resolve_bootstrap_parallel_args.
Usage
resolve_cv_parallel_args(parallel_args, fs_args, details = FALSE)
Arguments
parallel_args |
List. User-provided parallel arguments. |
fs_args |
List. Original ForestSearch call arguments. |
details |
Logical. Print configuration messages. |
Value
List with resolved plan, workers, show_message.
RMST calculation for subgroup
Description
Calculates restricted mean survival time (RMST) for a subgroup.
Usage
rmst_calculation(
df,
tte.name = "tte",
event.name = "event",
treat.name = "treat"
)
Arguments
df |
Data frame. |
tte.name |
Character. Name of time-to-event variable. |
event.name |
Character. Name of event indicator variable. |
treat.name |
Character. Name of treatment variable. |
Value
List with tau, RMST, RMST for treatment, RMST for control.
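Examples
A small sketch using the default column names (tte, event, treat); the simulated data are purely illustrative:
```r
set.seed(1)
df <- data.frame(
  tte   = rexp(200, 0.1),        # time-to-event
  event = rbinom(200, 1, 0.8),   # event indicator
  treat = rbinom(200, 1, 0.5)    # treatment arm
)
res <- rmst_calculation(df)
res$tau   # restriction time used
```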
Run ForestSearch Analysis
Description
Helper function to run ForestSearch and extract estimates. Aligned with forestsearch() parameters including use_twostage.
Usage
run_forestsearch_analysis(
data,
confounders_name,
params,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis_label = "FS",
verbose = FALSE
)
Arguments
data |
Data frame with simulated trial data |
confounders_name |
Character vector of confounder names |
params |
List of ForestSearch parameters |
dgm |
DGM object for computing true HRs |
cox_formula |
Cox formula for estimation |
cox_formula_adj |
Adjusted Cox formula |
analysis_label |
Character label for this analysis |
verbose |
Print details |
Value
data.table with analysis estimates
Run GRF Analysis
Description
Helper function to run standalone GRF analysis using grf.subg.harm.survival().
Usage
run_grf_analysis(
data,
confounders_name,
params,
dgm,
cox_formula = NULL,
cox_formula_adj = NULL,
analysis_label = "GRF",
verbose = FALSE,
debug = FALSE
)
Arguments
data |
Data frame with simulated trial data |
confounders_name |
Character vector of confounder names |
params |
List of GRF parameters (from grf_merged) |
dgm |
DGM object for computing true HRs |
cox_formula |
Cox formula for estimation |
cox_formula_adj |
Adjusted Cox formula |
analysis_label |
Character label for this analysis |
verbose |
Print details |
debug |
Print detailed debugging information |
Value
data.table with analysis estimates
Run One Simulation Replicate
Description
General replacement for the legacy run_simulation_analysis() that
was coupled to simulate_from_gbsg_dgm() and GBSG-specific column
names. This version calls simulate_from_dgm and accepts
explicit column-name parameters, making it applicable to any DGM built
with generate_aft_dgm_flex.
Usage
run_simulation_analysis(
sim_id,
dgm,
n_sample,
analysis_time = Inf,
cens_adjust = 0,
max_follow = NULL,
muC_adj = NULL,
confounders_base = c("v1", "v2", "v3", "v4", "v5", "v6", "v7"),
n_add_noise = 0L,
outcome_name = "y_sim",
event_name = "event_sim",
treat_name = "treat_sim",
harm_col = "flag_harm",
run_fs = TRUE,
run_fs_grf = TRUE,
run_grf = TRUE,
fs_params = list(),
grf_params = list(),
cox_formula = NULL,
cox_formula_adj = NULL,
n_sims_total = NULL,
seed_base = 8316951L,
verbose = FALSE,
verbose_n = NULL,
debug = FALSE
)
Arguments
sim_id |
Integer. Simulation replicate index (used as seed offset). |
dgm |
An aft_dgm_flex object (e.g., from setup_gbsg_dgm or generate_aft_dgm_flex). |
n_sample |
Integer. Per-replicate sample size. |
analysis_time |
Numeric. Calendar time of analysis on the DGM time scale. Use Inf (the default) for no administrative censoring. |
cens_adjust |
Numeric. Log-scale shift to censoring times passed to simulate_from_dgm. Default 0. |
max_follow |
Deprecated. Use analysis_time instead. |
muC_adj |
Deprecated. Use cens_adjust instead. |
confounders_base |
Character vector of base confounder names. |
n_add_noise |
Integer. Number of independent N(0,1) noise variables to append. Default 0. |
outcome_name |
Name of the observed time column in simulated data. Default "y_sim". |
event_name |
Name of the event indicator column. Default "event_sim". |
treat_name |
Name of the treatment column. Default "treat_sim". |
harm_col |
Name of the true-subgroup indicator column. Default "flag_harm". |
run_fs |
Logical. Run ForestSearch (LASSO). Default TRUE. |
run_fs_grf |
Logical. Run ForestSearch (LASSO + GRF). Default TRUE. |
run_grf |
Logical. Run standalone GRF. Default TRUE. |
fs_params |
Named list of ForestSearch parameter overrides. |
grf_params |
Named list of GRF parameter overrides. |
cox_formula |
Optional Cox formula for unadjusted ITT. |
cox_formula_adj |
Optional adjusted Cox formula. |
n_sims_total |
Integer. Total simulations (for progress messages). |
seed_base |
Integer. Base seed; replicate seed = seed_base + sim_id. |
verbose |
Logical. Print progress messages. Default FALSE. |
verbose_n |
Integer. If set, only print for replicates with sim_id <= verbose_n. |
debug |
Logical. Print detailed debug output. Default FALSE. |
Value
A data.table with one row per analysis method, containing
subgroup size, HR, AHR, CDE, and classification metrics.
See Also
simulate_from_dgm,
generate_aft_dgm_flex, setup_gbsg_dgm
Run Single Consistency Split
Description
Performs one random 50/50 split and evaluates whether both halves meet the HR consistency threshold.
Usage
run_single_consistency_split(df.x, N.x, hr.consistency, cox_init = 0)
Arguments
df.x |
data.table. Subgroup data with columns Y, Event, Treat. |
N.x |
Integer. Number of observations in subgroup. |
hr.consistency |
Numeric. Minimum HR threshold for consistency. |
cox_init |
Numeric. Initial value for Cox model (log HR). |
Value
Numeric. 1 if both splits meet threshold, 0 if not, NA if error.
Evaluate an expression string in a data-frame scope
Description
Parses and evaluates expr in a restricted environment
containing only the columns of df (parent: baseenv()).
This isolates evaluation from the global environment, reducing
scope for unintended side effects.
Usage
safe_eval_expr(df, expr)
Arguments
df |
Data frame providing column names as variables. |
expr |
Character. Expression to evaluate
(e.g., "er <= 0 & nodes > 3"). |
Value
Result of evaluating expr, or NULL on failure.
Note
eval(parse()) is used intentionally here.
evaluate_comparison handles only single comparisons
(e.g., "er <= 0"); this function is needed for the compound
logical expressions produced by the ForestSearch subgroup enumeration
algorithm (e.g., "er <= 0 & nodes > 3"). Evaluation is
sandboxed: the environment contains only the columns of df
with baseenv() as parent, so neither the global environment
nor any package namespace is in scope. No user-supplied strings are
evaluated; only internally-constructed subgroup definition strings
reach this function.
See Also
evaluate_comparison for the single-comparison
operator-dispatch alternative that avoids eval(parse()).
Subset a data frame using an expression string
Description
Thin wrapper around safe_eval_expr that uses the
logical result to subset rows.
Usage
safe_subset(df, expr)
Arguments
df |
Data frame. |
expr |
Character. Subset expression
(e.g., "er <= 0 & nodes > 3"). |
Value
Subset of df, or NULL on failure.
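Examples
A sketch of both helpers evaluating a compound subgroup expression in a data-frame-only scope (GBSG-style columns er and nodes, used here for illustration only):
```r
df <- data.frame(er = c(-1, 0, 2), nodes = c(5, 2, 8))
safe_eval_expr(df, "er <= 0 & nodes > 3")  # logical vector
safe_subset(df, "er <= 0 & nodes > 3")     # rows where TRUE
```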
Save ForestSearch Forest Plot to File
Description
Saves a forest plot to a file (PDF, PNG, etc.) with explicit dimensions.
Usage
save_forestplot(x, filename, width = 12, height = 10, dpi = 300, bg = "white")
Arguments
x |
An fs_forestplot object. |
filename |
Character. Output filename. Extension determines format. |
width |
Numeric. Plot width in inches. Default: 12. |
height |
Numeric. Plot height in inches. Default: 10. |
dpi |
Numeric. Resolution for raster formats. Default: 300. |
bg |
Character. Background color. Default: "white". |
Value
Invisibly returns the filename.
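Examples
Usage sketch; assumes fp is an fs_forestplot object from plot_subgroup_results_forestplot(), and the filenames are illustrative:
```r
save_forestplot(fp, "subgroups_forestplot.pdf", width = 10, height = 8)
save_forestplot(fp, "subgroups_forestplot.png", dpi = 300)
```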
Select best subgroup based on criterion
Description
Identifies the optimal subgroup according to the specified criterion
Usage
select_best_subgroup(values, sg.criterion, dmin.grf, n.max)
Arguments
values |
Data frame. Node metrics from policy trees |
sg.criterion |
Character. "mDiff" for maximum difference, "Nsg" for largest size |
dmin.grf |
Numeric. Minimum difference threshold |
n.max |
Integer. Maximum allowed subgroup size (total sample size) |
Value
Data frame row with best subgroup or NULL if none found
Examples
vals <- data.frame(diff = c(8.5, 6.2, 3.1), Nsg = c(120, 95, 80))
select_best_subgroup(values = vals, sg.criterion = "mDiff",
dmin.grf = 6, n.max = 500)
Generate Cross-Validation Sensitivity Text
Description
Creates formatted text summarizing cross-validation agreement metrics.
Usage
sens_text(fs_kfold, est.scale = "hr")
Arguments
fs_kfold |
K-fold cross-validation results from forestsearch_Kfold. |
est.scale |
Character. "hr" or "1/hr". |
Value
Character string with formatted CV metrics.
Sensitivity Analysis of Hazard Ratios to k_inter
Description
Analyzes how the interaction parameter k_inter affects hazard ratios in different populations (overall, harm subgroup, no-harm subgroup).
Usage
sensitivity_analysis_k_inter(
k_inter_range = c(-5, 5),
n_points = 21,
plot = TRUE,
...
)
Arguments
k_inter_range |
Numeric vector of length 2 specifying the range of k_inter values to analyze. Default is c(-5, 5). |
n_points |
Integer number of points to evaluate within the range. Default is 21. |
plot |
Logical indicating whether to create visualization plots. Default is TRUE. |
... |
Additional arguments passed to |
Details
This function evaluates the hazard ratios at evenly spaced points across the k_inter range. If plot = TRUE, it creates a 4-panel visualization showing:
- Harm subgroup HR vs k_inter
- All HRs (overall, harm, no-harm) vs k_inter
- Ratio of HRs (harm/no-harm) showing effect modification
- Table of key values
Value
A data.frame of class "k_inter_sensitivity" with columns:
- k_inter
Numeric k_inter value
- hr_harm
Numeric hazard ratio in harm subgroup
- hr_no_harm
Numeric hazard ratio in no-harm subgroup
- hr_overall
Numeric overall hazard ratio
- subgroup_size
Integer size of harm subgroup
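Examples
A hedged sketch evaluating a coarse grid without plotting; any additional DGM arguments would be forwarded via `...`:
```r
sens <- sensitivity_analysis_k_inter(k_inter_range = c(0, 3),
                                     n_points = 4, plot = FALSE)
sens$hr_harm / sens$hr_no_harm   # effect-modification ratio per k_inter
```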
Set Up a GBSG-Based AFT Data Generating Mechanism
Description
Creates a GBSG-based data generating mechanism that is fully compatible with
simulate_from_dgm. This is the replacement for
create_gbsg_dgm(): it accepts exactly the same arguments and produces
the same numeric output, but returns an object of class
"aft_dgm_flex" instead of "gbsg_dgm".
Usage
setup_gbsg_dgm(
model = c("alt", "null"),
k_treat = 1,
k_inter = 1,
k_z3 = 1,
z1_quantile = 0.25,
n_super = 5000L,
cens_type = c("weibull", "uniform"),
use_rand_params = FALSE,
seed = 8316951L,
verbose = FALSE
)
Arguments
model |
Character. Either "alt" for alternative hypothesis with heterogeneous treatment effects, or "null" for uniform treatment effect. Default: "alt" |
k_treat |
Numeric. Treatment effect multiplier applied to the treatment coefficient from the fitted AFT model. Values > 1 strengthen the treatment effect. Default: 1 |
k_inter |
Numeric. Interaction effect multiplier for the treatment-subgroup interaction (z1 * z3). Only used when model = "alt". Higher values create more heterogeneity between HR(H) and HR(Hc). Default: 1 |
k_z3 |
Numeric. Effect multiplier for the z3 (menopausal status) coefficient. Default: 1 |
z1_quantile |
Numeric. Quantile threshold for z1 (estrogen receptor). Observations with ER <= quantile are coded as z1 = 1. Default: 0.25 |
n_super |
Integer. Size of super-population for empirical HR estimation. Default: 5000 |
cens_type |
Character. Censoring distribution type: "weibull" or "uniform". Default: "weibull" |
use_rand_params |
Logical. If TRUE, modifies confounder coefficients using estimates from randomized subset (meno == 0). Default: FALSE |
seed |
Integer. Random seed for super-population generation. Default: 8316951 |
verbose |
Logical. Print diagnostic information. Default: FALSE |
Details
Internally the function calls create_gbsg_dgm() and then:
- Adds a df_super field with column names aligned to simulate_from_dgm() conventions (lin_pred_1, lin_pred_0, lin_pred_cens_1, lin_pred_cens_0, flag_harm).
- Adds a model_params$tau field (= model_params$sigma) and a model_params$censoring sub-list.
- Sets class to c("aft_dgm_flex", "gbsg_dgm", "list").
The original df_super_rand field is kept so that
compute_dgm_cde() and print.gbsg_dgm continue to work.
Value
An object of class c("aft_dgm_flex", "gbsg_dgm", "list") with all fields from create_gbsg_dgm() plus:
- df_super
Super-population data frame with simulate_from_dgm()-compatible column names.
- model_params$tau
Copy of model_params$sigma.
- model_params$censoring
Sub-list with type, mu, tau for the censoring model.
See Also
create_gbsg_dgm, simulate_from_dgm,
compute_dgm_cde
Examples
dgm <- setup_gbsg_dgm(model = "alt", k_inter = 2, verbose = FALSE)
dgm <- compute_dgm_cde(dgm)
print(dgm)
sim <- simulate_from_dgm(dgm, n = 400, seed = 1)
Set up parallel processing for subgroup consistency
Description
Sets up parallel processing using the specified approach and number of workers.
Usage
setup_parallel_SGcons(
parallel_args = list(plan = "multisession", workers = 4, show_message = TRUE)
)
Arguments
parallel_args |
List with plan, workers, and show_message elements. |
Value
None. Sets up parallel backend as side effect.
Output Subgroup Consistency Results
Description
Returns the top subgroup(s) and recommended treatment flags.
Usage
sg_consistency_out(
df,
result_new,
sg_focus,
index.Z,
names.Z,
details = FALSE,
plot.sg = FALSE,
by.risk = 12,
confs_labels
)
Arguments
df |
Data.frame. Original analysis data. |
result_new |
Data.table. Sorted subgroup results. |
sg_focus |
Character. Sorting focus criterion. |
index.Z |
Matrix. Subgroup factor indicators. |
names.Z |
Character vector. Factor column names. |
details |
Logical. Print details. |
plot.sg |
Logical. Plot subgroup curves. |
by.risk |
Numeric. Risk interval for plotting. |
confs_labels |
Character vector. Human-readable labels. |
Value
List with results, subgroup definition, labels, flags, and group id.
Enhanced Subgroup Summary Tables (gt output)
Description
Returns formatted summary tables for subgroups using the gt package, with search metadata and customizable decimal precision. Produces two tables: a treatment effect estimates table and an identified subgroups table, each with fully customizable titles and subtitles.
Usage
sg_tables(
fs,
which_df = "est",
est_title = "Treatment Effect Estimates",
est_caption = "Training data estimates",
sg_title = "Identified Subgroups",
sg_subtitle = NULL,
potentialOutcome.name = NULL,
hr_1a = NA,
hr_0a = NA,
ndecimals = 3,
include_search_info = TRUE,
font_size = 12
)
Arguments
fs |
ForestSearch results object. |
which_df |
Character. Which data frame to use ("est" or "testing"). |
est_title |
Character or NULL. Main title for the estimates table
(default: "Treatment Effect Estimates"). Rendered as bold markdown.
Set to NULL to suppress the title and display only est_caption. |
est_caption |
Character. Subtitle for the estimates table (default: "Training data estimates"). |
sg_title |
Character or NULL. Main title for the identified subgroups
table (default: "Identified Subgroups"). Rendered as bold markdown.
Set to NULL to suppress the title and display only sg_subtitle. |
sg_subtitle |
Character or NULL. Subtitle for the identified subgroups
table. When NULL (default), an informative subtitle is auto-generated
from |
potentialOutcome.name |
Character. Name of potential outcome variable (optional). |
hr_1a |
Character. Adjusted HR for subgroup 1 (optional). |
hr_0a |
Character. Adjusted HR for subgroup 0 (optional). |
ndecimals |
Integer. Number of decimals for formatted numbers (default: 3). |
include_search_info |
Logical. Include search metadata table (default: TRUE). |
font_size |
Numeric. Font size in pixels for table text (default: 12). |
Value
List with gt tables for estimates, subgroups, and optionally search info.
Simulate Survival Data from AFT Data Generating Mechanism
Description
Generates simulated survival data from a previously created AFT data generating mechanism (DGM). Samples from the super population and generates survival times with specified censoring.
Usage
simulate_from_dgm(
dgm,
n = NULL,
rand_ratio = 1,
entry_var = NULL,
max_entry = 24,
analysis_time = 48,
cens_adjust = 0,
draw_treatment = TRUE,
seed = NULL,
strata_rand = NULL,
hrz_crit = NULL,
keep_rand = FALSE,
time_eos = NULL
)
Arguments
dgm |
An object of class "aft_dgm_flex" (e.g., from generate_aft_dgm_flex or setup_gbsg_dgm). |
n |
Integer specifying the sample size. If NULL, the entire super population is used (see Details). |
rand_ratio |
Numeric randomisation ratio (treatment:control). Default 1. |
entry_var |
Character string naming an entry-time variable in the
super population. If NULL, entry times are simulated as uniform on [0, max_entry]. |
max_entry |
Numeric maximum entry time for staggered entry simulation.
Only used when entry_var is NULL. Default 24. |
analysis_time |
Numeric calendar time of analysis. Follow-up is analysis_time minus entry time. Default 48. |
cens_adjust |
Numeric log-scale adjustment to censoring distribution.
Positive values increase censoring times; negative values decrease them.
Default 0. |
draw_treatment |
Logical. If TRUE (default), treatment is re-drawn at randomisation ratio rand_ratio; if FALSE, existing treatment assignments from the super population are retained. |
seed |
Integer random seed. Default NULL. |
strata_rand |
Character string naming a column in the sampled data
for within-stratum balanced treatment allocation. If NULL, unstratified allocation is used. |
hrz_crit |
Numeric log-HR threshold. If supplied, a column
hrz_flag is added indicating subjects whose individual log-HR exceeds the threshold. |
keep_rand |
Logical. If TRUE, a rand_order column recording the randomisation sequence is retained. |
time_eos |
Numeric secondary administrative censoring cutoff
(end-of-study time on the DGM scale). Applied after analysis_time censoring. |
Details
Time-scale consistency
All time parameters (analysis_time, max_entry,
time_eos) must be expressed in the same units as
outcome_var supplied to generate_aft_dgm_flex(). A common
error is building the DGM on days (e.g. rfstime) and then passing
analysis_time in months, which causes follow-up windows far shorter
than the DGM event-time scale and produces universal administrative
censoring (event_sim = 0 for all subjects).
Verify with: exp(dgm$model_params$mu) — the implied median event
time should be plausible given your analysis_time.
n = NULL path
When n = NULL the entire super population is used as-is, with no
staggered entry and no administrative censoring (follow_up = Inf).
Treatment assignments and linear predictors already stored in
dgm$df_super are retained unchanged.
Censoring adjustment
cens_adjust shifts the log-scale location parameter of the
censoring distribution:
- cens_adjust = log(2) doubles expected censoring times.
- cens_adjust = log(0.5) halves expected censoring times.
Value
A data.frame with columns:
- id
Subject identifier.
- treat
Original treatment from super population.
- treat_sim
Simulated treatment assignment.
- flag_harm
Subgroup indicator (1 = all subgroup conditions met).
- z_*
Covariate values.
- lin_pred_1, lin_pred_0
Counterfactual log-time linear predictors.
- y_sim
Observed survival time (min(T, C)).
- event_sim
Event indicator (1 = event, 0 = censored).
- t_true
Latent true survival time (pre-censoring).
- c_time
Effective censoring time (post admin-censoring).
- hrz_flag
(Optional) Individual harm-zone indicator.
- rand_order
(Optional) Randomisation sequence index.
See Also
generate_aft_dgm_flex, check_censoring_dgm
Examples
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200, seed = 42)
dim(sim_data)
head(sim_data[, c("y_sim", "event_sim", "treat_sim")])
Simulate Trial Data from GBSG DGM
Description
Generates simulated clinical trial data from a GBSG-based data generating mechanism.
Usage
simulate_from_gbsg_dgm(
dgm,
n = NULL,
rand_ratio = 1,
sim_id = 1,
max_follow = Inf,
muC_adj = 0,
min_cens = NULL,
max_cens = NULL,
draw_treatment = TRUE
)
Arguments
dgm |
A "gbsg_dgm" object from create_gbsg_dgm(). |
n |
Integer. Sample size. If NULL, uses full super-population. Default: NULL |
rand_ratio |
Numeric. Randomization ratio (treatment:control). Default: 1 (1:1 randomization) |
sim_id |
Integer. Simulation ID used for seed offset. Default: 1 |
max_follow |
Numeric. Administrative censoring time (months). Default: Inf (no administrative censoring) |
muC_adj |
Numeric. Adjustment to censoring distribution location parameter. Positive values increase censoring. Default: 0 |
min_cens |
Numeric. Minimum censoring time for uniform censoring. Required if cens_type = "uniform" |
max_cens |
Numeric. Maximum censoring time for uniform censoring. Required if cens_type = "uniform" |
draw_treatment |
Logical. If TRUE, randomly assigns treatment. If FALSE, samples from existing treatment arms. Default: TRUE |
Value
Data frame with simulated trial data including:
- id
Subject identifier
- y.sim
Observed follow-up time
- event.sim
Event indicator (1 = event, 0 = censored)
- t.sim
True event time (before censoring)
- treat
Treatment indicator
- flag.harm
Harm subgroup indicator
- loghr_po
Individual log hazard ratio (potential outcome)
- v1-v7
Analysis factors
Sort Subgroups by Focus
Description
Sorts a data.table of subgroup results according to the specified focus.
Usage
sort_subgroups(result_new, sg_focus)
Arguments
result_new |
A data.table of subgroup results. |
sg_focus |
Sorting focus: "hr", "hrMaxSG", "maxSG", "hrMinSG", "minSG". |
Value
A sorted data.table.
Sort Subgroups by Focus at the Consistency Stage (consistency metrics not yet available at this point)
Description
Sorts a data.table of subgroup results according to the specified focus.
Usage
sort_subgroups_preview(result_new, sg_focus)
Arguments
result_new |
A data.table of subgroup results. |
sg_focus |
Sorting focus: "hr", "hrMaxSG", "maxSG", "hrMinSG", "minSG". |
Value
A sorted data.table.
Evaluate Subgroup Consistency
Description
Evaluates candidate subgroups using split-sample consistency validation. For each candidate, repeatedly splits the data and checks whether the treatment effect direction is consistent across splits.
Usage
subgroup.consistency(
df,
hr.subgroups,
hr.threshold = 1,
hr.consistency = 1,
pconsistency.threshold = 0.9,
m1.threshold = Inf,
n.splits = 100,
details = FALSE,
by.risk = 12,
plot.sg = FALSE,
maxk = 7,
Lsg,
confs_labels,
sg_focus = "hr",
stop_Kgroups = 10,
stop_threshold = NULL,
showten_subgroups = FALSE,
pconsistency.digits = 2,
seed = 8316951,
checking = FALSE,
use_twostage = FALSE,
twostage_args = list(),
parallel_args = list()
)
Arguments
df |
Data frame containing the analysis dataset. Must include columns for outcome (Y), event indicator (Event), and treatment (Treat). |
hr.subgroups |
Data.table of candidate subgroups from subgroup search, containing columns: HR, n, E, K, d0, d1, m0, m1, grp, and factor indicators. |
hr.threshold |
Numeric. Minimum hazard ratio threshold for candidates. Default: 1.0 |
hr.consistency |
Numeric. Minimum HR required in each split for consistency. Default: 1.0 |
pconsistency.threshold |
Numeric. Minimum proportion of splits that must be consistent. Default: 0.9 |
m1.threshold |
Numeric. Maximum m1 threshold for filtering. Default: Inf |
n.splits |
Integer. Number of splits for consistency evaluation. Default: 100 |
details |
Logical. Print progress details. Default: FALSE |
by.risk |
Numeric. Risk interval for KM plots. Default: 12 |
plot.sg |
Logical. Generate subgroup plots. Default: FALSE |
maxk |
Integer. Maximum number of factors in subgroup. Default: 7 |
Lsg |
List of subgroup parameters. |
confs_labels |
Character vector mapping factor names to labels. |
sg_focus |
Character. Subgroup selection criterion: "hr", "maxSG", or "minSG". Default: "hr" |
stop_Kgroups |
Integer. Maximum number of candidates to evaluate. Default: 10 |
stop_threshold |
Numeric in (0, 1]. Early-stopping threshold on the consistency
proportion. Note: values > 1.0 are not permitted. To disable early
stopping, use the default (NULL). Interaction with parallel execution:
early stopping is checked after each batch completes, so some additional
candidates beyond the first meeting the threshold may be evaluated. Use
a smaller batch size for tighter control over early stopping. |
showten_subgroups |
Logical. If TRUE, prints up to 10 candidate subgroups after sorting by sg_focus, showing their rank, HR, sample size, events, and factor definitions. Useful for reviewing which candidates will be evaluated for consistency. Default: FALSE |
pconsistency.digits |
Integer. Decimal places for consistency proportion. Default: 2 |
seed |
Integer. Random seed for reproducible consistency splits. Default: 8316951. Set to NULL for non-reproducible random splits. The seed is used both for sequential execution (via set.seed()) and parallel execution (via future.seed). |
checking |
Logical. Enable additional validation checks. Default: FALSE |
use_twostage |
Logical. Use two-stage adaptive algorithm. Default: FALSE |
twostage_args |
List. Parameters for two-stage algorithm:
|
parallel_args |
List. Parallel processing configuration:
|
Value
A list containing:
- out_sg
Selected subgroup results
- sg_focus
Selection criterion used
- df_flag
Data frame with treatment recommendations
- sg.harm
Subgroup definition labels
- sg.harm.id
Subgroup membership indicator
- algorithm
"twostage" or "fixed"
- n_candidates_evaluated
Number of candidates actually evaluated
- n_candidates_total
Total candidates available
- n_passed
Number meeting consistency threshold
- early_stop_triggered
Logical indicating if early stop occurred
- early_stop_candidate
Index of candidate triggering early stop
- stop_threshold
Threshold used for early stopping
- seed
Random seed used for reproducibility (NULL if not set)
Subgroup Search for Treatment Effect Heterogeneity (Improved, Parallelized)
Description
Searches for subgroups with treatment effect heterogeneity using combinations of candidate factors. Evaluates subgroups for minimum prevalence, event counts, and hazard ratio threshold. Parallelizes the main search loop.
Usage
subgroup.search(
Y,
Event,
Treat,
ID = NULL,
Z,
n.min = 30,
d0.min = 15,
d1.min = 15,
hr.threshold = 1,
max.minutes = 30,
minp = 0.05,
rmin = 5,
details = FALSE,
maxk = 2,
parallel_workers = parallel::detectCores()
)
Arguments
Y |
Numeric vector of outcome (e.g., time-to-event). |
Event |
Numeric vector of event indicators (0/1). |
Treat |
Numeric vector of treatment group indicators (0/1). |
ID |
Optional vector of subject IDs. |
Z |
Matrix or data frame of candidate subgroup factors (binary indicators). |
n.min |
Integer. Minimum subgroup size. |
d0.min |
Integer. Minimum number of events in control. |
d1.min |
Integer. Minimum number of events in treatment. |
hr.threshold |
Numeric. Hazard ratio threshold for subgroup selection. |
max.minutes |
Numeric. Maximum minutes for search. |
minp |
Numeric. Minimum prevalence rate for each factor. |
rmin |
Integer. Minimum required reduction in sample size when adding a factor. |
details |
Logical. Print details during execution. |
maxk |
Integer. Maximum number of factors in a subgroup. |
parallel_workers |
Integer. Number of parallel workers (default: all available cores). |
Value
List with found subgroups, maximum HR, search time, configuration info, and filtering statistics.
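A minimal sketch of a subgroup.search() call on simulated data, assuming the forestsearch package is loaded; the outcome, event, and factor values below are illustrative, not from the package.

```r
library(forestsearch)

set.seed(123)
n <- 300
# Candidate binary subgroup factors (matrix or data frame of 0/1 indicators)
Z <- data.frame(
  `age>=65` = rbinom(n, 1, 0.4),
  `bm>=1.5` = rbinom(n, 1, 0.3),
  check.names = FALSE
)
Treat <- rbinom(n, 1, 0.5)   # treatment indicator (0/1)
Y     <- rexp(n, rate = 0.1) # time-to-event outcome
Event <- rbinom(n, 1, 0.8)   # event indicator (0/1)

res <- subgroup.search(
  Y = Y, Event = Event, Treat = Treat, Z = Z,
  n.min = 30, d0.min = 15, d1.min = 15,
  hr.threshold = 1.25, maxk = 2,
  parallel_workers = 2
)
```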
Summarize Bootstrap Event Counts
Description
Provides summary statistics for event counts across bootstrap iterations, helping assess the reliability of HR estimates when events are sparse.
Usage
summarize_bootstrap_events(boot_results, threshold = 5)
Arguments
boot_results |
List. Output from forestsearch_bootstrap_dofuture() |
threshold |
Integer. Minimum event threshold for flagging low counts (default: 5) |
Details
This function summarizes event counts in four scenarios:
ORIGINAL subgroup H evaluated on BOOTSTRAP samples
ORIGINAL subgroup Hc evaluated on BOOTSTRAP samples
NEW subgroup H* (found in bootstrap) evaluated on ORIGINAL data
NEW subgroup Hc* (found in bootstrap) evaluated on ORIGINAL data
Low event counts (below threshold) can lead to unstable HR estimates. This summary helps identify potential issues with sparse events.
Value
Invisibly returns a list with summary statistics:
- threshold
The event threshold used
- nb_boots
Total number of bootstrap iterations
- n_successful
Number of iterations that found a new subgroup
- original_H
List with low event counts for original H on bootstrap samples
- original_Hc
List with low event counts for original Hc on bootstrap samples
- new_Hstar
List with low event counts for new H* on original data
- new_Hcstar
List with low event counts for new Hc* on original data
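A hedged usage sketch: `boot_results` is assumed to come from a prior forestsearch_bootstrap_dofuture() run and is not constructed here.

```r
# Summarize event counts across bootstrap iterations; flag counts below 5
ev <- summarize_bootstrap_events(boot_results, threshold = 5)

ev$n_successful  # iterations that found a new subgroup
ev$original_H    # low event counts for original H on bootstrap samples
ev$new_Hstar     # low event counts for new H* on original data
```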
Enhanced Bootstrap Results Summary
Description
Creates comprehensive output including formatted table with subgroup footnote, diagnostic plots, bootstrap quality metrics, and detailed timing analysis.
Usage
summarize_bootstrap_results(
sgharm,
boot_results,
create_plots = FALSE,
est.scale = "hr"
)
Arguments
sgharm |
The selected subgroup object from forestsearch results; several input forms are accepted. |
boot_results |
List. Output from forestsearch_bootstrap_dofuture() |
create_plots |
Logical. Generate diagnostic plots (default: FALSE) |
est.scale |
Character. "hr" or "1/hr" for effect scale |
Details
The table output includes a footnote displaying the identified subgroup
definition, analogous to the tab_estimates table from sg_tables.
This is achieved by extracting the subgroup definition from sgharm and
passing it to format_bootstrap_table.
Value
List with components:
- table
gt table with treatment effects and subgroup footnote
- diagnostics
List of bootstrap quality metrics
- diagnostics_table_gt
gt table of diagnostics
- plots
List of ggplot2 diagnostic plots (if create_plots=TRUE)
- timing
List of timing analysis (if timing data available)
- subgroup_summary
List from summarize_bootstrap_subgroups()
See Also
format_bootstrap_table for table creation
sg_tables for analogous main analysis tables
summarize_bootstrap_subgroups for subgroup stability analysis
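A hedged sketch of the enhanced summary: `fs_fit` is assumed to be a forestsearch result and `boot_results` the output of forestsearch_bootstrap_dofuture(); neither is created here.

```r
out <- summarize_bootstrap_results(
  sgharm = fs_fit,
  boot_results = boot_results,
  create_plots = TRUE,  # also build ggplot2 diagnostic plots
  est.scale = "hr"
)

out$table        # gt table with subgroup definition as a footnote
out$diagnostics  # bootstrap quality metrics
out$plots        # diagnostic plots (since create_plots = TRUE)
```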
Summarize Bootstrap Subgroup Analysis Results
Description
Comprehensive summary of bootstrap subgroup identification results including basic statistics, factor frequencies, consistency distributions, and agreement with the original analysis subgroup.
Usage
summarize_bootstrap_subgroups(results, nb_boots, original_sg = NULL, maxk = 2)
Arguments
results |
Data.table or data.frame. Bootstrap results with subgroup characteristics including columns like Pcons, hr_sg, N_sg, K_sg, and M.1-M.k |
nb_boots |
Integer. Total number of bootstrap iterations |
original_sg |
Character vector. Original subgroup definition from main analysis (e.g., c("{age>=50}", "{nodes>=3}") for a 2-factor subgroup) |
maxk |
Integer. Maximum number of factors allowed in subgroup definition |
Value
List with summary components:
- basic_stats
Data.table of summary statistics
- consistency_dist
Data.table of Pcons distribution by bins
- size_dist
Data.table of subgroup size distribution
- factor_freq
Data.table of factor frequencies by position
- agreement
Data.table of subgroup definition agreement counts
- factor_presence
Data.table of base factor presence counts
- factor_presence_specific
Data.table of specific factor definitions
- original_agreement
Data.table comparing to original analysis subgroup
- n_found
Integer. Number of successful iterations
- pct_found
Numeric. Percentage of successful iterations
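A hedged call sketch, assuming `boot_df` holds per-iteration subgroup characteristics with the documented columns (Pcons, hr_sg, N_sg, K_sg, M.1, M.2, ...); the original subgroup definition shown is the example from the Arguments table.

```r
sg_sum <- summarize_bootstrap_subgroups(
  results = boot_df,
  nb_boots = 1000,
  original_sg = c("{age>=50}", "{nodes>=3}"),  # 2-factor original subgroup
  maxk = 2
)

sg_sum$original_agreement  # agreement with the original analysis subgroup
sg_sum$factor_freq         # factor frequencies by position
sg_sum$pct_found           # percentage of successful iterations
```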
Summarize Factor Presence Across Bootstrap Subgroups
Description
Analyzes how often each individual factor appears in identified subgroups, extracting base factor names from full definitions and identifying common specific definitions.
Usage
summarize_factor_presence_robust(
results,
maxk = 2,
threshold = 10,
as_gt = TRUE
)
Arguments
results |
Data.table or data.frame. Bootstrap results with M.1, M.2, etc. columns |
maxk |
Integer. Maximum number of factors allowed |
threshold |
Numeric. Percentage threshold for including specific definitions (default: 10) |
as_gt |
Logical. Return gt tables (TRUE) or data.frames (FALSE) |
Value
List with base_factors and specific_factors data.frames or gt tables
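A toy illustration with a hand-built results table; the M.1/M.2 column layout follows the documented input format, and the factor definitions are invented for the example.

```r
# Three bootstrap iterations, each identifying up to maxk = 2 factors
toy <- data.frame(
  M.1 = c("{age>=50}", "{age>=50}", "{bm>=1.5}"),
  M.2 = c("{nodes>=3}", NA, "{nodes>=3}"),
  stringsAsFactors = FALSE
)

fp <- summarize_factor_presence_robust(
  toy, maxk = 2, threshold = 10, as_gt = FALSE
)
fp$base_factors      # how often each base factor appears
fp$specific_factors  # specific definitions above the 10% threshold
```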
Summarize Simulation Results
Description
Creates a summary table of operating characteristics across all simulations. Includes both HR and AHR metrics.
Usage
summarize_simulation_results(
results,
analyses = NULL,
digits = 2,
digits_hr = 3
)
Arguments
results |
data.table with simulation results from run_simulation_analysis |
analyses |
Character vector. Analysis methods to include. Default: all |
digits |
Integer. Decimal places for proportions. Default: 2 |
digits_hr |
Integer. Decimal places for hazard ratios. Default: 3 |
Value
Data frame with summary statistics
Summarize Single Analysis Results
Description
Summarize Single Analysis Results
Usage
summarize_single_analysis(result, digits = 2, digits_hr = 3)
Arguments
result |
data.table with results for a single analysis method |
digits |
Integer. Decimal places for proportions |
digits_hr |
Integer. Decimal places for hazard ratios |
Value
Data frame with summary statistics
Summary method for cox_ahr_cde objects
Description
Summary method for cox_ahr_cde objects
Usage
## S3 method for class 'cox_ahr_cde'
summary(object, ...)
Arguments
object |
A cox_ahr_cde object. |
... |
Additional arguments (not used). |
Value
Invisibly returns the input object.
Summary Method for forestsearch Objects
Description
Provides a detailed summary of a ForestSearch analysis including input parameters, variable selection results, consistency evaluation, and the selected subgroup with key metrics.
Usage
## S3 method for class 'forestsearch'
summary(object, ...)
Arguments
object |
A forestsearch object. |
... |
Additional arguments (currently unused). |
Value
Invisibly returns object.
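A minimal usage sketch, assuming `fs_fit` is an object returned by a prior forestsearch() call (not constructed here).

```r
# Prints input parameters, variable selection results, consistency
# evaluation, and the selected subgroup; returns fs_fit invisibly.
summary(fs_fit)
```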
Summary Tables for MRCT Simulation Results
Description
Creates summary tables from MRCT simulation results using the gt package. Summarizes hazard ratio estimates, subgroup identification rates, and classification of identified subgroups. Optionally displays two scenarios (e.g., alternative and null hypotheses) side by side.
Usage
summaryout_mrct(
pop_summary = NULL,
mrct_sims,
mrct_sims_null = NULL,
scenario_labels = c("Alternative", "Null"),
pop_summary_null = NULL,
sg_type = 1,
tab_caption = "Identified subgroups and estimation summaries",
digits = 3,
trim_threshold = 1000,
trim_fraction = 0.01,
table_width = 600,
font_size = 11,
showtable = TRUE
)
Arguments
pop_summary |
List. Population summary from large sample approximation (optional). Default: NULL |
mrct_sims |
data.table. Simulation results from mrct_region_sims(). |
mrct_sims_null |
data.table. Optional second set of simulation results (e.g., null hypothesis). When supplied, the table displays two value columns side by side. Default: NULL (single-scenario table). |
scenario_labels |
Character vector of length 2. Column headers for the two scenarios. Only used when mrct_sims_null is supplied. Default: c("Alternative", "Null"). |
pop_summary_null |
List. Population summary for the null scenario (optional). Default: NULL |
sg_type |
Integer. Type of subgroup summary: 1 = basic summary (found, biomarker, age); 2 = extended summary (all subgroup types). Default: 1 |
tab_caption |
Character. Caption for the output table. Default: "Identified subgroups and estimation summaries" |
digits |
Integer. Number of decimal places for numeric summaries. Default: 3 |
trim_threshold |
Numeric. When the raw mean of a metric exceeds this value in absolute terms, the summary switches to a symmetrically trimmed mean and SD (excluding the lower and upper trim_fraction tails). Default: 1000. |
trim_fraction |
Numeric between 0 and 0.5. Fraction of observations to trim from each tail when trimming is triggered. Default: 0.01 (1 percent from each tail, i.e., the central 98 percent of values). |
table_width |
Numeric. Total table width in pixels. Column widths are allocated proportionally. Increase for HTML/wide displays (e.g., 750), decrease for beamer slides (e.g., 550). Default: 600. |
font_size |
Numeric. Base font size in pixels; the table title is rendered at a larger size derived from this value. Default: 11. |
showtable |
Logical. Print the table. Default: TRUE |
Value
List with components:
- res
List of summary statistics from population. When dual-scenario, contains res_alt and res_null.
- out_table
Formatted gt table object, or data.frame if gt is unavailable.
- data
Processed mrct_sims data.table with derived variables. When dual-scenario, also contains data_null.
- summary_df
Data frame of computed summary statistics.
See Also
mrct_region_sims for generating simulation results
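A hedged dual-scenario sketch: `sims_alt` and `sims_null` are assumed to be data.tables returned by mrct_region_sims() under alternative and null settings, respectively.

```r
tabs <- summaryout_mrct(
  mrct_sims = sims_alt,
  mrct_sims_null = sims_null,                   # side-by-side columns
  scenario_labels = c("Alternative", "Null"),
  sg_type = 1,          # basic subgroup summary
  digits = 3,
  table_width = 750,    # wider layout for HTML display
  showtable = FALSE
)

tabs$out_table   # formatted gt table (or data.frame if gt is unavailable)
tabs$summary_df  # computed summary statistics
```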
Validate input data for GRF analysis
Description
Checks that input data meets requirements for GRF analysis
Usage
validate_grf_data(W, D, n.min)
Arguments
W |
Numeric vector. Treatment indicator |
D |
Numeric vector. Event indicator |
n.min |
Integer. Minimum subgroup size |
Value
Logical. TRUE if data is valid, FALSE with warning otherwise
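A minimal check on toy indicator vectors; the simulated values are illustrative only.

```r
set.seed(42)
W <- rbinom(100, 1, 0.5)  # treatment indicator
D <- rbinom(100, 1, 0.7)  # event indicator

# TRUE if the data meet GRF requirements, FALSE (with a warning) otherwise
ok <- validate_grf_data(W = W, D = D, n.min = 30)
```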
Validate Input Parameters
Description
Validate Input Parameters
Usage
validate_inputs(
data,
model,
cens_type,
outcome_var,
event_var,
treatment_var,
continuous_vars,
factor_vars
)
Validate k_inter Effect on HR Heterogeneity
Description
Test function to verify that k_inter properly modulates the difference between HR(H) and HR(Hc), and that AHR metrics align with Cox-based HRs.
Usage
validate_k_inter_effect(
k_inter_values = c(-2, -1, 0, 1, 2, 3),
verbose = TRUE,
...
)
Arguments
k_inter_values |
Numeric vector of k_inter values to test. Default: c(-2, -1, 0, 1, 2, 3) |
verbose |
Logical. Print results. Default: TRUE |
... |
Additional arguments passed to create_gbsg_dgm |
Value
Data frame with k_inter, hr_H, hr_Hc, AHR_H, AHR_Hc, and ratio columns
Examples
# Test k_inter effect
results <- validate_k_inter_effect()
# k_inter = 0 should give hr_H approximately equal to hr_Hc (ratio near 1)
Validate Dataset for MRCT Simulations
Description
Checks that a dataset contains all required variables for MRCT simulation functions and reports any issues. Required variables include outcome (tte, event), treatment (treat), continuous covariates (age, bm), and factor covariates (male, histology, prior_treat, regA).
Usage
validate_mrct_data(df.case, verbose = TRUE)
Arguments
df.case |
Data frame to validate |
verbose |
Logical. Print detailed validation results. Default: TRUE |
Details
Required Variables
The function checks for the following variables:
- Outcome: tte (time-to-event), event (0/1 indicator)
- Treatment: treat (0/1 indicator)
- Continuous: age, bm (biomarker)
- Factor: male (0/1), histology, prior_treat (0/1), regA (0/1)
The function also validates variable types and value ranges.
Value
Logical. TRUE if all requirements met, FALSE otherwise (invisibly)
See Also
create_dgm_for_mrct for creating DGM from validated data
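A toy data frame carrying every documented required variable; the distributions (and the histology coding) are invented for illustration.

```r
set.seed(1)
n <- 200
df.case <- data.frame(
  tte   = rexp(n, 0.1),        # time-to-event
  event = rbinom(n, 1, 0.8),   # 0/1 event indicator
  treat = rbinom(n, 1, 0.5),   # 0/1 treatment indicator
  age   = rnorm(n, 60, 10),    # continuous covariate
  bm    = rnorm(n, 1, 0.5),    # biomarker
  male  = rbinom(n, 1, 0.5),
  histology   = sample(1:3, n, replace = TRUE),
  prior_treat = rbinom(n, 1, 0.3),
  regA  = rbinom(n, 1, 0.4)    # region indicator
)

# Reports any missing variables, type problems, or out-of-range values
validate_mrct_data(df.case, verbose = TRUE)
```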
Validate Spline Specification
Description
Validate Spline Specification
Usage
validate_spline_spec(spline_spec, df_work)
Wilson Score Confidence Interval
Description
Computes Wilson score confidence interval for a proportion, which has better coverage properties than the normal approximation for small samples and proportions near 0 or 1.
Usage
wilson_ci(x, n, conf.level = 0.95)
Arguments
x |
Integer. Number of successes. |
n |
Integer. Number of trials. |
conf.level |
Numeric. Confidence level (default 0.95). |
Value
Named numeric vector with elements: estimate, lower, upper.
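A short usage sketch, followed by a standalone implementation of the Wilson score formula for comparison (the helper below is an illustration, not the package's internal code).

```r
# Wilson interval for 8 successes in 10 trials
ci <- wilson_ci(x = 8, n = 10, conf.level = 0.95)
ci[["estimate"]]; ci[["lower"]]; ci[["upper"]]

# Direct computation from the Wilson score formula:
# center = (p + z^2/2n) / (1 + z^2/n)
# half-width = z/(1 + z^2/n) * sqrt(p(1-p)/n + z^2/4n^2)
wilson_direct <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  center <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z / (1 + z^2 / n) * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  c(lower = center - half, upper = center + half)
}
wilson_direct(8, 10)
```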