---
title: "Validation Study"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Validation Study}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

## Scope

This vignette documents the shipped release-validation study for
`SelectBoost.quantile`. The goal is not to claim universal superiority, but to
show how the current prototype behaves against two direct baselines. The study
compares three methods, the two baselines first and the prototype last:

- plain quantile lasso (`lasso`)
- cross-validated quantile lasso with a 1-SE penalty rule (`lasso_tuned`)
- `selectboost_quantile()` with tau-aware screening, stronger tuning,
  complementary-pairs stability selection, capped neighborhoods, and a hybrid
  support score

The included benchmark artifacts were generated with:

- scenarios from `default_quantile_benchmark_scenarios()`
- `tau = c(0.25, 0.5, 0.75)`
- 4 Monte Carlo replications per scenario
- `selectboost_quantile(..., B = 8, step_num = 0.5, screen = "auto", tune_lambda = "cv", lambda_rule = "one_se", lambda_inflation = 1.25, complementary_pairs = TRUE, max_group_size = 15, nlambda = 8)`
- stable support extracted with the hybrid summary score at `threshold = 0.55`

```{r}
summary_path <- system.file(
  "extdata", "validation", "quantile_benchmark_release_summary.csv",
  package = "SelectBoost.quantile"
)
raw_path <- system.file(
  "extdata", "validation", "quantile_benchmark_release_raw.csv",
  package = "SelectBoost.quantile"
)

# Prefer the installed copy; otherwise fall back to source-tree locations.
resolve_validation_path <- function(installed_path, filename) {
  if (nzchar(installed_path) && file.exists(installed_path)) {
    return(installed_path)
  }
  candidates <- c(
    file.path("inst", "extdata", "validation", filename),
    file.path("..", "inst", "extdata", "validation", filename)
  )
  candidates <- candidates[file.exists(candidates)]
  if (!length(candidates)) {
    stop("Could not locate shipped validation artifact: ", filename, call. = FALSE)
  }
  candidates[[1]]
}

summary_path <- resolve_validation_path(summary_path, "quantile_benchmark_release_summary.csv")
raw_path <- resolve_validation_path(raw_path, "quantile_benchmark_release_raw.csv")

validation_summary <- utils::read.csv(summary_path, stringsAsFactors = FALSE)
validation_raw <- utils::read.csv(raw_path, stringsAsFactors = FALSE)

# The scenario family is the scenario name with its "_tau_..." suffix removed.
validation_summary$family <- sub("_tau_.*$", "", validation_summary$scenario)
validation_summary$is_high_dim <- grepl("^high_dim", validation_summary$scenario)

# F1 computed from mean counts; NA when the denominator is zero.
validation_summary$mean_f1 <- with(
  validation_summary,
  ifelse(
    (2 * mean_tp + mean_fp + mean_fn) > 0,
    2 * mean_tp / (2 * mean_tp + mean_fp + mean_fn),
    NA_real_
  )
)
```

## Overall summary

The first table averages the scenario-level summaries across the full shipped
grid, including the `n < p` stress regime.

```{r}
overall <- aggregate(
  cbind(mean_tpr, mean_fdr, mean_f1, failure_rate, mean_runtime_sec) ~ method,
  data = validation_summary,
  FUN = mean
)

knitr::kable(overall, digits = 3)
```

Across the full grid, tuned lasso has the highest average true-positive rate,
but it also carries the highest average false-discovery rate. The current
`selectboost_quantile()` release is markedly more conservative: it gives up
some recall, but in exchange it sharply lowers the false-discovery rate across
the shipped benchmark grid and yields the best average F1 score.

## Correlated but not high-dimensional regimes

The `high_dim` scenario is intentionally hard and changes the picture
substantially. Excluding that regime gives a cleaner view of the correlated
and misspecified-noise settings that the current prototype handles more
naturally.
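The split between regimes depends on scenario names carrying a family prefix followed by a `_tau_` suffix, as the `sub()` and `grepl()` calls in the setup chunk imply. As a minimal sketch, the same two expressions are applied below to hypothetical scenario names. The names and the `toy` data frame are illustrative assumptions, not values taken from the shipped CSV files.

```{r}
# Hypothetical scenario names for illustration; the shipped names may differ.
toy_scenarios <- c("correlated_tau_0.25", "high_dim_tau_0.5", "heavy_tail_tau_0.75")

toy <- data.frame(
  scenario = toy_scenarios,
  # Drop everything from "_tau_" onward to recover the scenario family.
  family = sub("_tau_.*$", "", toy_scenarios),
  # Flag scenarios whose name starts with "high_dim".
  is_high_dim = grepl("^high_dim", toy_scenarios),
  stringsAsFactors = FALSE
)

toy
```

Only rows with `is_high_dim` equal to `FALSE` would be kept by the subset used in this section.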
```{r}
stable_regimes <- subset(validation_summary, !is_high_dim)

stable_overall <- aggregate(
  cbind(mean_tpr, mean_fdr, mean_f1, failure_rate, mean_runtime_sec) ~ method,
  data = stable_regimes,
  FUN = mean
)

knitr::kable(stable_overall, digits = 3)
```

On these non-high-dimensional settings, the shipped study shows a consistent
pattern:

- `lasso_tuned` has the highest mean recall
- `selectboost_quantile()` has the lowest mean false-discovery rate by a large
  margin
- `selectboost_quantile()` also has the highest mean F1 score on the shipped
  grid
- `selectboost_quantile()` remains slower than either lasso baseline, which is
  expected because it perturbs, subsamples, and refits repeatedly

The family-level breakdown is below.

```{r}
family_summary <- aggregate(
  cbind(mean_tpr, mean_fdr, mean_f1) ~ family + method,
  data = stable_regimes,
  FUN = mean
)

knitr::kable(family_summary, digits = 3)
```

```{r}
plot_df <- stable_regimes
method_levels <- c("lasso", "lasso_tuned", "selectboost")
cols <- c("lasso" = "#4C78A8", "lasso_tuned" = "#F58518", "selectboost" = "#54A24B")

# One point per scenario and method: FDR on the x-axis, F1 on the y-axis.
plot(
  plot_df$mean_fdr,
  plot_df$mean_f1,
  col = cols[plot_df$method],
  pch = 19,
  xlab = "Mean FDR",
  ylab = "Mean F1",
  main = "Validation Summary by Scenario"
)
legend(
  "bottomleft",
  legend = method_levels,
  col = cols[method_levels],
  pch = 19,
  bty = "n"
)
```

## High-dimensional stress regime

The `high_dim` family remains difficult, but it is no longer a failure mode in
the earlier sense of selecting almost everything. The improved SelectBoost
workflow now returns much sparser and more stable supports than either lasso
baseline.
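When reading the F1 column for this regime, it helps to recall how the setup chunk defines `mean_f1`: it is computed from the scenario-level mean counts, not averaged over per-replicate F1 values. In the notation below, which is introduced here only for illustration, the barred quantities stand for the `mean_tp`, `mean_fp`, and `mean_fn` columns:

$$
\mathrm{F1} = \frac{2\,\overline{\mathrm{TP}}}{2\,\overline{\mathrm{TP}} + \overline{\mathrm{FP}} + \overline{\mathrm{FN}}}
$$

Every false positive adds to the denominator, so a method that selects many noise variables can score a low F1 even when its recall is high. The value is recorded as `NA` when the denominator is zero.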
```{r}
high_dim <- subset(validation_summary, is_high_dim)

high_dim_overall <- aggregate(
  cbind(mean_tpr, mean_fdr, mean_f1, failure_rate, mean_support_size) ~ method,
  data = high_dim,
  FUN = mean
)

knitr::kable(high_dim_overall, digits = 3)
```

The main remaining tradeoff is recall: `selectboost_quantile()` is much cleaner
than the lasso baselines in `high_dim`, but it is still more conservative and
can miss weaker signals. Even so, on the shipped study it achieves the best
mean F1 score in that regime because it avoids the large false-positive burden
of the lasso baselines. This recall tradeoff is the main reason the package is
best described as a polished `v2` prototype rather than a finished
methodological endpoint.

```{r}
# List any scenario and method combinations with recorded failures.
failure_rows <- subset(validation_summary, failure_rate > 0)

if (nrow(failure_rows)) {
  knitr::kable(failure_rows[, c(
    "scenario", "method", "failure_rate", "mean_tpr", "mean_fdr", "mean_support_size"
  )], digits = 3)
} else {
  cat("No method failures were recorded in the shipped study.\n")
}
```

## Reproducing the study

From a source checkout, regenerate benchmark artifacts into a temporary
directory with:

```{r, eval = FALSE}
out_dir <- file.path(tempdir(), "SelectBoost.quantile-validation")
system2(
  "Rscript",
  c("inst/scripts/run_quantile_benchmark.R", out_dir, "4", "0.55")
)
```

The script loads the local package automatically when run from a source tree.
It writes raw results, aggregated summaries, and a `sessionInfo` record to the
chosen output directory. If no output directory is supplied, it defaults to a
subdirectory of `tempdir()`. In the current source tree, that rerun uses the
defaults defined in the package benchmark helper for screening, stronger lambda
tuning, complementary-pairs stability selection, the neighborhood cap, and the
hybrid support score.
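After a rerun, one way to check what was written is to list the output directory and read any CSV files back in. This is a minimal sketch under two assumptions: that the tabular results are written as CSV files (as the shipped artifacts are), and that `out_dir` is the same directory passed to the script. The output file names are not documented here, so the sketch discovers them instead of hard-coding them. `written`, `csv_files`, and `regenerated` are illustrative names.

```{r, eval = FALSE}
# Same directory as in the rerun chunk; repeated so this sketch is self-contained.
out_dir <- file.path(tempdir(), "SelectBoost.quantile-validation")

# List whatever the script produced (empty if the rerun has not happened yet).
written <- list.files(out_dir, full.names = TRUE)
print(basename(written))

# Assumption: results are CSV files. Names are discovered, not assumed.
csv_files <- written[grepl("\\.csv$", written, ignore.case = TRUE)]
regenerated <- lapply(csv_files, utils::read.csv, stringsAsFactors = FALSE)
names(regenerated) <- basename(csv_files)

# Dimensions of each regenerated table, for a quick sanity check.
lapply(regenerated, dim)
```

Comparing these dimensions against `dim(validation_summary)` and `dim(validation_raw)` gives a quick first check that the rerun covered the same grid.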