--- title: "Using Custom Datasets" author: "Avishek Bhandari" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using Custom Datasets} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` # Overview The G20 panel that ships with `contagionchannels` is one possible application; the same machinery applies to any directed network of return series and any set of channel proxies a user can construct. This vignette shows how to plug in custom data, explains the contracts each function expects, and ends with a worked example using a synthetic five-market panel seeded for reproducibility. ```{r libs} library(contagionchannels) library(xts) library(dplyr) ``` # 1. Required data structure Two objects drive the entire pipeline. **Returns.** An `xts` object with one column per market. Rows are trading days (the index must be `Date` or `POSIXct`); columns are demeaned daily log returns. Missing values are allowed but should be sparse; `na.locf` or listwise deletion are both acceptable, but the user is responsible for the choice. **Channel proxies.** A `data.frame` with a `Date` column matching the returns index plus one column per *raw component* of each channel. The helper `build_channel_composites()` does the standardisation and PCA reduction; the user must only ensure that the components are aligned on date and roughly stationary (returns or first-differenced spreads, never raw levels). ```{r contracts, eval = FALSE} str(my_returns_xts) # An 'xts' object on 2010-01-04/2024-12-31 of 18 columns str(my_channels_df) # 'data.frame' with Date + raw component columns ``` # 2. Custom market list and crisis periods The package does not hard-code 18 markets; the WQTE estimator accepts any $N \ge 3$ markets. 
To pass your own crisis periods, build a named `list` of `Date`-vector pairs:

```{r periods, eval = FALSE}
my_periods <- list(
  Pre_Pandemic = as.Date(c("2018-01-02", "2020-01-31")),
  Pandemic     = as.Date(c("2020-02-01", "2021-12-31")),
  Recovery     = as.Date(c("2022-01-01", "2024-12-31"))
)
```

Periods need not be contiguous and can overlap, though overlapping windows complicate cross-period comparisons. The paper follows the conventional rule of non-overlapping periods.

# 3. Custom channel composites

`build_channel_composites()` exposes a `mapping` argument that lets the user override the default component-to-channel assignment. Each channel must have at least two components for the PCA reduction to be defined.

```{r composites, eval = FALSE}
my_mapping <- list(
  Trade           = c("BDI_chg", "TradeWeightedFX_chg", "ContainerRate_chg"),
  Financial       = c("FRAOIS", "TEDspread", "CDS_lvl"),
  Geopolitical    = c("GPR_daily", "GPR_actions"),
  Behavioral      = c("VIX_innov", "VVIX_innov", "PutCallRatio"),
  Monetary_Policy = c("ShadowRate_surp", "FF_futures_surp")
)

my_channels <- build_channel_composites(
  proxy_grid  = my_proxies_df,
  mapping     = my_mapping,
  standardise = "rolling_252"
)
```

The `standardise` argument selects among `"global"` (one z-score over the whole sample), `"rolling_252"` (one-year rolling), and `"period"` (within each crisis period). The paper uses `"rolling_252"` to match the daily update cadence of the underlying components.

# 4. Calling the pipeline

Once returns and composites are in hand, `run_contagion_pipeline()` consumes both.
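Before calling it, a quick sanity check that each period's endpoints fall inside the span of the returns index can save a confusing downstream error. This guard is not part of the package; it is a sketch assuming the `my_returns_xts` and `my_periods` objects built above:

```{r check-periods, eval = FALSE}
# Hypothetical guard: every period must lie inside the returns sample
idx_range <- range(index(my_returns_xts))
inside <- vapply(
  my_periods,
  function(p) p[1] >= idx_range[1] && p[2] <= idx_range[2],
  logical(1)
)
stopifnot(all(inside))
```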
The full call mirrors the replication vignette, but with user-supplied objects:

```{r pipeline, eval = FALSE}
results <- run_contagion_pipeline(
  returns       = my_returns_xts,
  channels      = my_channels,
  periods       = my_periods,
  scale         = 5,
  tau           = 0.50,
  abs_threshold = NULL,  # NULL => derive from first period Q75
  methods       = c("iv2sls", "lp"),
  bootstrap_B   = 499,
  n_cores       = 4
)
```

Setting `abs_threshold = NULL` instructs the pipeline to derive the absolute threshold from the 75th percentile of the *first* listed period's WQTE distribution. Alternatively, pass a numeric value to fix the threshold across applications, which is useful for cross-paper comparability.

# 5. Adapting visualisations for custom data

The plotting helpers accept any tidy `data.frame` with a `Period` and a `Channel` column; they do not assume the eight G20 sub-periods. To customise:

```{r plotting, eval = FALSE}
plot_attribution_stack(
  shares_long  = results$shares_long,
  period_order = names(my_periods),
  palette      = c(Trade = "#1f77b4", Financial = "#d62728",
                   Geopolitical = "#9467bd", Behavioral = "#2ca02c",
                   Monetary_Policy = "#ff7f0e")
)

plot_qte_intensity(
  F_matrix     = results$F_matrices$Pandemic,
  threshold    = results$abs_threshold,
  market_order = c("US", "EU", "JP", "EM_Asia", "EM_LatAm")
)
```

Both helpers return a `ggplot` object that can be modified with the standard `ggplot2` grammar.

# 6. Worked example: synthetic five-market panel

A small, fully reproducible example is the most reliable way to confirm the contracts. Below we simulate five correlated equity-like return series together with ten channel components (two per channel), build the composites, and run a trimmed Stage 1 + Stage 2 pipeline.
```{r synthetic-data}
set.seed(20260429)
n_obs   <- 1500
markets <- c("US", "EU", "JP", "EM_Asia", "EM_LatAm")
dates   <- seq.Date(from = as.Date("2018-01-02"), by = "day", length.out = n_obs)

# Common factor + idiosyncratic shocks
Fcom    <- rnorm(n_obs, sd = 0.012)
ret_mat <- sapply(markets, function(m) {
  loading <- runif(1, 0.4, 0.9)
  loading * Fcom + rnorm(n_obs, sd = 0.010)
})
my_returns <- xts(ret_mat, order.by = dates)

# Channel proxy raw components
my_proxies <- data.frame(
  Date            = dates,
  BDI_chg         = rnorm(n_obs, sd = 0.5),
  TradeFX_chg     = rnorm(n_obs, sd = 0.4),
  FRAOIS          = arima.sim(list(ar = 0.95), n_obs) * 0.01,
  TEDspread       = arima.sim(list(ar = 0.93), n_obs) * 0.01,
  GPR_daily       = exp(rnorm(n_obs, sd = 0.2)),
  GPR_actions     = exp(rnorm(n_obs, sd = 0.3)),
  VIX_innov       = rnorm(n_obs, sd = 1.5),
  VVIX_innov      = rnorm(n_obs, sd = 1.0),
  ShadowRate_surp = rnorm(n_obs, sd = 0.05),
  FF_futures_surp = rnorm(n_obs, sd = 0.04)
)

my_mapping <- list(
  Trade           = c("BDI_chg", "TradeFX_chg"),
  Financial       = c("FRAOIS", "TEDspread"),
  Geopolitical    = c("GPR_daily", "GPR_actions"),
  Behavioral      = c("VIX_innov", "VVIX_innov"),
  Monetary_Policy = c("ShadowRate_surp", "FF_futures_surp")
)

my_periods <- list(
  Calm   = as.Date(c("2018-01-02", "2019-12-31")),
  Stress = as.Date(c("2020-01-01", "2021-12-31"))
)
```

```{r synthetic-composites, eval = FALSE}
my_channels <- build_channel_composites(
  proxy_grid  = my_proxies,
  mapping     = my_mapping,
  standardise = "rolling_252"
)
head(my_channels, 3)
```

A trimmed Stage 1 estimate on the *Calm* sub-period:

```{r synthetic-stage1, eval = FALSE}
calm_dates   <- my_periods$Calm
returns_calm <- my_returns[paste0(calm_dates[1], "/", calm_dates[2])]

F_calm <- compute_wqte_matrix(
  returns = returns_calm,
  scale   = 5,
  tau     = 0.50,
  n_cores = 1
)

# 75th percentile of the off-diagonal WQTE entries
abs_thr_calm <- quantile(
  F_calm[upper.tri(F_calm) | lower.tri(F_calm)],
  probs = 0.75, na.rm = TRUE
)

links_calm <- which(F_calm >= abs_thr_calm, arr.ind = TRUE)
nrow(links_calm)
```

A Stage 2 IV/2SLS attribution on the same window:

```{r synthetic-stage2, eval = FALSE}
channels_calm <- my_channels[
  my_channels$Date >= calm_dates[1] & my_channels$Date <= calm_dates[2],
]

iv_calm <- iv_2sls_attribute(
  returns_period  = returns_calm,
  channels_period = channels_calm,
  links           = links_calm,
  cluster_se      = TRUE
)
iv_calm$shares
```

End-to-end pipeline call on the synthetic data:

```{r synthetic-pipeline, eval = FALSE}
synth_results <- run_contagion_pipeline(
  returns       = my_returns,
  channels      = my_channels,
  periods       = my_periods,
  scale         = 5,
  tau           = 0.50,
  abs_threshold = abs_thr_calm,
  methods       = c("iv2sls"),
  bootstrap_B   = 199,
  n_cores       = 1
)
synth_results$summary_table
```

The synthetic panel is too small for inference to be meaningful, but running the chain end-to-end is the cleanest way to verify that the data contracts are satisfied before committing compute to a real estimation.

# Common pitfalls

A few user errors recur in support requests:

* **Date misalignment.** `my_proxies$Date` must match `index(my_returns)` exactly; the pipeline performs an `inner_join` and silently drops dates that do not match. Always inspect `nrow(my_channels)` after the composites are built.
* **Levels rather than changes.** Channel proxies should be stationary. Pass first-differenced spreads, log-changes of indices, or innovations from a fitted model — never raw levels of an integrated series.
* **Too few markets.** WQTE is defined for $N \ge 3$, but Stage 2 attribution needs enough directional links per period to estimate five channel coefficients. Plan for at least eight markets and sub-periods spanning 250 trading days or more.
* **Wrong threshold base.** When `abs_threshold = NULL`, the pipeline defaults to the first listed period; if your first period is a stress period, the resulting threshold will be conservative and may zero out links in calmer periods. List a calm period first when relying on the default.
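The first pitfall, date misalignment, is the easiest to guard against explicitly before building composites. The check below is a sketch assuming the worked-example objects, not a package function; comparing dates as character strings sidesteps timezone and class mismatches between `Date` and `POSIXct` indices:

```{r check-alignment, eval = FALSE}
# Hypothetical guard against silent date drops in the inner join
only_in_returns <- setdiff(as.character(index(my_returns)),
                           as.character(my_proxies$Date))
only_in_proxies <- setdiff(as.character(my_proxies$Date),
                           as.character(index(my_returns)))
stopifnot(length(only_in_returns) == 0, length(only_in_proxies) == 0)
```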
These pitfalls aside, the package's contracts are deliberately minimal: an `xts` of returns, a `data.frame` of proxies, and a `list` of period endpoints. Anything that satisfies those three constraints can be analysed with the same machinery that produces the headline results in the paper.

# Session info

```{r session}
sessionInfo()
```