--- title: "validation_tests" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{validation-tests} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 8, message = FALSE, warning = FALSE ) library(ggplot2); library(dplyr); library(tidyr) data.table::setDTthreads(1L) options(dplyr.summarise.inform = FALSE, scipen = 999, digits = 5) theme_set(theme_bw(base_size = 12)) ``` > **Notation.** For symbol definitions, see the [notation vignette](notation.html). ## Overview The aim of this vignette is show how to perform correct validation tests using closest-not-yet-treated control groups using `childpen`. ## Simulate data The package has a built in simulation function to draw data resembling child penalty studies. ```{r draw_data} library(childpen) data <- simulate_data(n_individuals = 5000, treatment_groups = 24:28) data |> tibble() ``` ## The correct validation tests See the [estimation vignette](estimation.html) for an explainer on the $2\times2$ comparisons in `childpen`. Recall that $d$ is the treatment group, $a$ is the target age, and $d^\prime=a+1$ is the closest not-yet-treated control group. Recall that the control group is $d' = a + 1$, the cohort whose first birth is one year after the target age. Assume that when presenting results, post-treatment, you report estimates for event times $e=0,...,2$. Then, for each treatment group $d$ you use 3 different control groups in post-treatment estimation. As the identification assumptions (e.g., parallel trends for DID) must hold for each point-estimate separately, this implies that it must hold within each treatment-control pair. The above argument means that the validation tests should be done separately by treatment-control combinations. Returning to the above example, if you want to show results for $e=0,...,2$ then you need to conduct pre-trend analysis for 3 different control groups. This is done automatically in the `childpen` package, as we show below. For completeness, the validation tests are: 1. Difference-in-differences (DID) estimates the average treatment effect (ATE) in pre-periods 2. Triple differences (TD) estimates the gender gap in the ATE in pre-periods 3. Normalized triple differences (NTD) estimates the gender gap in normalized effects in pre-periods ## Multiple treatment group analysis We will now do the main heavy lifting. We run the main estimation function, `multiple_treatment_group_analysis()`. Set `periods_pre` to the number of pre-treatment periods for which you want to conduct validation tests. As an example, we will examine two periods pre-treatment. Since we set the number of periods in the post-treatment to 2 using `periods_post`, this will report validation tests separately for 3 control groups, as discussed above. ```{r estimation} res = multiple_treatment_group_analysis(data = data, treatment_groups = 24:25, # which treatment groups to run in the analysis periods_post = 2, # estimate results for post periods 0:2 periods_pre = 2, # estimate pre-trend diagnostics, set to NULL to omit from estimation max_age = 40, # dont estimate results if age is above 40 min_age = 20, # dont estimate results if age is below 20 pre = 1, # use 1 period before treatment, can make further away if anticipation is concern verbose = FALSE # set to TRUE to output progress (i like to time loops) set to FALSE to omit messages ) ``` ## Examining results of validation tests As a first pass, lets see the results. ```{r results_first} res |> tibble() ``` Focusing on $d=24$, lets examine pre-trends. We will start with DID of females. Generally, valid pre-trend validation tests would behave such that the confidence intervals include 0, and there is no obvious trend in the pre-period, and there is no systematic difference between control groups. A valid pre-trend test shows point estimates near zero with confidence intervals covering zero and no systematic trend across pre-treatment event times. Note that in the plot below I define `control_offset` as the difference between the control group $d^\prime$ and the treatment group $d$. E.g., for $d=24$ and $d^\prime=25$, i.e., the closest not-yet-treated control group at event time $e=0$, I set control offset to 1. Ribbons present 95\% CI based on standard errors clustered at the individual level. ```{r} res |> filter(d == 24, a < d, estimand == "ATE", method == "DID_Female") |> mutate(control_offset = dp - d, control_offset = factor(control_offset)) |> ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) + geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + scale_x_continuous(breaks = -3:-2) + facet_grid(cols = vars(control_offset)) ``` Although this would be hard to look at, we can put all control offsets on same plot. ```{r} res |> filter(d == 24, a < d, estimand == "ATE", method == "DID_Female") |> mutate(control_offset = dp - d, control_offset = factor(control_offset)) |> ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) + geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + scale_x_continuous(breaks = -3:-2) ``` Can do this for multiple treatment groups at same time ```{r} res |> filter(a < d, estimand == "ATE", method == "DID_Female") |> mutate(control_offset = dp - d, control_offset = factor(control_offset)) |> ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) + geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + scale_x_continuous(breaks = -3:-2) + facet_grid(cols = vars(d)) ``` Finally, can do this for all methods. ```{r} res |> filter(a < d, estimand == "ATE" & (method == "DID_Female" | method == "DID_Male" | method == "TD") | estimand == "theta" & method == "NTD_Conv") |> mutate(control_offset = dp - d, control_offset = factor(control_offset)) |> ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) + geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + scale_x_continuous(breaks = -3:-2) + facet_grid(cols = vars(d), rows = vars(method), scales = "free") ```