Macroeconomic data are often subject to revisions, reflecting the integration of new information, methodological improvements, and statistical adjustments. Understanding the optimal properties of a revision process is essential to ensure that early data releases provide a reliable foundation for policy decisions and economic forecasting. This vignette discusses the characteristics of optimal revisions, formalizes initial estimates as truth measured with noise, and presents an iterative approach to identify the point at which revisions become unpredictable, also known as the efficient release. Examples then illustrate the application of these concepts to GDP data.
Data revisions can be classified into two main types: ongoing revisions, which incorporate new information as it becomes available, and benchmark revisions, which reflect changes in definitions, classifications, or methodologies. Optimally, revisions should satisfy the following properties (Aruoba 2008):
Unbiasedness: Revisions should not systematically push estimates in one direction. Mathematically, consider the revision series \(r_t^f = y_t^f - y_t^h\), where \(y_t^f\) is the final release and \(y_t^h\) is release number \(h\) (\(h=0\) is the initial release). Unbiasedness requires: \[ E[r_t^f] = 0 \]
Efficiency: An efficient release incorporates all available information optimally, ensuring that subsequent revisions are unpredictable. This can be tested using the Mincer-Zarnowitz regression: \[ y_t^f = \alpha + \beta y_t^h + \varepsilon_t, \] where an efficient release satisfies \(\alpha = 0\) and \(\beta = 1\).
Minimal Variance: Revisions should be as small as possible while still improving accuracy. The variance of revisions, defined as \(\text{Var}(r_t^f)\), should decrease over successive releases.
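The three properties above can be checked on toy data with base R. The sketch below is illustrative only: the simulated series, the helper mz_test(), and all parameter values are assumptions made for this example and are not part of reviser. The joint test is a classical F-test that compares the unrestricted regression with the restricted model \(y_t^f = y_t^h\), so it assumes homoskedastic, serially uncorrelated errors.

```r
set.seed(1)
n <- 200

# Hypothetical data: an efficient release (final = release + unpredictable news)
# and a flawed release that understates the final value and carries extra noise.
prelim_good   <- rnorm(n, mean = 0.5, sd = 1)
final         <- prelim_good + rnorm(n, sd = 0.2)
prelim_flawed <- final - 0.3 + rnorm(n, sd = 0.4)

# Revisions r = final - preliminary
rev_good   <- final - prelim_good
rev_flawed <- final - prelim_flawed

# Unbiasedness: t-test of H0: E[r] = 0
p_bias_good   <- t.test(rev_good)$p.value
p_bias_flawed <- t.test(rev_flawed)$p.value

# Efficiency: Mincer-Zarnowitz regression with a joint F-test of
# alpha = 0 and beta = 1 (restricted model: final = prelim)
mz_test <- function(final, prelim) {
  fit   <- lm(final ~ prelim)
  df_u  <- length(final) - 2
  rss_u <- sum(residuals(fit)^2)    # unrestricted residual sum of squares
  rss_r <- sum((final - prelim)^2)  # restricted residual sum of squares
  f_stat <- ((rss_r - rss_u) / 2) / (rss_u / df_u)
  c(
    alpha   = unname(coef(fit)[1]),
    beta    = unname(coef(fit)[2]),
    f_stat  = f_stat,
    p_value = pf(f_stat, 2, df_u, lower.tail = FALSE)
  )
}
mz_good   <- mz_test(final, prelim_good)
mz_flawed <- mz_test(final, prelim_flawed)

# Minimal variance: compare the variance of the two revision series
var_good   <- var(rev_good)
var_flawed <- var(rev_flawed)

print(rbind(good = mz_good, flawed = mz_flawed))
```

With these settings, the flawed release should fail both the bias test and the joint efficiency test, while the efficient release should yield an intercept close to 0 and a slope close to 1.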
A fundamental assumption in many revision models is that a preliminary release of a data point is an imperfect measure of the true value due to measurement errors. This can be expressed as: \[ y_t^h = y_t^* + \varepsilon_t^h, \] where \(y_t^*\) is the true value and \(\varepsilon_t^h\) is an error term that shrinks in magnitude as \(h\) increases. The properties of the remaining revision \(r_t^f = y_t^f - y_t^h\) determine whether the preliminary release \(h\) is an efficient estimator of \(y_t^*\). If \(r_t^f\) is predictable, the revision process is inefficient, indicating room for improvement in the initial estimates. Conversely, if \(r_t^f\) is unpredictable, the preliminary release is efficient (i.e., \(y_t^h = y_t^e\)).
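To make the measurement-error view concrete, the following base R sketch simulates successive releases whose noise shrinks with \(h\). Everything in it is an assumption chosen for illustration: the true series, the error standard deviations, and the simplification that the final release equals the true value \(y_t^*\). It reports the variance of the remaining revisions and the Mincer-Zarnowitz slope for each release.

```r
set.seed(42)
n <- 2000

# Hypothetical true growth rates y*_t
y_star <- rnorm(n, mean = 0.5, sd = 1)

# Assumed standard deviation of the measurement error for h = 0, 1, 2, 3
noise_sd <- c(1, 0.5, 0.25, 0.1)

# Each column is one release: y^h_t = y*_t + eps^h_t
releases <- sapply(noise_sd, function(s) y_star + rnorm(n, sd = s))
colnames(releases) <- paste0("release_", seq_along(noise_sd) - 1)

# Simplifying assumption: the final release equals the true value,
# so the remaining revision is final - release = -eps^h_t.
rev_var <- apply(y_star - releases, 2, var)

# Mincer-Zarnowitz slope for each release. Under pure noise the slope is
# roughly Var(y*) / (Var(y*) + sd^2), i.e. below 1 for noisy releases.
mz_slope <- apply(releases, 2, function(x) unname(coef(lm(y_star ~ x))[2]))

print(round(rbind(rev_var, mz_slope), 3))
```

In this setup the revision variance falls and the slope moves toward 1 as \(h\) increases, which is the pattern an efficiency test would look for in later releases.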
One major challenge in identifying an efficient release using Mincer-Zarnowitz regressions is that it is often unclear which data release should be considered final. While some revisions occur due to the incorporation of new information, others result from methodological changes that redefine past values. Consequently, statistical agencies may continue revising data for years, making it difficult to pinpoint a definitive final release. For instance, in Germany, the final release of national accounts data is typically published four years after the initial release. In Switzerland, GDP figures are never finalized. Defining a final release is therefore a non-trivial task that requires knowledge of the revision process.
An iterative approach is proposed to determine the optimal number of revisions, \(e\), beyond which further revisions are negligible. This approach is based on running Mincer-Zarnowitz-style regressions to assess which release provides an optimal estimate of the final value. The procedure follows these steps (Kishor and Koenig 2012; Strohsal and Wolf 2020):
Regression Analysis: Regress the final release \(y_t^f\) on each preliminary release \(y_t^h\) for \(h = 0,1,\dots, H\): \[ y_t^f = \alpha + \beta y_t^h + \varepsilon_t \] where the joint null hypothesis is that \(\alpha = 0\) and \(\beta = 1\), indicating that \(y_t^h\) is an efficient estimate of \(y_t^f\).
Determine the Optimal \(e\): Increase \(h\) iteratively and test whether the efficiency conditions hold. The smallest \(h\) for which the joint hypothesis is not rejected is taken as the efficient release \(e\).
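The two steps above can be sketched in a few lines of base R. This is a minimal illustration, not the implementation used by reviser: the helpers mz_p_value() and first_efficient(), the 5% significance level, and the simulated releases are all assumptions made for this example, and the classical F-test ignores heteroskedasticity and autocorrelation.

```r
# Hypothetical helper: p-value of the joint F-test of alpha = 0, beta = 1
mz_p_value <- function(final, prelim) {
  fit   <- lm(final ~ prelim)
  df_u  <- length(final) - 2
  rss_u <- sum(residuals(fit)^2)    # unrestricted model
  rss_r <- sum((final - prelim)^2)  # restricted model: final = prelim
  f_stat <- ((rss_r - rss_u) / 2) / (rss_u / df_u)
  pf(f_stat, 2, df_u, lower.tail = FALSE)
}

# Hypothetical helper: `releases` is a matrix with one column per release,
# ordered h = 0, 1, ...; returns the first release that is not rejected.
first_efficient <- function(final, releases, level = 0.05) {
  for (j in seq_len(ncol(releases))) {
    p <- mz_p_value(final, releases[, j])
    if (p > level) {
      return(list(e = j - 1L, p_value = p, n_tested = j))
    }
  }
  # No release passed the test
  list(e = NA_integer_, p_value = NA_real_, n_tested = ncol(releases))
}

# Simulated example: releases 0 and 1 are biased and noisy,
# release 2 differs from the final value only by unpredictable news.
set.seed(123)
n <- 200
release_2 <- rnorm(n, mean = 0.5)
final     <- release_2 + rnorm(n, sd = 0.1)
release_1 <- final - 0.2 + rnorm(n, sd = 0.3)
release_0 <- final - 0.4 + rnorm(n, sd = 0.6)

result <- first_efficient(final, cbind(release_0, release_1, release_2))
print(result)
```

Because the search stops at the first non-rejection, the result depends on the chosen significance level and on which release is treated as final.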
An efficient release is essential for various economic applications:
An optimal revision process is one that leads to unbiased, efficient, and minimum-variance data revisions. Initial estimates represent the truth measured with noise, and identifying the efficient release allows analysts to determine the earliest point at which data can be used reliably without concern for systematic revisions. The iterative approach based on Mincer-Zarnowitz regressions provides a practical framework for achieving this goal, improving the reliability of macroeconomic data for forecasting and policy analysis.
In the following example, we use the reviser package to
identify the efficient release in quarterly Euro Area GDP data. We test
the first 15 data releases and use the 16th release as the benchmark
final release. The sample is restricted to a single series and a shorter
time span to keep the vignette lightweight while preserving a
non-trivial efficient-release result.
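library(reviser)
library(dplyr)

# Quarterly Euro Area GDP, converted to percentage changes and
# restricted to a short sample
gdp <- reviser::gdp |>
  tsbox::ts_pc() |>
  dplyr::filter(
    id == "EA",
    time >= min(pub_date),
    time <= as.Date("2020-01-01")
  ) |>
  tidyr::drop_na()

# Releases 0 to 14 are tested; release 15 (the 16th) is the benchmark final release
df <- get_nth_release(gdp, n = 0:14)
final_release <- get_nth_release(gdp, n = 15)

# Find the first release that passes the efficiency test
efficient <- get_first_efficient_release(
  df,
  final_release
)
res <- summary(efficient)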
#> Efficient release: 2
#>
#> Model summary:
#>
#> Call:
#> stats::lm(formula = formula, data = df_wide)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.34873 -0.08185 -0.00706 0.10475 0.31533
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.03276 0.01775 1.846 0.0692 .
#> release_2 1.01446 0.02440 41.577 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1428 on 68 degrees of freedom
#> Multiple R-squared: 0.9622, Adjusted R-squared: 0.9616
#> F-statistic: 1729 on 1 and 68 DF, p-value: < 2.2e-16
#>
#>
#> Test summary:
#>
#> Linear hypothesis test:
#> (Intercept) = 0
#> release_2 = 1
#>
#> Model 1: restricted model
#> Model 2: final ~ release_2
#>
#> Note: Coefficient covariance matrix supplied.
#>
#> Res.Df Df F Pr(>F)
#> 1 70
#> 2 68 2 2.743 0.07151 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
head(res)
#> # A tibble: 1 × 5
#> e alpha beta p_value n_tested
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 2 0.0328 1.01 0.0715 3

Note: The identification of the first efficient release is related to
news hypothesis testing. Hence, the same conclusion could be reached
by using the function get_revision_analysis() (setting
degree = 3, which provides results for the news and noise tests), as
shown below. An advantage of using the
get_first_efficient_release() function is that it organizes
the data for subsequent use in kk_nowcast() and
jvn_nowcast() to improve nowcasts of preliminary releases
(see the vignettes "Nowcasting revisions using the generalized
Kishor-Koenig family" and "Nowcasting revisions using the
Jacobs-Van Norden model" for more details).
analysis <- get_revision_analysis(
df,
final_release,
degree = 3
)
head(analysis)
#>
#> === Revision Analysis Summary ===
#>
#> # A tibble: 6 × 17
#> id release N `News test Intercept` `News test Intercept (std.err)`
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 EA release_0 70 0.043 0.03
#> 2 EA release_1 70 0.041 0.024
#> 3 EA release_10 70 0 0.008
#> 4 EA release_11 70 -0.003 0.006
#> 5 EA release_12 70 -0.004 0.007
#> 6 EA release_13 70 0.002 0.005
#> # ℹ 12 more variables: `News test Intercept (p-value)` <dbl>,
#> # `News test Coefficient` <dbl>, `News test Coefficient (std.err)` <dbl>,
#> # `News test Coefficient (p-value)` <dbl>, `News joint test (p-value)` <dbl>,
#> # `Noise test Intercept` <dbl>, `Noise test Intercept (std.err)` <dbl>,
#> # `Noise test Intercept (p-value)` <dbl>, `Noise test Coefficient` <dbl>,
#> # `Noise test Coefficient (std.err)` <dbl>,
#> # `Noise test Coefficient (p-value)` <dbl>, …
#>
#> === Interpretation ===
#>
#> id=EA, release=release_0:
#> • Revisions contain NEWS (p = 0.044 ): systematic information
#> • Revisions contain NOISE (p = 0.001 ): measurement error
#>
#> id=EA, release=release_1:
#> • Revisions contain NEWS (p = 0.042 ): systematic information
#> • Revisions contain NOISE (p = 0.007 ): measurement error
#>
#> id=EA, release=release_10:
#> • Revisions do NOT contain news (p = 0.124 )
#> • Revisions contain NOISE (p = 0.019 ): measurement error
#>
#> id=EA, release=release_11:
#> • Revisions do NOT contain news (p = 0.524 )
#> • Revisions do NOT contain noise (p = 0.132 )
#>
#> id=EA, release=release_12:
#> • Revisions do NOT contain news (p = 0.628 )
#> • Revisions do NOT contain noise (p = 0.402 )
#>
#> id=EA, release=release_13:
#> • Revisions do NOT contain news (p = 0.553 )
#> • Revisions do NOT contain noise (p = 0.378 )