| Type: | Package |
| Title: | Leave-Out Variance Component Estimation for Two-Way Fixed Effects Models |
| Version: | 0.1.0 |
| Author: | Vahid Moghani [aut, cre] |
| Maintainer: | Vahid Moghani <contact@vahid-moghani.com> |
| Description: | Implements leave-out estimation of variance components in two-way fixed effects models as an 'R' translation of the original 'MATLAB' package of Kline, Saggio, and Solvsten (2020) <doi:10.3982/ECTA16410>. The package includes graph-based connected-set pruning, leave-out bias correction, leverage computation by exact and randomized algorithms, fixed effect estimation helpers, and companion model-fit summaries for matched worker-firm panels in the spirit of Abowd, Kramarz, and Margolis (1999) <doi:10.1111/1468-0262.00020>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | data.table, Matrix, igraph, sanic, parallel, utils, doParallel, foreach |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-04-16 19:08:57 UTC; cryst |
| Repository: | CRAN |
| Date/Publication: | 2026-04-21 19:02:34 UTC |
Leave-Out Variance Component Estimation for Two-Way Fixed Effects Models
Description
LeaveOutKSS packages an 'R' translation of the original 'MATLAB' package
of Kline, Saggio, and Solvsten (2020) for leave-out bias correction and
variance decomposition in two-way fixed effects models of the Abowd,
Kramarz, and Margolis (1999; AKM) type.
Details
The package mirrors the logic of the original script-based implementation in this repository while exposing workflows that return objects and are free of side effects by default. Core estimation functions return structured results and write files only when users explicitly provide output paths. The main user-facing workflows are:
- leave_out_KSS() for leave-out bias-corrected decomposition in two-way fixed effects models.
- leave_out_KSS_fe() for the same decomposition when some controls are supplied as categorical variables to be expanded internally.
- rsquared_comp() for comparing the fit of a two-way fixed effects model with a saturated worker-firm specification.
- fast_fe_est() for recovering fitted values and adjusted outcomes from one-way or two-way fixed effects models.
The implementation follows the structure of the original 'MATLAB' package and the accompanying vignette material:
1. Construct the largest connected set of firms.
2. Prune the sample to a leave-one-worker-out connected set.
3. Partial out controls when requested.
4. Compute statistical leverages exactly or by Johnson-Lindenstrauss approximation (JLA).
5. Form plug-in and leave-out bias-corrected variance component estimates.
A small matched worker-firm panel used by the examples is bundled at
system.file("extdata", "test.csv", package = "LeaveOutKSS").
Author(s)
Maintainer: Vahid Moghani <contact@vahid-moghani.com>
References
Abowd, J. M., Kramarz, F., and Margolis, D. N. (1999). High wage workers and high wage firms. Econometrica, 67(2), 251-333.
Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation of variance components. Econometrica, 88(5), 1859-1898.
Johnson, W. B., and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability, 189-206.
Build a Firm-to-Firm Mobility Adjacency Matrix
Description
Constructs a symmetric sparse adjacency matrix of firm mobility links using worker transitions. Only movers contribute edges.
Usage
build_adj(id, firmid)
Arguments
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
Value
A sparse square adjacency matrix whose nonzero entries count observed worker moves between firms.
See Also
connected_set(), pruning_unbal_v3()
Examples
build_adj(
id = c(1, 1, 2, 2, 3, 3),
firmid = c(1, 2, 2, 3, 3, 3)
)
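The counting rule in the Description can be illustrated with a small base-R sketch. This is not the package's implementation: `sketch_adj()` is a hypothetical helper, it returns a dense matrix rather than a sparse one, and it assumes rows are sorted by worker and, within worker, by time, so that consecutive rows of the same worker describe either a stay or a move.

```r
# Hypothetical illustration only; build_adj() itself returns a sparse matrix.
sketch_adj <- function(id, firmid) {
  firms <- sort(unique(firmid))
  n <- length(firms)
  A <- matrix(0L, n, n,
              dimnames = list(as.character(firms), as.character(firms)))

  # Pair each row with the next row of the same worker.
  same_worker <- id[-1] == id[-length(id)]
  from <- firmid[-length(firmid)][same_worker]
  to   <- firmid[-1][same_worker]

  # Only movers (firm changes) contribute edges.
  moved <- from != to
  i <- match(from[moved], firms)
  j <- match(to[moved], firms)

  # Count each move symmetrically.
  for (k in seq_along(i)) {
    A[i[k], j[k]] <- A[i[k], j[k]] + 1L
    A[j[k], i[k]] <- A[j[k], i[k]] + 1L
  }
  A
}

A <- sketch_adj(
  id     = c(1, 1, 2, 2, 3, 3),
  firmid = c(1, 2, 2, 3, 3, 3)
)
print(A)
```

In this toy panel, worker 3 stays at firm 3 and therefore adds no edge.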
Restrict a Panel to Its Largest Connected Set of Firms
Description
Builds a mobility graph from worker moves across firms and keeps only the largest connected component of firms. This is the first graph-based trimming step used by the leave-out routines before leave-one-worker-out pruning.
Usage
connected_set(
y,
id,
firmid,
lagfirmid,
controls,
prov_indicator = rep(1, length(y)),
progress = FALSE
)
Arguments
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
lagfirmid |
Lagged firm identifier vector, typically constructed within worker. |
controls |
Matrix of controls aligned with the observations. |
prov_indicator |
Optional provider indicator carried along for interface compatibility. |
progress |
Logical scalar indicating whether stage messages should be emitted. |
Details
The graph is built from observed worker transitions between lagged and current firms. Firms not connected to the largest component are removed. The function relabels worker and firm identifiers internally but preserves the originals in the returned table.
Value
A list with two elements: DT, a data.table containing the
restricted sample and original identifiers, and DT_controls, the
correspondingly restricted controls.
See Also
pruning_unbal_v3(), strongc_set(), build_adj()
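As a rough illustration of the trimming step, the sketch below finds the largest connected component of a small firm graph with a breadth-first search in base R. The helper name `largest_component()` and the dense toy matrix are assumptions made for the example; the package's own routines work on sparse mobility graphs and are not reproduced here.

```r
# Hypothetical helper: indices of the firms in the largest connected component
# of an undirected graph given by a symmetric adjacency matrix A.
largest_component <- function(A) {
  n <- nrow(A)
  comp <- integer(n)          # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    current <- current + 1L
    comp[start] <- current
    queue <- start
    while (length(queue) > 0L) {
      v <- queue[1]
      queue <- queue[-1]
      # Unvisited neighbours of v join the current component.
      nb <- which(A[v, ] != 0 & comp == 0L)
      comp[nb] <- current
      queue <- c(queue, nb)
    }
  }
  sizes <- tabulate(comp)
  which(comp == which.max(sizes))
}

# Toy graph: firms 1-2-3 are linked by moves, firms 4-5 form a separate island.
A <- matrix(0, 5, 5)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- A[3, 2] <- 1
A[4, 5] <- A[5, 4] <- 1

keep <- largest_component(A)
print(keep)
```

Observations at firms outside `keep` would then be dropped from the panel.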
Fit a One-Way or Two-Way Fixed Effects Model
Description
Solves a fixed effects model using conjugate gradients and returns fitted
values and adjusted outcomes as an object. When firmid is omitted, the
routine estimates a one-way worker fixed effects model. When firmid is
supplied, it estimates a two-way worker-firm fixed effects model.
Usage
fast_fe_est(
y,
id,
firmid = NULL,
controls = NULL,
csv_file = NULL,
progress = FALSE
)
Arguments
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Optional firm identifier vector. If |
controls |
Optional matrix or vector of controls. |
csv_file |
Optional path for exporting the fitted values table as a
|
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
Details
This helper is useful when the goal is to recover fitted values and
residualized outcomes rather than the leave-out variance decomposition. The
returned fitted-values table includes y_hat, y_adj, and the original
identifiers. When csv_file is supplied, that table is also written to disk.
Value
An object of class "fast_fe_est_result" containing the fitted
values table, model metadata, and elapsed time.
See Also
leave_out_KSS(), rsquared_comp()
Examples
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
res <- fast_fe_est(
y = dt[[4]],
id = dt[[1]],
firmid = dt[[2]],
controls = cbind(year = dt[[3]])
)
print(res)
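To make the conjugate-gradient idea concrete, the following base-R sketch solves the normal equations of a tiny two-way fixed effects model and recovers fitted values. Everything here is illustrative: `cg_solve()` is a hypothetical helper, the design matrix is dense, and no claim is made that fast_fe_est() uses this exact formulation or stopping rule.

```r
# Hypothetical conjugate-gradient solver for A x = b with A symmetric
# positive semi-definite and b in the range of A.
cg_solve <- function(A, b, tol = 1e-10, max_iter = 1000L) {
  b <- as.vector(b)
  x <- numeric(length(b))
  r <- b                       # residual b - A x at x = 0
  p <- r
  rs_old <- sum(r * r)
  stop_at <- tol * sqrt(sum(b * b))
  for (iter in seq_len(max_iter)) {
    if (sqrt(rs_old) <= stop_at) break
    Ap <- as.vector(A %*% p)
    alpha <- rs_old / sum(p * Ap)
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r * r)
    p <- r + (rs_new / rs_old) * p
    rs_old <- rs_new
  }
  x
}

# Toy matched panel (made-up numbers).
id     <- c(1, 1, 2, 2, 3, 3, 4, 4)
firmid <- c(1, 2, 2, 3, 3, 1, 1, 1)
y      <- c(1.0, 1.5, 2.0, 2.2, 0.5, 0.7, 1.1, 1.3)

# Worker and firm dummies; the design is rank deficient, but fitted values are unique.
X <- cbind(model.matrix(~ 0 + factor(id)), model.matrix(~ 0 + factor(firmid)))
b_hat <- cg_solve(crossprod(X), crossprod(X, y))
y_hat <- as.vector(X %*% b_hat)

# Reference fit from lm() for comparison.
y_hat_lm <- unname(fitted(lm(y ~ X - 1)))
print(cbind(y_hat, y_hat_lm))
```

An iterative solver of this kind avoids factorizing the crossproduct matrix, which is the usual motivation for using it on large worker-firm panels.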
Internal Helpers for Progress, Formatting, and Optional Export
Description
These helpers keep the computational routines side-effect free by default while providing opt-in progress reporting and file export.
Evaluate Plug-In and Kline, Saggio, and Solvsten (KSS)-Corrected Quadratic Forms
Description
Computes a covariance-like quadratic form from transformed coefficient
vectors and subtracts the Kline, Saggio, and Solvsten (KSS) bias adjustment
based on observation-specific variances and Bii weights.
Usage
kss_quadratic_form(sigma_i, A_1, A_2, beta, Bii)
Arguments
sigma_i |
Vector of leave-out variance estimates. |
A_1 |
Matrix used to transform the coefficient vector on the left side of the quadratic form. |
A_2 |
Matrix used to transform the coefficient vector on the right side of the quadratic form. |
beta |
Estimated coefficient vector. |
Bii |
Vector of observation-specific bias terms for the target variance component. |
Value
A named list with theta, the plug-in estimate, and theta_KSS, the
bias-corrected estimate.
Examples
A <- diag(2)
kss_quadratic_form(
sigma_i = c(1, 2),
A_1 = A,
A_2 = A,
beta = c(0.5, 1),
Bii = c(0.1, 0.2)
)
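The sketch below shows one way such a corrected quadratic form could be computed. It rests on two assumptions that this manual does not state: that the plug-in term is the sample covariance of the two transformed coefficient vectors, and that the correction is the Bii-weighted sum of sigma_i divided by the degrees of freedom. `sketch_quadratic_form()` is a hypothetical name, and its output should not be read as the output of kss_quadratic_form().

```r
# Hypothetical illustration; the normalization used here is an assumption.
sketch_quadratic_form <- function(sigma_i, A_1, A_2, beta, Bii) {
  left  <- as.vector(A_1 %*% beta)
  right <- as.vector(A_2 %*% beta)
  theta <- cov(left, right)                 # plug-in term (assumed form)
  dof <- length(left) - 1
  theta_KSS <- theta - sum(Bii * sigma_i) / dof   # bias-corrected term (assumed form)
  list(theta = theta, theta_KSS = theta_KSS)
}

A <- diag(2)
out <- sketch_quadratic_form(
  sigma_i = c(1, 2),
  A_1 = A,
  A_2 = A,
  beta = c(0.5, 1),
  Bii = c(0.1, 0.2)
)
str(out)
```

With A_1 equal to A_2 the plug-in term reduces to a variance; with different matrices it is a covariance, which is why the description calls the form covariance-like.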
Internal Helpers for Building Undirected Graphs from Sparse Adjacency Matrices
Description
These helpers avoid igraph::graph_from_adjacency_matrix() for very large
sparse mobility graphs by extracting nonzero edges directly from a sparse
adjacency matrix and then constructing an undirected igraph object from the
resulting edge list.
Usage
kss_sparse_undirected_edges(A, diag = FALSE)
Details
They are used internally by the connected-set and pruning routines.
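A minimal sketch of the edge-extraction idea, assuming the 'Matrix' package is available: the nonzero triplets of a sparse symmetric matrix are read off with summary(), and only the upper triangle is kept so that each undirected edge appears once and self-loops are dropped. The object names are illustrative and this is not the internal helper's code.

```r
library(Matrix)

# Toy symmetric mobility matrix with links 1-2 and 2-3.
A <- sparseMatrix(
  i = c(1, 2, 2, 3),
  j = c(2, 1, 3, 2),
  x = rep(1, 4),
  dims = c(3, 3)
)

# summary() on a sparse matrix returns its nonzero entries as (i, j, x) triplets.
trip <- summary(A)

# Keep one copy of each undirected edge; i < j also excludes the diagonal.
edges <- data.frame(i = trip$i, j = trip$j)[trip$i < trip$j, ]
print(edges)
```

A two-column edge list of this shape is what igraph's edge-list constructors accept, which avoids materializing the full adjacency structure.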
Leave-Out Bias-Corrected Variance Decomposition in a Two-Way Fixed Effects Model
Description
Estimates plug-in and leave-out bias-corrected variance components for a two-way fixed effects model, as part of the 'R' translation of the original 'MATLAB' package of Kline, Saggio, and Solvsten (2020). Starting from worker identifiers, firm identifiers, and an outcome, the function constructs the leave-one-worker-out connected set, optionally partials out controls, computes statistical leverages either exactly or via the Johnson-Lindenstrauss approximation (JLA), and returns decomposition summaries together with estimated worker and firm effects.
Usage
leave_out_KSS(
y,
id,
firmid,
controls = NULL,
leave_out_level = "matches",
type_algorithm = "JLA",
simulations_JLA = 200,
lincom_do = 0,
Z_lincom = NULL,
labels_lincom = NULL,
csv_file = NULL,
txt_file = NULL,
paral = TRUE,
Cd = 12345,
progress = FALSE
)
Arguments
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Optional matrix or vector of controls. When supplied, the function prepends an intercept internally and residualizes the outcome with respect to worker, firm, and control regressors before computing variance components. |
leave_out_level |
Character scalar. Use |
type_algorithm |
Character scalar. Use the randomized
Johnson-Lindenstrauss approximation ( |
simulations_JLA |
Integer number of random projections when
|
lincom_do |
Integer flag equal to |
Z_lincom |
Optional matrix of observables used by |
labels_lincom |
Optional labels for the columns of |
csv_file |
Optional path for exporting the estimated effects table as a
|
txt_file |
Optional path for exporting a text summary of the decomposition. |
paral |
Logical scalar indicating whether leverage computation should use
the parallel routine |
Cd |
Integer random seed passed to |
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
Details
Like the original 'MATLAB' package, this implementation proceeds through connected-set construction, leave-out pruning, optional residualization of controls, leverage computation, and bias correction of three components: the variance of firm effects, the covariance of worker and firm effects, and the variance of worker effects.
The decomposition is based on an Abowd, Kramarz, and Margolis (1999;
AKM)-style model with worker effects, firm effects, and optional controls.
By default, the function leaves out matches, which corresponds to allowing
unrestricted heteroskedasticity and arbitrary serial correlation within
worker-firm matches, in line with the discussion in the original vignette.
When leave_out_level = "obs", the correction is based on leaving out one
person-year observation at a time.
When controls are supplied, the function first estimates their coefficients
in the leave-out connected set and then works with the residualized outcome.
When lincom_do = 1, the function additionally reports linear projections of
firm effects on observables using lincom_KSS().
The input vectors must be sorted by worker identifier and, within worker,
from earlier to later time periods before calling the function. When
controls or Z_lincom are supplied, they must follow that same sorted row
order.
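One way to satisfy the ordering requirement is to compute a single permutation and apply it to every input. The sketch assumes a hypothetical `year` vector is available as the time variable; the toy values are made up.

```r
# Toy unsorted inputs (illustrative values only).
id       <- c(2, 1, 2, 1)
year     <- c(2001, 2002, 2000, 2001)
firmid   <- c(10, 20, 30, 10)
y        <- c(0.4, 0.9, 0.1, 0.7)
controls <- cbind(age = c(31, 45, 30, 44))

# One permutation, sorted by worker and then by time within worker.
ord <- order(id, year)

# Apply the same permutation to every vector and matrix passed to the function.
id_s       <- id[ord]
firmid_s   <- firmid[ord]
y_s        <- y[ord]
controls_s <- controls[ord, , drop = FALSE]
```

Reusing `ord` for all inputs keeps the rows of `controls` and `Z_lincom` aligned with the outcome and identifiers.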
The returned object is the primary estimation record. It stores the
decomposition summaries, estimated worker and firm effects, and optional
lincom output. When csv_file or txt_file is supplied, the corresponding
output is also written to disk.
Value
An object of class "leave_out_kss_result" containing biased and
bias-corrected estimates, estimated worker and firm effects, optional
lincom results, sample summaries, and elapsed time.
References
Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation of variance components. Econometrica, 88(5), 1859-1898.
Abowd, J. M., Kramarz, F., and Margolis, D. N. (1999). High wage workers and high wage firms. Econometrica, 67(2), 251-333.
See Also
leave_out_KSS_fe(), rsquared_comp(), lincom_KSS(),
leverages(), leverages_parallel()
Examples
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
data.table::setorder(dt, V1, V3)
res <- leave_out_KSS(
y = dt[[4]],
id = dt[[1]],
firmid = dt[[2]],
simulations_JLA = 5,
paral = FALSE,
progress = FALSE
)
print(res)
Leave-Out Bias-Corrected Decomposition with Internally Expanded Fixed-Effect Controls
Description
Variant of leave_out_KSS() that allows selected control columns to be
treated as categorical regressors and expanded into dummy variables inside the
routine. This mirrors the use case discussed in the original 'MATLAB'
vignette where time effects or other discrete controls are partialled out
before the leave-out variance decomposition is computed.
Usage
leave_out_KSS_fe(
y,
id,
firmid,
controls = NULL,
absorb_col = NULL,
leave_out_level = "matches",
type_algorithm = "JLA",
simulations_JLA = 200,
lincom_do = 0,
Z_lincom = NULL,
labels_lincom = NULL,
csv_file = NULL,
txt_file = NULL,
paral = TRUE,
Cd = 12345,
progress = FALSE
)
Arguments
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Optional matrix or vector of controls. When supplied, the function prepends an intercept internally and residualizes the outcome with respect to worker, firm, and control regressors before computing variance components. |
absorb_col |
Optional integer vector identifying columns of |
leave_out_level |
Character scalar. Use |
type_algorithm |
Character scalar. Use the randomized
Johnson-Lindenstrauss approximation ( |
simulations_JLA |
Integer number of random projections when
|
lincom_do |
Integer flag equal to |
Z_lincom |
Optional matrix of observables used by |
labels_lincom |
Optional labels for the columns of |
csv_file |
Optional path for exporting the estimated effects table as a
|
txt_file |
Optional path for exporting a text summary of the decomposition. |
paral |
Logical scalar indicating whether leverage computation should use
the parallel routine |
Cd |
Integer random seed passed to |
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
Details
The function follows the same workflow as leave_out_KSS() but modifies the
control-adjustment step. When absorb_col is supplied, the corresponding
columns are treated as categorical effects and expanded into dummy variables
inside the leave-out connected set before residualization. This is convenient
for year effects or other high-level discrete controls that are easier to
supply in coded form than as a pre-built model matrix.
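The expansion of a coded column into dummies can be pictured with base R's model.matrix(). This is only a sketch of the idea: whether the routine drops a reference category, and how it names the resulting columns, are assumptions made here for illustration.

```r
# Toy controls: a coded year column and a continuous column (made-up values).
controls <- cbind(
  year = c(2001, 2002, 2003, 2001, 2002, 2003),
  age  = c(30, 31, 32, 45, 46, 47)
)
absorb_col <- 1

# Expand the coded column into indicator variables, dropping the intercept
# column so that the first level acts as the reference category (assumption).
f <- factor(controls[, absorb_col])
dummies <- model.matrix(~ f)[, -1, drop = FALSE]

# Recombine with the controls that were not absorbed.
expanded <- cbind(controls[, -absorb_col, drop = FALSE], dummies)
print(expanded)
```

Supplying `absorb_col` saves the caller from building such a matrix by hand before the leave-out connected set is known.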
As with leave_out_KSS(), the input vectors must be sorted by worker
identifier and, within worker, from earlier to later time periods before
calling the function. Any supplied control columns must follow that same row
order.
The rest of the decomposition logic is unchanged: the function constructs a
leave-one-worker-out connected set, computes leverages, and returns plug-in
and bias-corrected variance components together with estimated worker and
firm effects. When csv_file or txt_file is supplied, the corresponding output
is also written to disk.
Value
An object of class "leave_out_kss_result" containing biased and
bias-corrected estimates, estimated worker and firm effects, optional
lincom results, sample summaries, and elapsed time.
References
Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation of variance components. Econometrica, 88(5), 1859-1898.
See Also
leave_out_KSS(), rsquared_comp()
Examples
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
data.table::setorder(dt, V1, V3)
res <- leave_out_KSS_fe(
y = dt[[4]],
id = dt[[1]],
firmid = dt[[2]],
controls = cbind(year = dt[[3]]),
absorb_col = 1,
simulations_JLA = 5,
paral = FALSE,
progress = FALSE
)
print(res)
Compute Statistical Leverages and Bias Terms
Description
Computes the observation-level leverage quantities used in the Kline, Saggio, and Solvsten (KSS) bias correction, either exactly or with a Johnson-Lindenstrauss approximation (JLA).
Usage
leverages(X_fe, X_pe, X, xx, type_algorithm, scale, progress = FALSE)
Arguments
X_fe |
Matrix used for the firm-effect variance component. |
X_pe |
Matrix used for the person-effect variance component. |
X |
Main design matrix. |
xx |
Crossproduct matrix |
type_algorithm |
Character scalar, either |
scale |
Number of random projections when |
progress |
Logical scalar indicating whether leverage progress should be displayed. |
Details
The exact branch solves one linear system per observation. The Johnson-Lindenstrauss approximation (JLA) branch follows the randomized projection logic described in the original vignette to approximate the same quantities at lower computational cost on large panels.
Value
A list with elements Pii, Mii, correction_JLA, Bii_fe,
Bii_cov, and Bii_pe.
See Also
leverages_parallel(), leave_out_KSS()
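The two branches can be contrasted on a small dense design. The sketch below computes exact leverages with one linear solve per observation and then approximates them with random sign projections, using the fact that the diagonal of a projection matrix equals the squared row norms of that matrix. It is a generic illustration under these assumptions and does not reproduce the package's bias terms or its `correction_JLA` output.

```r
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))   # toy full-rank design
S <- crossprod(X)                   # X'X, the role assumed here for `xx`

# Exact branch: one linear system per observation.
Pii <- vapply(
  seq_len(n),
  function(i) sum(X[i, ] * solve(S, X[i, ])),
  numeric(1)
)
Mii <- 1 - Pii

# Randomized branch: p Rademacher projections (p plays the role of `scale`).
p <- 2000
R <- matrix(sample(c(-1, 1), n * p, replace = TRUE), n, p)
Z <- X %*% solve(S, crossprod(X, R))   # column k holds P %*% r_k
Pii_jla <- rowSums(Z^2) / p            # averages (P r_k)_i^2 over projections

print(summary(abs(Pii_jla - Pii)))
```

The randomized branch needs one solve per projection instead of one per observation, which is where the saving on large panels comes from.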
Parallel Computation of Statistical Leverages and Bias Terms
Description
Parallel version of leverages() using foreach and doParallel.
Usage
leverages_parallel(X_fe, X_pe, X, xx, type_algorithm, scale, progress = FALSE)
Arguments
X_fe |
Matrix used for the firm-effect variance component. |
X_pe |
Matrix used for the person-effect variance component. |
X |
Main design matrix. |
xx |
Crossproduct matrix |
type_algorithm |
Character scalar, either |
scale |
Number of random projections when |
progress |
Logical scalar indicating whether leverage progress should be displayed. |
Details
The exact and Johnson-Lindenstrauss approximation (JLA) branches mirror
leverages(), but the repeated linear solves are distributed across worker
processes. This routine is intended for larger problems where the leverage
stage dominates runtime.
Value
A list with the same elements returned by leverages().
Linear Projections of Estimated Firm Effects with Kline, Saggio, and Solvsten (KSS) Standard Errors
Description
Regresses transformed fixed effects on observables and reports both naive and Kline, Saggio, and Solvsten (KSS)-corrected standard errors. This corresponds to the "lincom" discussion in the original vignette on regressing firm effects on observables.
Usage
lincom_KSS(y, X, Z, Transform, sigma_i, labels = NULL)
Arguments
y |
Outcome vector used to estimate the original model. |
X |
Design matrix used to estimate the fixed effects model. |
Z |
Matrix of observables used in the linear projection. |
Transform |
Matrix that maps model coefficients into the fixed effect of interest, typically firm effects. |
sigma_i |
Observation-specific leave-out variance estimates. |
labels |
Optional labels for the columns of |
Value
An object of class "lincom_kss_result" containing a results table
with coefficient estimates, naive standard errors, KSS-corrected standard
errors, and t statistics.
See Also
leave_out_KSS(), kss_quadratic_form()
Print a Fixed Effects Fit Result
Description
Print a Fixed Effects Fit Result
Usage
## S3 method for class 'fast_fe_est_result'
print(x, ...)
Arguments
x |
A result returned by |
... |
Unused. |
Value
x, invisibly.
Print a LeaveOutKSS Decomposition Result
Description
Print a LeaveOutKSS Decomposition Result
Usage
## S3 method for class 'leave_out_kss_result'
print(x, ...)
Arguments
x |
A result returned by |
... |
Unused. |
Value
x, invisibly.
Print a Lincom Result
Description
Print a Lincom Result
Usage
## S3 method for class 'lincom_kss_result'
print(x, ...)
Arguments
x |
A result returned by |
... |
Unused. |
Value
x, invisibly.
Print an R-Squared Comparison Result
Description
Print an R-Squared Comparison Result
Usage
## S3 method for class 'rsquared_comp_result'
print(x, ...)
Arguments
x |
A result returned by |
... |
Unused. |
Value
x, invisibly.
Prune to a Leave-One-Worker-Out Connected Set
Description
Iteratively removes articulation workers from the worker-firm mobility graph until the remaining sample stays connected after dropping any single worker. This implements the leave-one-worker-out connectivity requirement used by the main Kline, Saggio, and Solvsten (KSS) routines.
Usage
pruning_unbal_v3(
y,
firmid,
id,
id_old,
firmid_old,
controls,
prov_indicator = rep(1, length(y)),
progress = FALSE
)
Arguments
y |
Numeric outcome vector. |
firmid |
Firm identifier vector. |
id |
Worker identifier vector. |
id_old |
Original worker identifiers. |
firmid_old |
Original firm identifiers. |
controls |
Matrix of controls aligned with the observations. |
prov_indicator |
Optional provider indicator carried along with the sample. |
progress |
Logical scalar indicating whether iterative pruning progress should be emitted. |
Details
The routine constructs a bipartite worker-firm graph for movers, identifies articulation workers, removes them, and recomputes the largest connected component until no articulation worker remains.
Value
A list containing the pruned outcome, identifiers, controls, and provider indicator.
See Also
connected_set(), build_adj(), leave_out_KSS()
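The notion of an articulation worker can be illustrated by brute force: drop each worker in turn and check whether the firms fall apart into more components. The helpers below are hypothetical and scale poorly; they are meant only to show the definition on a toy panel, not the graph algorithm used by the package.

```r
# Hypothetical helper: number of connected components among a fixed set of
# firms, where two firms are linked if some worker is observed at both.
n_components <- function(id, firmid, firms) {
  label <- seq_along(firms)
  names(label) <- as.character(firms)
  for (w in unique(id)) {
    f <- as.character(unique(firmid[id == w]))
    # Merge the components of all firms visited by this worker.
    label[label %in% label[f]] <- min(label[f])
  }
  length(unique(label))
}

# Hypothetical helper: workers whose removal increases the component count.
articulation_workers <- function(id, firmid) {
  firms <- unique(firmid)
  base <- n_components(id, firmid, firms)
  workers <- unique(id)
  is_art <- vapply(workers, function(w) {
    keep <- id != w
    n_components(id[keep], firmid[keep], firms) > base
  }, logical(1))
  workers[is_art]
}

# Toy panel: workers 1 and 3 both link firms 1 and 2, worker 2 is the only
# link between firms 2 and 3, and worker 4 is a stayer at firm 3.
id     <- c(1, 1, 2, 2, 3, 3, 4, 4)
firmid <- c(1, 2, 2, 3, 1, 2, 3, 3)

art <- articulation_workers(id, firmid)
print(art)
```

Removing worker 2 cuts firm 3 off from firms 1 and 2, so worker 2 is the only articulation worker in this example.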
Compare Two-Way Fixed Effects and Saturated-Model R-Squared Values
Description
Computes goodness-of-fit summaries for a two-way fixed effects model and for a saturated worker-firm interaction model on the same sample. The function is intended as a diagnostic companion to the leave-out decomposition routines and follows the same basic data-preparation conventions.
Usage
rsquared_comp(
y,
id,
firmid,
controls = NULL,
txt_file = NULL,
progress = FALSE
)
Arguments
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Optional matrix or vector of additional controls. |
txt_file |
Optional path for exporting a text summary of the comparison. |
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
Details
The two-way fixed effects model includes worker effects, firm effects, and optional controls. The saturated model replaces separate worker and firm effects with worker-firm interaction indicators. Comparing the two summaries can be useful when evaluating how much additional fit is obtained by moving from the standard Abowd, Kramarz, and Margolis (1999; AKM) specification to a fully saturated match design.
Value
An object of class "rsquared_comp_result" containing a summary
table for the two fitted models and the elapsed time.
See Also
leave_out_KSS(), leave_out_KSS_fe(), fast_fe_est()
Examples
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
res <- rsquared_comp(
y = dt[[4]],
id = dt[[1]],
firmid = dt[[2]],
progress = FALSE
)
print(res)
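The comparison can be mimicked on a toy panel with lm(). The sketch assumes plain (unadjusted) R-squared as the fit measure, which may differ from what rsquared_comp() reports, and uses simulated outcomes.

```r
set.seed(42)

# Three workers, each observed twice at firm 1 and twice at firm 2.
id     <- rep(1:3, each = 4)
firmid <- rep(c(1, 1, 2, 2), times = 3)
y      <- rnorm(12)

# Two-way model: additive worker and firm effects.
fit_akm <- lm(y ~ factor(id) + factor(firmid))

# Saturated model: one indicator per worker-firm match.
fit_sat <- lm(y ~ interaction(id, firmid, drop = TRUE))

r2 <- c(
  akm       = summary(fit_akm)$r.squared,
  saturated = summary(fit_sat)$r.squared
)
print(r2)
```

Because the match indicators nest the additive specification, the saturated fit can never be worse; the size of the gap is the diagnostic of interest.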
Approximate Leave-Out Variance Terms for Stayers
Description
Computes the stayer-specific adjustment used when the main decomposition is performed at the match level. In that case, the current implementation uses a leave-one-observation-out style adjustment for stayers, following the approximation discussed in the original vignette.
Usage
sigma_for_stayers(y, id, firmid, peso, b)
Arguments
y |
Outcome vector in person-year space. |
id |
Worker identifier vector in collapsed match space. |
firmid |
Firm identifier vector in collapsed match space. |
peso |
Match weights used to expand back to person-year space. |
b |
Estimated coefficient vector from the worker-firm fixed effects regression. |
Value
A vector of averaged stayer variance adjustments at the match level.
See Also
leave_out_KSS(), leave_out_KSS_fe()
Restrict a Panel to Firms Above a Minimum Graph Degree Threshold
Description
Graph-based trimming helper that keeps firms whose degree in the mobility
graph is at least min_degree. This is a stronger restriction than the basic
connected-set filter and can be useful when the analyst wants a denser firm
network.
Usage
strongc_set(y, id, firmid, controls, min_degree = 1, progress = FALSE)
Arguments
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Matrix of controls aligned with the observations. |
min_degree |
Minimum graph degree required for a firm to remain in the sample. |
progress |
Logical scalar indicating whether graph summary messages should be emitted. |
Value
A list with DT and DT_controls, analogous to connected_set().
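A degree filter of this kind can be sketched on a dense toy adjacency matrix. Two points are assumptions for illustration: degree is taken to be the number of distinct neighbouring firms, and the filter is applied once rather than repeated after firms are removed.

```r
# Toy firm graph with links 1-2, 2-3, 1-3, and 3-4.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- A[3, 2] <- 1
A[1, 3] <- A[3, 1] <- 1
A[3, 4] <- A[4, 3] <- 1

min_degree <- 2

# Degree as the number of distinct neighbouring firms (assumption).
degree <- rowSums(A != 0)
keep <- which(degree >= min_degree)
print(keep)
```

Firm 4 has a single link and is dropped, while firms 1 to 3 remain.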
Summarize a LeaveOutKSS Decomposition Result
Description
Summarize a LeaveOutKSS Decomposition Result
Usage
## S3 method for class 'leave_out_kss_result'
summary(object, ...)
Arguments
object |
A result returned by |
... |
Unused. |
Value
object, invisibly.