This vignette covers parallel vs sequential execution in the estimation function.
dineR supports parallel execution of the ADMM optimisation
across the regularisation path via the cores argument of
the estimation function. By default, the function runs
sequentially (cores = 1). Setting cores to a
value greater than 1 distributes the lambda values across the specified
number of worker processes, which can yield substantial reductions in
wall-clock time for medium-to-large problems.
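To make the idea of distributing lambda values concrete, here is a minimal sketch using only base R's parallel package. It is not dineR's internal implementation, which this vignette does not show. The function solve_one is a hypothetical stand-in (simple soft-thresholding) for one per-lambda solve, and the two-worker cluster is an arbitrary choice.

```r
library(parallel)

# Hypothetical stand-in for one per-lambda solve: soft-threshold a vector.
# dineR's actual per-lambda work is an ADMM optimisation, not shown here.
solve_one <- function(lambda, z) sign(z) * pmax(abs(z) - lambda, 0)

z <- c(-2, -0.5, 0.1, 1.5)
lambdas <- c(0.1, 0.5, 1)

# Sequential: one lambda after another in the main R process
seq_path <- lapply(lambdas, solve_one, z = z)

# Parallel: the same independent tasks spread over 2 worker processes
cl <- makeCluster(2)
par_path <- tryCatch(
  parLapply(cl, lambdas, solve_one, z = z),
  finally = stopCluster(cl)  # always release the workers
)

# Both approaches compute the same path
same_path <- isTRUE(all.equal(seq_path, par_path))
```

Because each lambda is an independent task, the results do not depend on how the tasks are scheduled. Starting and stopping the workers has a fixed cost, which is why very small problems may not benefit.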
The cores argument is the only change required to move
between modes. The value must be a positive integer no greater than
parallel::detectCores() - 1.
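The constraint above can be checked before calling the estimation function. The helper below is a hypothetical sketch, not part of dineR: check_cores and its available argument are names introduced here for illustration. It assumes that cores = 1 should remain valid even on a single-core machine.

```r
# Hypothetical helper (not part of dineR): validate a requested cores value
# against the bound parallel::detectCores() - 1.
check_cores <- function(cores, available = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one worker.
  # max(1L, ...) keeps sequential execution valid on a single-core machine.
  upper <- if (is.na(available)) 1L else max(1L, as.integer(available) - 1L)
  ok <- is.numeric(cores) && length(cores) == 1L && !is.na(cores) &&
    cores >= 1 && cores == round(cores)
  if (!ok) stop("cores must be a single positive integer")
  if (cores > upper) {
    stop(sprintf("cores must be at most %d on this machine", upper))
  }
  as.integer(cores)
}

n_cores <- check_cores(4, available = 8)
```

Passing available explicitly makes the check easy to test; in normal use the default queries the current machine.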
library(dineR)

# Generate data for the examples below
data <- data_generator(n_X = 150, p = 100, seed = 42)
X <- data$X
Y <- data$Y
Sequential (default, cores = 1):
Parallel (cores set to the desired
number of workers):
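A minimal sketch of the two calls, assuming the X and Y matrices generated above and leaving all other arguments at their defaults. The choice of 4 workers mirrors the benchmarks in this vignette and is otherwise arbitrary.

```r
library(dineR)

# Reuses X and Y from the data-generation step above.

# Sequential: a single process works through the whole lambda path
fit_seq <- estimation(X, Y, cores = 1)

# Parallel: the lambda values are spread over 4 worker processes
fit_par <- estimation(X, Y, cores = 4)
```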
Both calls return identical results: cores affects only
computation time (and therefore the recorded elapse value), not the estimates or any other output field.
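One way to verify this is to compare two fits while ignoring the timing field. The helper below is a hypothetical sketch, not part of dineR. The only field name taken from this vignette is elapse; the mock fields path and lambdas are placeholders chosen for illustration.

```r
# Hypothetical helper (not part of dineR): compare two result lists while
# ignoring the timing field, which is expected to differ between runs.
same_except_timing <- function(a, b, timing_field = "elapse") {
  drop_timing <- function(fit) fit[setdiff(names(fit), timing_field)]
  isTRUE(all.equal(drop_timing(a), drop_timing(b)))
}

# Mock objects standing in for two estimation() results; the field names
# other than elapse are illustrative only.
fit_a <- list(path = list(diag(2)), lambdas = c(0.5, 0.1), elapse = 3.89)
fit_b <- list(path = list(diag(2)), lambdas = c(0.5, 0.1), elapse = 1.76)

fits_match <- same_except_timing(fit_a, fit_b)
```

With real fits, the same check would be same_except_timing(estimation(X, Y, cores = 1), estimation(X, Y, cores = 4)).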
The table below reports wall-clock times and speed-up factors measured on an Apple M-series machine (8 logical cores) using 4 parallel workers. Each scenario uses the default LASSO loss and no tuning.
| Scenario | Observations (n) | Dimensions (p) | Lambda values (nlambda) | Sequential | Parallel (4 cores) | Speed-up |
|---|---|---|---|---|---|---|
| Small | 100 | 50 | 10 | 0.68 s | 0.70 s | 1.0× |
| Medium | 150 | 100 | 15 | 3.89 s | 1.76 s | 2.2× |
| Large | 200 | 150 | 20 | 22.4 s | 6.98 s | 3.2× |
| High-dim | 100 | 200 | 20 | 17.9 s | 5.71 s | 3.1× |
| Many-λ | 150 | 100 | 30 | 7.72 s | 2.77 s | 2.8× |
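The speed-up column is the sequential time divided by the parallel time. The sketch below recomputes it from the timings in the table and adds parallel efficiency (speed-up divided by the number of workers), a standard derived quantity that the table itself does not report.

```r
# Timings copied from the table above (seconds)
bench <- data.frame(
  scenario   = c("Small", "Medium", "Large", "High-dim", "Many-lambda"),
  sequential = c(0.68, 3.89, 22.4, 17.9, 7.72),
  parallel   = c(0.70, 1.76, 6.98, 5.71, 2.77)
)
workers <- 4

# Speed-up: how many times faster the parallel run was
bench$speed_up <- bench$sequential / bench$parallel

# Efficiency: share of the ideal 4x speed-up that was actually achieved
bench$efficiency <- bench$speed_up / workers

print(transform(bench,
                speed_up   = round(speed_up, 1),
                efficiency = round(efficiency, 2)))
```

An efficiency well below 1 reflects the fixed cost of starting workers and any imbalance in how long individual lambda values take to solve.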
The code used to produce these results is provided below for reproducibility:
library(dineR)

# Time one scenario sequentially and in parallel, then report the speed-up
run_bench <- function(label, n_X, p, nlambda, cores_par) {
  data <- data_generator(n_X = n_X, p = p, seed = 42)
  X <- data$X
  Y <- data$Y
  r_seq <- estimation(X, Y, nlambda = nlambda, cores = 1)
  r_par <- estimation(X, Y, nlambda = nlambda, cores = cores_par)
  # elapse is the run time reported in each result, printed here in seconds
  cat(sprintf(
    "[%s] seq: %.3fs | par(%d cores): %.3fs | speed-up: %.2fx\n",
    label, r_seq$elapse, cores_par, r_par$elapse,
    r_seq$elapse / r_par$elapse
  ))
}

run_bench("Small",    n_X = 100, p = 50,  nlambda = 10, cores_par = 4)
run_bench("Medium",   n_X = 150, p = 100, nlambda = 15, cores_par = 4)
run_bench("Large",    n_X = 200, p = 150, nlambda = 20, cores_par = 4)
run_bench("High-dim", n_X = 100, p = 200, nlambda = 20, cores_par = 4)
run_bench("Many-lam", n_X = 150, p = 100, nlambda = 30, cores_par = 4)

Several patterns emerge from the benchmarks:
Small problems (p = 50,
nlambda = 10): parallelisation provides no benefit. The
overhead of spawning worker processes and distributing work exceeds the
per-lambda computation time. Sequential execution is preferred
here.
Medium-to-large problems (p ≥ 100
or nlambda ≥ 15): consistent 2–3× speed-ups are observed
with 4 cores. Gains are driven by both the dimensionality
(p increases the per-lambda solve time) and the number of
lambda values (nlambda increases the number of independent
tasks that can be distributed).
Speed-up scales with workload: the largest absolute savings occur for high-dimensional or fine-grained regularisation paths, where each lambda solve is computationally expensive.
A practical rule of thumb:
cores = 1 when p < 100 and nlambda < 15.
cores = min(nlambda, parallel::detectCores() - 1) for larger problems, reserving one core for the main R process.

# Recommended cores selection for larger problems
nlambda <- 20  # example path length; set to the value used in your analysis
n_cores <- min(nlambda, parallel::detectCores() - 1)
result <- estimation(X, Y, nlambda = nlambda, cores = n_cores)

Note that estimation will automatically reduce cores to nlambda if more cores are requested
than there are lambda values, and will warn accordingly.
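The rule of thumb can be wrapped in a small function. This is a hypothetical sketch, not part of dineR: choose_cores and its available argument are names introduced here. It assumes the thresholds stated above (p < 100 and nlambda < 15) and falls back to a single core when the core count cannot be detected.

```r
# Hypothetical helper (not part of dineR) encoding the rule of thumb above
choose_cores <- function(p, nlambda, available = parallel::detectCores()) {
  # Small problems: worker start-up overhead outweighs any gain
  if (p < 100 && nlambda < 15) return(1L)
  # detectCores() can return NA; stay sequential in that case
  if (is.na(available)) return(1L)
  # Larger problems: at most one worker per lambda, one core kept free
  max(1L, min(as.integer(nlambda), as.integer(available) - 1L))
}

small_cores <- choose_cores(p = 50,  nlambda = 10, available = 8)
large_cores <- choose_cores(p = 150, nlambda = 20, available = 8)
```

The result can then be passed straight to the cores argument, for example estimation(X, Y, nlambda = 20, cores = large_cores).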