| Title: | Multiple Imputation with 'MIDAS2' Denoising Autoencoders |
| Version: | 0.1.1 |
| Description: | Fits 'MIDAS' denoising autoencoder models for multiple imputation of missing data, generates multiply-imputed datasets, computes imputation means, and runs Rubin's rules regression analysis. Wraps the 'MIDAS2' 'Python' engine via a local 'FastAPI' server over 'HTTP', so no 'reticulate' dependency is needed at runtime. Methods are described in Lall and Robinson (2022) <doi:10.1017/pan.2020.49> and Lall and Robinson (2023) <doi:10.18637/jss.v107.i09>. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/MIDASverse/MIDAS2 |
| BugReports: | https://github.com/MIDASverse/MIDAS2/issues |
| Depends: | R (≥ 4.1.0) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| SystemRequirements: | Python (>= 3.9) with the 'midasverse-midas-api' package |
| Imports: | curl, httr2 (≥ 1.0.0), processx (≥ 3.8.0), rlang (≥ 1.1.0) |
| Suggests: | arrow, jsonlite, reticulate, testthat (≥ 3.0.0), knitr, rmarkdown |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-03-08 10:22:49 UTC; t.robinson7 |
| Author: | Thomas Robinson [aut, cre], Ranjit Lall [aut] |
| Maintainer: | Thomas Robinson <t.robinson7@lse.ac.uk> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-12 08:30:08 UTC |
rMIDAS2: Multiple Imputation with 'MIDAS2' Denoising Autoencoders
Description
Fits 'MIDAS' denoising autoencoder models for multiple imputation of missing data, generates multiply-imputed datasets, computes imputation means, and runs Rubin's rules regression analysis. Wraps the 'MIDAS2' 'Python' engine via a local 'FastAPI' server over 'HTTP', so no 'reticulate' dependency is needed at runtime. Methods are described in Lall and Robinson (2022) doi:10.1017/pan.2020.49 and Lall and Robinson (2023) doi:10.18637/jss.v107.i09.
Author(s)
Maintainer: Thomas Robinson t.robinson7@lse.ac.uk
Authors:
Ranjit Lall ranjit.lall@politics.ox.ac.uk
See Also
Useful links:
https://github.com/MIDASverse/MIDAS2
Report bugs at https://github.com/MIDASverse/MIDAS2/issues
Build a base request pointing at the running server
Description
Build a base request pointing at the running server
Usage
base_req(path)
Arguments
path |
API path (e.g. "/fit"). |
Value
An httr2 request object.
Check whether the installed backend is up-to-date with PyPI
Description
Compares the locally installed version of midasverse-midas-api against
the latest release on PyPI.
Runs silently on success; emits a message when an update is available.
Failures (e.g. no network) are silently ignored.
Usage
check_backend_version(python, package = "midasverse-midas-api")
Arguments
python |
Path to the Python interpreter. |
package |
PyPI package name (default "midasverse-midas-api"). |
Value
No return value, called for side effects.
Remove the saved virtualenv path
Description
Remove the saved virtualenv path
Usage
clear_venv_path()
Value
No return value, called for side effects.
Combine results using Rubin's rules
Description
Runs a GLM across all stored imputations and combines the results using Rubin's combination rules for multiple imputation inference.
Usage
combine(
model_id,
y,
ind_vars = NULL,
dof_adjust = TRUE,
incl_constant = TRUE,
...
)
Arguments
model_id |
A character model ID, or a fitted model object (list with a $model_id element). |
y |
Character. Name of the outcome variable. |
ind_vars |
Character vector of independent variable names, or NULL (the default). |
dof_adjust |
Logical. Apply Barnard-Rubin degrees-of-freedom adjustment (default TRUE). |
incl_constant |
Logical. Include an intercept (default TRUE). |
... |
Arguments forwarded to |
Value
A data frame with columns term, estimate, std.error,
statistic, df, and p.value.
Examples
## Not run:
df <- data.frame(Y = rnorm(200), X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 40)] <- NA
fit <- midas_fit(df, epochs = 10L)
midas_transform(fit, m = 10)
results <- combine(fit, y = "Y")
results
## End(Not run)
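The pooling step described above can be sketched in base R for a single coefficient. This is a minimal illustration of Rubin's rules with the Barnard-Rubin adjustment, not the package's implementation (which runs in the Python backend). The function name pool_rubin and its arguments est, se, and df_com are hypothetical.

```r
# Pool one coefficient across m imputations using Rubin's rules.
# est:    per-imputation point estimates
# se:     per-imputation standard errors
# df_com: complete-data residual degrees of freedom (Inf skips the adjustment)
pool_rubin <- function(est, se, df_com = Inf) {
  m <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  w <- mean(se^2)                    # within-imputation variance
  b <- var(est)                      # between-imputation variance
  t_var <- w + (1 + 1 / m) * b       # total variance
  lambda <- (1 + 1 / m) * b / t_var  # share of variance due to missingness
  df <- (m - 1) / lambda^2           # classical Rubin degrees of freedom
  if (is.finite(df_com)) {
    # Barnard-Rubin small-sample adjustment
    df_obs <- (df_com + 1) / (df_com + 3) * df_com * (1 - lambda)
    df <- 1 / (1 / df + 1 / df_obs)
  }
  stat <- q_bar / sqrt(t_var)
  data.frame(
    estimate = q_bar,
    std.error = sqrt(t_var),
    statistic = stat,
    df = df,
    p.value = 2 * pt(-abs(stat), df)
  )
}

res <- pool_rubin(est = c(1, 2, 3), se = c(1, 1, 1))
res_adj <- pool_rubin(est = c(1, 2, 3), se = c(1, 1, 1), df_com = 10)
```

The adjusted degrees of freedom in res_adj are never larger than the unadjusted ones in res, which is what makes the adjustment conservative in small samples.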
Path to the package config directory
Description
Path to the package config directory
Usage
config_dir()
Value
Character path to the config directory.
Ensure the server is running
Description
Starts the server if it is not already running. Called internally by every client function so users never have to manage the server manually.
Usage
ensure_server(...)
Arguments
... |
Arguments forwarded to |
Value
Invisibly returns the base URL of the running server.
Examples
## Not run:
ensure_server()
## End(Not run)
Extract model ID from a string or fitted model object
Description
Accepts either a bare character model ID or a list with a $model_id
element (as returned by midas_fit() or midas()).
Usage
extract_model_id(x)
Arguments
x |
A character string or a list with a $model_id element. |
Value
Character model ID.
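The dispatch described above might look like the following sketch. The name extract_model_id_sketch and the error message are hypothetical; the package's own function may validate its input differently.

```r
# Return a model ID from either a bare string or a fitted-model list.
extract_model_id_sketch <- function(x) {
  if (is.character(x) && length(x) == 1L) {
    return(x)
  }
  if (is.list(x) && is.character(x$model_id)) {
    return(x$model_id)
  }
  stop("expected a character model ID or a list with a $model_id element")
}

id_from_string <- extract_model_id_sketch("abc123")
id_from_list <- extract_model_id_sketch(list(model_id = "abc123", n_rows = 200L))
```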
Find a free TCP port
Description
Samples random ports in the dynamic range and uses serverSocket() to
verify availability.
Usage
find_free_port()
Value
Integer port number.
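A minimal sketch of this approach in base R follows. It assumes the dynamic range is the IANA range 49152 to 65535 and that a fixed number of attempts is made; the name find_free_port_sketch and the max_tries argument are hypothetical.

```r
# Sample random ports and return the first one that can be bound.
find_free_port_sketch <- function(max_tries = 50L) {
  for (i in seq_len(max_tries)) {
    port <- sample(49152:65535, 1L)
    # serverSocket() raises an error if the port cannot be opened
    sock <- tryCatch(serverSocket(port), error = function(e) NULL)
    if (!is.null(sock)) {
      close(sock)  # release the port so the server can bind to it
      return(port)
    }
  }
  stop("no free port found after ", max_tries, " attempts")
}

port <- find_free_port_sketch()
```

Another process could still claim the port between the check and the server start, so a caller may want to retry on failure.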
GET and return parsed body
Description
GET and return parsed body
Usage
get_json(path, timeout = 60)
Arguments
path |
API path. |
timeout |
Request timeout in seconds. |
Value
Parsed JSON response as a list.
Compute mean imputation
Description
Calculates the element-wise mean across all stored imputations for a model.
Usage
imp_mean(model_id, ...)
Arguments
model_id |
A character model ID, or a fitted model object (list with a $model_id element). |
... |
Arguments forwarded to |
Value
A data frame with the mean imputed values.
Examples
## Not run:
df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 40)] <- NA
fit <- midas_fit(df, epochs = 10L)
midas_transform(fit, m = 10)
mean_df <- imp_mean(fit)
## End(Not run)
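For intuition, the element-wise mean can be reproduced locally on a list of imputed data frames. The sketch below assumes all columns are numeric and all data frames share the same shape; the toy data are made up for illustration, and this is not how the backend computes the result.

```r
# Two toy "imputations" that differ only in the imputed X1 values
imps <- list(
  data.frame(X1 = c(1, 2), X2 = c(10, 20)),
  data.frame(X1 = c(3, 4), X2 = c(10, 20))
)

# Sum the data frames cell by cell, then divide by the number of imputations
mean_df <- Reduce(`+`, imps) / length(imps)
```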
Install the MIDAS2 Python backend
Description
Creates an isolated Python environment and installs the midasverse-midas-api
package (which pulls in midasverse-midas as a dependency).
Usage
install_backend(
method = c("pip", "conda", "uv"),
envname = "midas2_env",
package = "midasverse-midas-api"
)
Arguments
method |
Character. One of "pip", "conda", or "uv". |
envname |
Character. Name of the virtual environment to create (default "midas2_env"). |
package |
Character. Package specifier to install (default "midasverse-midas-api"). |
Details
This is the only function in the package that uses reticulate, and
only for environment creation. It is never used at runtime.
Value
No return value, called for side effects.
Examples
## Not run:
install_backend()
install_backend(method = "conda")
## End(Not run)
Load the saved virtualenv path (or NULL)
Description
Load the saved virtualenv path (or NULL)
Usage
load_venv_path()
Value
Character path or NULL.
Multiple imputation (all-in-one)
Description
Convenience function that fits a MIDAS model and generates imputations
in a single call. Equivalent to calling midas_fit() followed by
midas_transform().
Usage
midas(
data,
m = 5L,
hidden_layers = c(256L, 128L, 64L),
dropout_prob = 0.5,
epochs = 75L,
batch_size = 64L,
lr = 0.001,
corrupt_rate = 0.8,
num_adj = 1,
cat_adj = 1,
bin_adj = 1,
pos_adj = 1,
omit_first = FALSE,
seed = 89L,
...
)
Arguments
data |
A data frame (may contain NA values). |
m |
Integer. Number of imputations (default 5). |
hidden_layers |
Integer vector of hidden layer sizes (default c(256L, 128L, 64L)). |
dropout_prob |
Numeric. Dropout probability (default 0.5). |
epochs |
Integer. Number of training epochs (default 75). |
batch_size |
Integer. Mini-batch size (default 64). |
lr |
Numeric. Learning rate (default 0.001). |
corrupt_rate |
Numeric. Corruption rate for denoising (default 0.8). |
num_adj |
Numeric. Loss multiplier for numeric columns (default 1). |
cat_adj |
Numeric. Loss multiplier for categorical columns (default 1). |
bin_adj |
Numeric. Loss multiplier for binary columns (default 1). |
pos_adj |
Numeric. Loss multiplier for positive columns (default 1). |
omit_first |
Logical. Omit first column from encoder input (default FALSE). |
seed |
Integer. Random seed (default 89). |
... |
Arguments forwarded to |
Value
A list with model_id and imputations (a list of data frames).
Examples
## Not run:
df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 40)] <- NA
result <- midas(df, m = 5, epochs = 10)
head(result$imputations[[1]])
## End(Not run)
Fit a MIDAS model
Description
Sends data to the server and fits a MIDAS denoising autoencoder.
Usage
midas_fit(
data,
hidden_layers = c(256L, 128L, 64L),
dropout_prob = 0.5,
epochs = 75L,
batch_size = 64L,
lr = 0.001,
corrupt_rate = 0.8,
num_adj = 1,
cat_adj = 1,
bin_adj = 1,
pos_adj = 1,
omit_first = FALSE,
seed = 89L,
...
)
Arguments
data |
A data frame (may contain NA values). |
hidden_layers |
Integer vector of hidden layer sizes (default c(256L, 128L, 64L)). |
dropout_prob |
Numeric. Dropout probability (default 0.5). |
epochs |
Integer. Number of training epochs (default 75). |
batch_size |
Integer. Mini-batch size (default 64). |
lr |
Numeric. Learning rate (default 0.001). |
corrupt_rate |
Numeric. Corruption rate for denoising (default 0.8). |
num_adj |
Numeric. Loss multiplier for numeric columns (default 1). |
cat_adj |
Numeric. Loss multiplier for categorical columns (default 1). |
bin_adj |
Numeric. Loss multiplier for binary columns (default 1). |
pos_adj |
Numeric. Loss multiplier for positive columns (default 1). |
omit_first |
Logical. Omit first column from encoder input (default FALSE). |
seed |
Integer. Random seed (default 89). |
... |
Arguments forwarded to |
Value
A list with model_id, n_rows, n_cols, col_types.
Examples
## Not run:
df <- data.frame(X1 = rnorm(200), X2 = rnorm(200), X3 = rnorm(200))
df$X2[sample(200, 40)] <- NA
fit <- midas_fit(df, epochs = 10L)
fit$model_id
## End(Not run)
Generate multiple imputations
Description
Generates m imputed datasets from a fitted MIDAS model.
Usage
midas_transform(model_id, m = 5L, ...)
Arguments
model_id |
A character model ID, or a fitted model object (list with a $model_id element). |
m |
Integer. Number of imputations (default 5). |
... |
Arguments forwarded to |
Value
A list of m data frames, each with imputed values.
Examples
## Not run:
df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 40)] <- NA
fit <- midas_fit(df, epochs = 10L)
imps <- midas_transform(fit, m = 10)
head(imps[[1]])
## End(Not run)
Overimputation diagnostic
Description
Masks a fraction of observed values, re-imputes them, and computes RMSE to assess imputation quality.
Usage
overimpute(model_id, mask_frac = 0.1, m = 5L, seed = NULL, ...)
Arguments
model_id |
A character model ID, or a fitted model object (list with a $model_id element). |
mask_frac |
Numeric. Fraction of observed values to mask (default 0.1). |
m |
Integer. Number of imputations for the diagnostic (default 5). |
seed |
Integer or NULL. Random seed (default NULL). |
... |
Arguments forwarded to |
Value
A list with rmse (named numeric vector) and mean_rmse.
Examples
## Not run:
df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 40)] <- NA
fit <- midas_fit(df, epochs = 10L)
diag <- overimpute(fit, mask_frac = 0.1)
diag$mean_rmse
## End(Not run)
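The mask, re-impute, and score logic can be illustrated on a single numeric vector. In the sketch below, a simple mean imputer stands in for the MIDAS model, so the numbers say nothing about MIDAS itself. The names overimpute_sketch, impute_fun, and mean_impute are hypothetical.

```r
# Mask a fraction of the observed values, re-impute them, and return the RMSE
# between the re-imputed and the true values.
overimpute_sketch <- function(x, impute_fun, mask_frac = 0.1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  observed <- which(!is.na(x))
  n_mask <- max(1L, floor(mask_frac * length(observed)))
  masked <- observed[sample.int(length(observed), n_mask)]
  x_masked <- x
  x_masked[masked] <- NA               # hide values whose truth is known
  x_filled <- impute_fun(x_masked)     # re-impute with the stand-in imputer
  sqrt(mean((x_filled[masked] - x[masked])^2))
}

# Stand-in imputer: replace every NA with the mean of the observed values
mean_impute <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

x_const <- c(rep(5, 20), NA)
rmse_const <- overimpute_sketch(x_const, mean_impute, mask_frac = 0.1, seed = 1L)
```

Because every observed value in x_const equals 5, the mean imputer recovers the masked values exactly and the RMSE is zero. On real data a lower RMSE indicates better recovery of the masked values.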
Parse a JSON table response into a data.frame
Description
Parse a JSON table response into a data.frame
Usage
parse_table(res)
Arguments
res |
List with |
Value
A data frame.
POST JSON and return parsed body
Description
POST JSON and return parsed body
Usage
post_json(path, body, timeout = 600)
Arguments
path |
API path. |
body |
List to send as JSON. |
timeout |
Request timeout in seconds. |
Value
Parsed JSON response as a list.
Save the virtualenv path to persistent config
Description
Save the virtualenv path to persistent config
Usage
save_venv_path(path)
Arguments
path |
Character path to save. |
Value
No return value, called for side effects.
Start the MIDAS2 API server
Description
Launches python -m midas2_api as a background process and waits for the
/health endpoint to respond.
Usage
start_server(python = "python3", port = NULL, venv = NULL, max_wait = 120L)
Arguments
python |
Path to the Python interpreter (default "python3"). |
port |
Port to bind to. If |
venv |
Path to a Python virtual environment.
If supplied, the interpreter is taken from |
max_wait |
Maximum number of 0.5-second polling attempts (default 120, i.e. 60 seconds). The first launch may be slower because Python has not yet cached its imports. |
Value
Invisibly returns the port number.
Examples
## Not run:
start_server()
start_server(venv = "~/.virtualenvs/midas2_env")
## End(Not run)
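The relationship between max_wait and the total waiting time can be sketched as a generic polling loop. This is an illustration only: wait_until and is_ready are hypothetical names, and in the package the readiness check would be a request to the /health endpoint rather than the counter used here.

```r
# Call is_ready() up to max_wait times, sleeping `interval` seconds between
# attempts. Returns (invisibly) the attempt number on which it succeeded.
wait_until <- function(is_ready, max_wait = 120L, interval = 0.5) {
  for (attempt in seq_len(max_wait)) {
    if (isTRUE(is_ready())) {
      return(invisible(attempt))
    }
    Sys.sleep(interval)
  }
  stop("not ready after ", max_wait * interval, " seconds")
}

# Toy readiness check that succeeds on the third call
calls <- 0L
attempts <- wait_until(function() {
  calls <<- calls + 1L
  calls >= 3L
}, max_wait = 10L, interval = 0)
```

With the defaults, 120 attempts at 0.5 seconds each give the 60-second ceiling mentioned in the max_wait description.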
Stop the MIDAS2 API server
Description
Kills the background Python process and clears the internal state.
Usage
stop_server()
Value
No return value, called for side effects.
Examples
## Not run:
stop_server()
## End(Not run)
Convert an R matrix / data.frame to a nested list suitable for JSON
Description
Convert an R matrix / data.frame to a nested list suitable for JSON
Usage
to_nested_list(x)
Arguments
x |
A matrix or data frame. |
Value
A nested list of rows.
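One way to produce such a row-wise structure in base R is shown below. Whether the package keeps column names inside each row is not stated here; this sketch drops them, and the name to_nested_list_sketch is hypothetical.

```r
# Convert a matrix or data frame into a list with one element per row,
# where each row is itself an unnamed list of cell values.
to_nested_list_sketch <- function(x) {
  x <- as.data.frame(x)
  lapply(seq_len(nrow(x)), function(i) {
    unname(as.list(x[i, , drop = FALSE]))
  })
}

rows <- to_nested_list_sketch(data.frame(a = 1:2, b = c(0.5, NA)))
```

Each missing cell stays NA in the nested list, so the JSON encoder decides how it is serialized.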
Uninstall the MIDAS2 Python backend
Description
Stops the running server (if any), removes the Python environment created by
install_backend(), and clears the saved configuration.
Usage
uninstall_backend(method = c("pip", "conda", "uv"), envname = "midas2_env")
Arguments
method |
Character. One of "pip", "conda", or "uv". |
envname |
Character. Name of the virtual environment to remove (default "midas2_env"). |
Value
No return value, called for side effects.
Examples
## Not run:
uninstall_backend()
uninstall_backend(method = "conda")
## End(Not run)
Update the MIDAS2 Python backend
Description
Upgrades the midasverse-midas-api package (and its dependencies) in the
existing Python environment. Stops the running server first so that the
new version is loaded on next use.
Usage
update_backend(
method = c("pip", "conda", "uv"),
envname = "midas2_env",
package = "midasverse-midas-api"
)
Arguments
method |
Character. One of "pip", "conda", or "uv". |
envname |
Character. Name of the virtual environment (default "midas2_env"). |
package |
Character. Package specifier to upgrade (default "midasverse-midas-api"). |
Value
No return value, called for side effects.
Examples
## Not run:
update_backend()
## End(Not run)