--- title: "Getting Started with spatialAtomizeR" author: "Yunzhe Qian (Bella), Principal Investigators: Dr. Nancy Krieger and Dr. Rachel Nethery" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with spatialAtomizeR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, eval = FALSE # Set to TRUE to run code during build (requires external data for Example 2) ) ``` ## Overview `spatialAtomizeR` implements atom-based Bayesian regression methods (ABRM) for spatial data with misaligned grids. This vignette demonstrates the basic workflow for analyzing misaligned spatial data using both simulated and real-world examples. ## Installation ```{r install, eval = FALSE} # Install from CRAN install.packages("spatialAtomizeR") # Or install development version from GitHub devtools::install_github("bellayqian/spatialAtomizeR") ``` ## Load Required Packages **Important:** Always load `nimble` before using spatialAtomizeR functions that involve model fitting. ```{r libraries, message = FALSE} library(spatialAtomizeR) library(nimble) # Required for ABRM models ``` # Example 1: Simulated Data This example demonstrates the complete workflow using simulated data. ## Step 1: Simulate Misaligned Data Generate spatial data with misaligned grids. The function creates two non-overlapping grids (X-grid and Y-grid) and their intersection (atoms). ```{r simulate} sim_data <- simulate_misaligned_data( seed = 42, # Distribution specifications for covariates dist_covariates_x = c('normal', 'poisson', 'binomial'), dist_covariates_y = c('normal', 'poisson', 'binomial'), dist_y = 'poisson', # Outcome distribution # Intercepts for data generation (REQUIRED) x_intercepts = c(4, -1, -1), # One per X covariate y_intercepts = c(4, -1, -1), # One per Y covariate beta0_y = -1, # Outcome model intercept # Spatial correlation parameters x_correlation = 0.5, # Correlation between X covariates y_correlation = 0.5, # Correlation between Y covariates # True effect sizes for outcome model beta_x = c(-0.03, 0.1, -0.2), # Effects of X covariates beta_y = c(0.03, -0.1, 0.2) # Effects of Y covariates ) ``` The simulated data object contains: - `gridx`: X-grid spatial features with covariates - `gridy`: Y-grid spatial features with covariates and outcome - `atoms`: Atom-level spatial features (intersection of grids) - `true_params`: True parameter values used in simulation ## Step 2: Examine Simulated Data The `simulate_misaligned_data` function returns an object of class `misaligned_data` with useful S3 methods: ```{r examine_data} # Check the class class(sim_data) # Print method shows basic information print(sim_data) # Summary method shows more details summary(sim_data) ``` ## Step 3: Get ABRM Model Code Retrieve the pre-compiled NIMBLE model specification: ```{r model} model_code <- get_abrm_model() ``` ## Step 4: Run ABRM Analysis Fit the ABRM model. You must specify which covariates follow which distributions by providing their indices: ```{r run_abrm} abrm_results <- run_abrm( gridx = sim_data$gridx, gridy = sim_data$gridy, atoms = sim_data$atoms, model_code = model_code, true_params = sim_data$true_params, # Optional: for validation # Map distribution indices to positions in dist_covariates_x/y norm_idx_x = 1, # 'normal' is 1st in dist_covariates_x pois_idx_x = 2, # 'poisson' is 2nd binom_idx_x = 3, # 'binomial' is 3rd norm_idx_y = 1, # Same for Y covariates pois_idx_y = 2, binom_idx_y = 3, # Outcome distribution: 1=normal, 2=poisson, 3=binomial dist_y = 2, # MCMC parameters niter = 50000, # Total iterations per chain nburnin = 30000, # Burn-in iterations nchains = 2, # Number of chains seed = 123 ) ``` ## Step 5: Examine ABRM Results The `run_abrm` function returns an object of class `abrm` with S3 methods: ```{r examine_results} # Check the class class(abrm_results) # Print method shows summary statistics print(abrm_results) # Summary method shows detailed parameter estimates summary(abrm_results) # Plot method shows MCMC diagnostics (if available) plot(abrm_results) ``` Access vairance-covariance matrices from the MCMC posterior sample ```{r vcov} # Get variance-covariance matrices vcov_matrices <- vcov(abrm_results) # Print summary print(vcov_matrices) # Access specific matrices vcov_matrices$vcov_beta_x # Variance-covariance for X-grid coefficients vcov_matrices$vcov_beta_y # Variance-covariance for Y-grid coefficients vcov_matrices$vcov_beta_0 # Variance of intercept vcov_matrices$vcov_all # Combined matrix for all parameters # Compute standard errors from diagonal sqrt(diag(vcov_matrices$vcov_beta_x)) # Compute correlation matrix from covariance matrix cov2cor(vcov_matrices$vcov_beta_y) # Check if parameters are correlated vcov_matrices$vcov_beta_x[1, 2] # Covariance between first two X coefficients ``` Access specific results: ```{r access_results} # Parameter estimates table abrm_results$parameter_estimates # MCMC convergence diagnostics abrm_results$mcmc_results$convergence # Full MCMC samples # abrm_results$mcmc_results$samples ``` # Example 2: Real Data - Utah Counties and School Districts This example demonstrates using real spatial data from Utah, matching the analysis in the manuscript. ## Load Required Packages for Spatial Data ```{r load_spatial_packages, message = FALSE} library(tigris) # For US Census shapefiles library(sf) # Spatial features library(sp) # Spatial objects library(spdep) # Spatial dependence library(raster) # Spatial data manipulation library(dplyr) # Data manipulation library(ggplot2) # Plotting ``` ## Step 1: Load Utah Shapefiles ```{r load_utah_data} set.seed(500) # Load Utah counties (Y-grid) cnty <- counties(state = 'UT') gridy <- cnty %>% rename(ID = GEOID) %>% mutate(ID = as.numeric(ID)) # ID must be numeric # Load Utah school districts (X-grid) scd <- school_districts(state = 'UT') gridx <- scd %>% rename(ID = GEOID) %>% mutate(ID = as.numeric(ID)) ``` ## Step 2: Visualize Spatial Misalignment ```{r plot_misalignment, fig.width = 7, fig.height = 5} ggplot() + geom_sf(data = gridx, fill = NA, color = "orange", linewidth = 1.2) + geom_sf(data = gridy, fill = NA, color = "blue", linetype = 'dashed', linewidth = 0.5) + labs(title = "Spatial Misalignment in Utah", subtitle = "Orange: School Districts | Blue: Counties") + theme_void() ``` ## Step 3: Create Atoms by Intersecting Grids ```{r create_atoms} # Intersect the two grids to create atoms atoms <- raster::intersect(as(gridy, 'Spatial'), as(gridx, 'Spatial')) atoms <- sf::st_as_sf(atoms) # Rename ID variables to match expected names atoms <- atoms %>% rename(ID_y = ID_1, # Y-grid (county) ID ID_x = ID_2) # X-grid (school district) ID ``` ## Step 4: Add Atom-Level Population Counts ABRM requires population counts at the atom level. Here we use LandScan gridded population data: ```{r load_population} # NOTE: You must download LandScan data separately # Available at: https://landscan.ornl.gov/ # This example assumes the file is in your working directory pop_raster <- raster("landscan-global-2022.tif") # Extract population for each atom pop_atoms <- raster::extract( pop_raster, st_transform(atoms, crs(pop_raster)), fun = sum, na.rm = TRUE ) atoms$population <- pop_atoms ``` ## Step 5: Generate Semi-Synthetic Outcome and Covariates For demonstration, we generate synthetic outcome and covariate data at the atom level: ```{r generate_synthetic_data} # Create atom-level spatial adjacency matrix W_a <- nb2mat( poly2nb(as(atoms, "Spatial"), queen = TRUE), style = "B", zero.policy = TRUE ) # Generate spatial random effects atoms <- atoms %>% mutate( re_x = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8), re_z = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8), re_y = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8) ) # Generate atom-level covariates and outcome atoms <- atoms %>% mutate( # X-grid covariate (Poisson) covariate_x_1 = rpois( n = nrow(atoms), lambda = population * exp(-1 + re_x) ), # Y-grid covariate (Normal) covariate_y_1 = rnorm( n = nrow(atoms), mean = population * (3 + re_z), sd = 1 * population ) ) %>% mutate( # Outcome (Poisson) y = rpois( n = nrow(atoms), lambda = population * exp( -5 + 1 * (covariate_x_1 / population) - 0.3 * (covariate_y_1 / population) + re_y ) ) ) ``` ## Step 6: Aggregate to Observed Grids In reality, we only observe aggregated data at the grid level, not at atoms: ```{r aggregate_data} # Aggregate X covariates to X-grid gridx[["covariate_x_1"]] <- sapply(gridx$ID, function(j) { atom_indices <- which(atoms$ID_x == j) sum(atoms[["covariate_x_1"]][atom_indices]) }) # Aggregate Y covariates to Y-grid gridy[["covariate_y_1"]] <- sapply(gridy$ID, function(j) { atom_indices <- which(atoms$ID_y == j) sum(atoms[["covariate_y_1"]][atom_indices]) }) # Aggregate outcome to Y-grid gridy$y <- sapply(gridy$ID, function(j) { atom_indices <- which(atoms$ID_y == j) sum(atoms$y[atom_indices]) }) ``` ## Step 7: Run ABRM on Real Data ```{r run_abrm_utah} model_code <- get_abrm_model() mcmc_results <- run_abrm( gridx = gridx, gridy = gridy, atoms = atoms, model_code = model_code, # Specify covariate distributions pois_idx_x = 1, # X covariate is Poisson norm_idx_y = 1, # Y covariate is Normal dist_y = 2, # Outcome is Poisson # MCMC settings (increase for real analysis) niter = 300000, nburnin = 200000, nchains = 2, seed = 123 ) # View results summary(mcmc_results) ``` # Understanding Distribution Indices When you specify covariates, the indices correspond to their positions: ```r dist_covariates_x = c('normal', 'poisson', 'binomial') # ^1 ^2 ^3 # So you would specify: norm_idx_x = 1 # First position pois_idx_x = 2 # Second position binom_idx_x = 3 # Third position ``` ## Distribution Type Codes | Code | Distribution | String | Use Case | |------|--------------|-------------|-------------------------| | 1 | Normal | 'normal' | Continuous data | | 2 | Poisson | 'poisson' | Count/rate data | | 3 | Binomial | 'binomial' | Proportion/binary data | # Troubleshooting ## Common Errors **Error: "could not find function 'getNimbleOption'"** - **Solution:** Load `nimble` before calling `get_abrm_model()`: `library(nimble)` **Error: "unused arguments (res1 = c(5, 5), res2 = c(10, 10))"** - **Solution:** The `simulate_misaligned_data()` function does not have `res1` or `res2` parameters. Grid resolutions are fixed internally. **Error: "gridx must contain an area ID variable named 'ID'"** - **Solution:** Ensure your spatial dataframes have a numeric column named `ID` # Key Parameters Reference ## Critical Parameters (Don't Omit!) - `x_intercepts`: Intercepts for X covariates (must match length of `dist_covariates_x`) - `y_intercepts`: Intercepts for Y covariates (must match length of `dist_covariates_y`) - `beta0_y`: Outcome model intercept - `beta_x`: True coefficients for X covariates (if simulating) - `beta_y`: True coefficients for Y covariates (if simulating) ## MCMC Parameters - `niter`: Total MCMC iterations per chain (default: 50000) - `nburnin`: Burn-in iterations to discard (default: 30000) - `nchains`: Number of independent chains (default: 2) - `thin`: Thinning interval (default: 10) ## Getting Help - Function documentation: `?simulate_misaligned_data`, `?run_abrm` - Report issues: https://github.com/bellayqian/spatialAtomizeR/issues - Package help: `?spatialAtomizeR` # Citation ```{r citation, eval = FALSE} citation("spatialAtomizeR") ``` **Reference:** Qian, Y., Johnson, N., Krieger, N., & Nethery, R.C. (2026). spatialAtomizeR: Atom-Based Regression Models for Spatially Misaligned Data in R. R package version 0.2.6.