---
title: "Getting Started with spatialAtomizeR"
author: "Yunzhe Qian (Bella), Principal Investigators: Dr. Nancy Krieger and Dr. Rachel Nethery"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with spatialAtomizeR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  eval = FALSE  # Set to TRUE to run code during build (requires external data for Example 2)
)
```

## Overview

`spatialAtomizeR` implements atom-based Bayesian regression methods (ABRM) for spatial data with misaligned grids. This vignette demonstrates the basic workflow for analyzing misaligned spatial data using both simulated and real-world examples.

## Installation

```{r install, eval = FALSE}
# Install from CRAN
install.packages("spatialAtomizeR")

# Or install development version from GitHub
devtools::install_github("bellayqian/spatialAtomizeR")
```

## Load Required Packages

**Important:** Always load `nimble` before using spatialAtomizeR functions that involve model fitting.

```{r libraries, message = FALSE}
library(spatialAtomizeR)
library(nimble)  # Required for ABRM models
```

# Example 1: Simulated Data

This example demonstrates the complete workflow using simulated data.

## Step 1: Simulate Misaligned Data

Generate spatial data with misaligned grids. The function creates two non-overlapping grids (X-grid and Y-grid) and their intersection (atoms).

```{r simulate}
sim_data <- simulate_misaligned_data(
  seed = 42,
  
  # Distribution specifications for covariates
  dist_covariates_x = c('normal', 'poisson', 'binomial'),
  dist_covariates_y = c('normal', 'poisson', 'binomial'),
  dist_y = 'poisson',  # Outcome distribution
  
  # Intercepts for data generation (REQUIRED)
  x_intercepts = c(4, -1, -1),      # One per X covariate
  y_intercepts = c(4, -1, -1),      # One per Y covariate
  beta0_y = -1,                     # Outcome model intercept
  
  # Spatial correlation parameters
  x_correlation = 0.5,  # Correlation between X covariates
  y_correlation = 0.5,  # Correlation between Y covariates
  
  # True effect sizes for outcome model
  beta_x = c(-0.03, 0.1, -0.2),    # Effects of X covariates
  beta_y = c(0.03, -0.1, 0.2)      # Effects of Y covariates
)
```

The simulated data object contains:
- `gridx`: X-grid spatial features with covariates
- `gridy`: Y-grid spatial features with covariates and outcome
- `atoms`: Atom-level spatial features (intersection of grids)
- `true_params`: True parameter values used in simulation

## Step 2: Examine Simulated Data

The `simulate_misaligned_data` function returns an object of class `misaligned_data` with useful S3 methods:

```{r examine_data}
# Check the class
class(sim_data)

# Print method shows basic information
print(sim_data)

# Summary method shows more details
summary(sim_data)
```

## Step 3: Get ABRM Model Code

Retrieve the pre-compiled NIMBLE model specification:

```{r model}
model_code <- get_abrm_model()
```

## Step 4: Run ABRM Analysis

Fit the ABRM model. You must specify which covariates follow which distributions by providing their indices:

```{r run_abrm}
abrm_results <- run_abrm(
  gridx = sim_data$gridx,
  gridy = sim_data$gridy,
  atoms = sim_data$atoms,
  model_code = model_code,
  true_params = sim_data$true_params,  # Optional: for validation
  
  # Map distribution indices to positions in dist_covariates_x/y
  norm_idx_x = 1,   # 'normal' is 1st in dist_covariates_x
  pois_idx_x = 2,   # 'poisson' is 2nd
  binom_idx_x = 3,  # 'binomial' is 3rd
  norm_idx_y = 1,   # Same for Y covariates
  pois_idx_y = 2,
  binom_idx_y = 3,
  
  # Outcome distribution: 1=normal, 2=poisson, 3=binomial
  dist_y = 2,
  
  # MCMC parameters
  niter = 50000,    # Total iterations per chain
  nburnin = 30000,  # Burn-in iterations
  nchains = 2,      # Number of chains
  seed = 123
)
```

## Step 5: Examine ABRM Results

The `run_abrm` function returns an object of class `abrm` with S3 methods:

```{r examine_results}
# Check the class
class(abrm_results)

# Print method shows summary statistics
print(abrm_results)

# Summary method shows detailed parameter estimates
summary(abrm_results)

# Plot method shows MCMC diagnostics (if available)
plot(abrm_results)
```

Access vairance-covariance matrices from the MCMC posterior sample

```{r vcov}
# Get variance-covariance matrices
vcov_matrices <- vcov(abrm_results)

# Print summary
print(vcov_matrices)

# Access specific matrices
vcov_matrices$vcov_beta_x   # Variance-covariance for X-grid coefficients
vcov_matrices$vcov_beta_y   # Variance-covariance for Y-grid coefficients
vcov_matrices$vcov_beta_0   # Variance of intercept
vcov_matrices$vcov_all      # Combined matrix for all parameters

# Compute standard errors from diagonal
sqrt(diag(vcov_matrices$vcov_beta_x))

# Compute correlation matrix from covariance matrix
cov2cor(vcov_matrices$vcov_beta_y)

# Check if parameters are correlated
vcov_matrices$vcov_beta_x[1, 2]  # Covariance between first two X coefficients
```

Access specific results:

```{r access_results}
# Parameter estimates table
abrm_results$parameter_estimates

# MCMC convergence diagnostics
abrm_results$mcmc_results$convergence

# Full MCMC samples
# abrm_results$mcmc_results$samples
```

# Example 2: Real Data - Utah Counties and School Districts

This example demonstrates using real spatial data from Utah, matching the analysis in the manuscript.

## Load Required Packages for Spatial Data

```{r load_spatial_packages, message = FALSE}
library(tigris)    # For US Census shapefiles
library(sf)        # Spatial features
library(sp)        # Spatial objects
library(spdep)     # Spatial dependence
library(raster)    # Spatial data manipulation
library(dplyr)     # Data manipulation
library(ggplot2)   # Plotting
```

## Step 1: Load Utah Shapefiles

```{r load_utah_data}
set.seed(500)

# Load Utah counties (Y-grid)
cnty <- counties(state = 'UT')
gridy <- cnty %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))  # ID must be numeric

# Load Utah school districts (X-grid)
scd <- school_districts(state = 'UT')
gridx <- scd %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))
```

## Step 2: Visualize Spatial Misalignment

```{r plot_misalignment, fig.width = 7, fig.height = 5}
ggplot() +
  geom_sf(data = gridx, fill = NA, color = "orange", linewidth = 1.2) +
  geom_sf(data = gridy, fill = NA, color = "blue", 
          linetype = 'dashed', linewidth = 0.5) +
  labs(title = "Spatial Misalignment in Utah",
       subtitle = "Orange: School Districts | Blue: Counties") +
  theme_void()
```

## Step 3: Create Atoms by Intersecting Grids

```{r create_atoms}
# Intersect the two grids to create atoms
atoms <- raster::intersect(as(gridy, 'Spatial'), as(gridx, 'Spatial'))
atoms <- sf::st_as_sf(atoms)

# Rename ID variables to match expected names
atoms <- atoms %>%
  rename(ID_y = ID_1,    # Y-grid (county) ID
         ID_x = ID_2)     # X-grid (school district) ID
```

## Step 4: Add Atom-Level Population Counts

ABRM requires population counts at the atom level. Here we use LandScan gridded population data:

```{r load_population}
# NOTE: You must download LandScan data separately
# Available at: https://landscan.ornl.gov/
# This example assumes the file is in your working directory

pop_raster <- raster("landscan-global-2022.tif")

# Extract population for each atom
pop_atoms <- raster::extract(
  pop_raster, 
  st_transform(atoms, crs(pop_raster)),
  fun = sum, 
  na.rm = TRUE
)

atoms$population <- pop_atoms
```

## Step 5: Generate Semi-Synthetic Outcome and Covariates

For demonstration, we generate synthetic outcome and covariate data at the atom level:

```{r generate_synthetic_data}
# Create atom-level spatial adjacency matrix
W_a <- nb2mat(
  poly2nb(as(atoms, "Spatial"), queen = TRUE),
  style = "B", 
  zero.policy = TRUE
)

# Generate spatial random effects
atoms <- atoms %>%
  mutate(
    re_x = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_z = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_y = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8)
  )

# Generate atom-level covariates and outcome
atoms <- atoms %>%
  mutate(
    # X-grid covariate (Poisson)
    covariate_x_1 = rpois(
      n = nrow(atoms), 
      lambda = population * exp(-1 + re_x)
    ),
    # Y-grid covariate (Normal)
    covariate_y_1 = rnorm(
      n = nrow(atoms), 
      mean = population * (3 + re_z), 
      sd = 1 * population
    )
  ) %>%
  mutate(
    # Outcome (Poisson)
    y = rpois(
      n = nrow(atoms),
      lambda = population * exp(
        -5 + 
        1 * (covariate_x_1 / population) - 
        0.3 * (covariate_y_1 / population) + 
        re_y
      )
    )
  )
```

## Step 6: Aggregate to Observed Grids

In reality, we only observe aggregated data at the grid level, not at atoms:

```{r aggregate_data}
# Aggregate X covariates to X-grid
gridx[["covariate_x_1"]] <- sapply(gridx$ID, function(j) {
  atom_indices <- which(atoms$ID_x == j)
  sum(atoms[["covariate_x_1"]][atom_indices])
})

# Aggregate Y covariates to Y-grid
gridy[["covariate_y_1"]] <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms[["covariate_y_1"]][atom_indices])
})

# Aggregate outcome to Y-grid
gridy$y <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms$y[atom_indices])
})
```

## Step 7: Run ABRM on Real Data

```{r run_abrm_utah}
model_code <- get_abrm_model()

mcmc_results <- run_abrm(
  gridx = gridx,
  gridy = gridy,
  atoms = atoms,
  model_code = model_code,
  
  # Specify covariate distributions
  pois_idx_x = 1,  # X covariate is Poisson
  norm_idx_y = 1,  # Y covariate is Normal
  dist_y = 2,      # Outcome is Poisson
  
  # MCMC settings (increase for real analysis)
  niter = 300000,
  nburnin = 200000,
  nchains = 2,
  seed = 123
)

# View results
summary(mcmc_results)
```

# Understanding Distribution Indices

When you specify covariates, the indices correspond to their positions:

```r
dist_covariates_x = c('normal', 'poisson', 'binomial')
#                       ^1        ^2         ^3

# So you would specify:
norm_idx_x = 1    # First position
pois_idx_x = 2    # Second position  
binom_idx_x = 3   # Third position
```

## Distribution Type Codes

| Code | Distribution | String      | Use Case                |
|------|--------------|-------------|-------------------------|
| 1    | Normal       | 'normal'    | Continuous data         |
| 2    | Poisson      | 'poisson'   | Count/rate data         |
| 3    | Binomial     | 'binomial'  | Proportion/binary data  |

# Troubleshooting

## Common Errors

**Error: "could not find function 'getNimbleOption'"**
- **Solution:** Load `nimble` before calling `get_abrm_model()`: `library(nimble)`

**Error: "unused arguments (res1 = c(5, 5), res2 = c(10, 10))"**
- **Solution:** The `simulate_misaligned_data()` function does not have `res1` or `res2` parameters. Grid resolutions are fixed internally.

**Error: "gridx must contain an area ID variable named 'ID'"**
- **Solution:** Ensure your spatial dataframes have a numeric column named `ID`

# Key Parameters Reference

## Critical Parameters (Don't Omit!)

- `x_intercepts`: Intercepts for X covariates (must match length of `dist_covariates_x`)
- `y_intercepts`: Intercepts for Y covariates (must match length of `dist_covariates_y`)
- `beta0_y`: Outcome model intercept
- `beta_x`: True coefficients for X covariates (if simulating)
- `beta_y`: True coefficients for Y covariates (if simulating)

## MCMC Parameters

- `niter`: Total MCMC iterations per chain (default: 50000)
- `nburnin`: Burn-in iterations to discard (default: 30000)
- `nchains`: Number of independent chains (default: 2)
- `thin`: Thinning interval (default: 10)

## Getting Help

- Function documentation: `?simulate_misaligned_data`, `?run_abrm`
- Report issues: https://github.com/bellayqian/spatialAtomizeR/issues
- Package help: `?spatialAtomizeR`

# Citation

```{r citation, eval = FALSE}
citation("spatialAtomizeR")
```

**Reference:**
Qian, Y., Johnson, N., Krieger, N., & Nethery, R.C. (2026). spatialAtomizeR: Atom-Based Regression Models for Spatially Misaligned Data in R. R package version 0.2.6.