---
title: "Get Started with ukbflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get Started with ukbflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = FALSE
)
```

## Welcome to `ukbflow`

**`ukbflow`** is an R package for UK Biobank analysis on the [Research Analysis Platform (RAP)](https://ukbiobank.dnanexus.com). It covers the full midstream-to-downstream pipeline, from phenotype derivation and association analysis to publication-ready figures and genetic risk scoring. It is designed for RAP-native UKB workflows and includes local simulated data for development and testing.

## Installation

```{r install}
pak::pkg_install("evanbio/ukbflow")
```

## A Quick Taste

### Load data

```{r load-data}
library(ukbflow)

df <- ops_toy()   # synthetic UKB-like cohort, no RAP connection needed

# On RAP, replace with:
# auth_login()
# auth_select_project("project-XXXXXXXXXXXX")
# df <- extract_pheno(c(31, 21022, 53, 20116)) |>
#   decode_values() |>
#   decode_names()
```

### Derive a disease phenotype

```{r derive}
df <- df |>
  derive_missing() |>                                   # recode "Prefer not to answer" → NA
  derive_selfreport(name = "t2dm", regex = "diabetes",  # T2DM self-report
                    field = "noncancer") |>
  derive_icd10(name = "t2dm", icd10 = "E11",
               source = "hes") |>                       # T2DM from HES
  derive_case(name = "t2dm") |>                         # → t2dm_status, t2dm_date
  derive_followup(name         = "t2dm",
                  event_col    = "t2dm_date",
                  baseline_col = "p53_i0",              # assessment centre date
                  censor_date  = as.Date("2022-06-01"))
```

### Run an association model

```{r assoc}
res <- assoc_coxph(
  data         = df,
  outcome_col  = "t2dm_status",
  time_col     = "t2dm_followup_years",
  exposure_col = "p21001_i0",            # BMI (continuous)
  covariates   = c("p21022",             # age_at_recruitment
                   "p31")                # sex
)
```

### Plot the results

```{r plot}
# Forest plot; see vignette("plot") for full usage
res_df <- as.data.frame(res)

plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 7L    # res_df has 6 cols before HR; CI graphic goes here
)

# Table 1
plot_tableone(
  data   = as.data.frame(df),
  vars   = c("p21022",      # age_at_recruitment
             "p31",         # sex
             "p21001_i0"),  # bmi
  strata = "t2dm_status"
)
```

## Full Function Overview

| Module | Key functions | Vignette |
|---|---|---|
| Auth | `auth_login()`, `auth_select_project()` | `vignette("auth")` |
| Fetch | `fetch_ls()`, `fetch_file()`, `fetch_tree()` | `vignette("fetch")` |
| Extract | `extract_pheno()`, `extract_batch()`, `extract_ls()` | `vignette("extract")` |
| Job | `job_wait()`, `job_status()`, `job_result()` | `vignette("job")` |
| Decode | `decode_values()`, `decode_names()` | `vignette("decode")` |
| Derive | `derive_missing()`, `derive_icd10()`, `derive_case()` | `vignette("derive")` |
| Survival | `derive_timing()`, `derive_age()`, `derive_followup()` | `vignette("derive-survival")` |
| Assoc | `assoc_coxph()`, `assoc_logistic()`, `assoc_subgroup()` | `vignette("assoc")` |
| Plot | `plot_forest()`, `plot_tableone()` | `vignette("plot")` |
| GRS | `grs_check()`, `grs_score()`, `grs_validate()` | `vignette("grs")` |
| Ops | `ops_setup()`, `ops_toy()`, `ops_snapshot()` | `vignette("ops")` |

## End-to-End Case Study

For a complete worked example using a simulated UK Biobank cohort (data loading, phenotype derivation, cohort assembly, Cox regression, and publication-ready visualisation), see:

`vignette("smoking_lung_cancer")`: **Smoking and Lung Cancer Risk: A Complete Analysis Workflow**

## Additional Resources

- [Documentation site](https://evanbio.github.io/ukbflow/)
- [GitHub](https://github.com/evanbio/ukbflow)
- View all functions: `?ukbflow` or `help(package = "ukbflow")`

> *"All models are wrong, but some are publishable."*
>
> -- after George Box