--- title: "Using ggpicrust2" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using ggpicrust2} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` ## Introduction `ggpicrust2` provides a practical workflow for PICRUSt2 downstream analysis: - convert KO profiles to pathway-level abundance when needed - run differential abundance analysis with multiple methods - annotate and visualize pathway-level results - inspect taxa-level contribution using PICRUSt2 per-sequence outputs - perform GSEA when pathway-set analysis is more appropriate than single-feature testing This vignette focuses on the general package workflow. For a deeper GSEA walkthrough, see the dedicated `gsea_analysis` vignette. ## Installation ```{r installation, eval = FALSE} # install.packages("devtools") devtools::install_github("cafferychen777/ggpicrust2") library(ggpicrust2) library(tibble) ``` ## One-command workflow If you want a fast end-to-end run from abundance data to annotated differential abundance output, start with `ggpicrust2()`: ```{r one-command, eval = FALSE} data("ko_abundance") data("metadata") results <- ggpicrust2( data = ko_abundance, metadata = metadata, group = "Environment", pathway = "KO", daa_method = "LinDA", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name" ) # Access the main outputs results[[1]]$plot head(results[[1]]$results) ``` Use this route when you want a fast default analysis. Use the stepwise workflow below when you need more control over data preparation or visualization. ## Stepwise pathway workflow ### Convert KO abundance to KEGG pathway abundance ```{r ko-to-kegg, eval = FALSE} kegg_pathway_abundance <- ko2kegg_abundance(data = ko_abundance) head(kegg_pathway_abundance[, 1:3]) ``` ### Run differential abundance analysis ```{r daa, eval = FALSE} daa_results <- pathway_daa( abundance = kegg_pathway_abundance, metadata = metadata, group = "Environment", daa_method = "ALDEx2" ) head(daa_results) ``` ### Annotate pathway results ```{r annotation, eval = FALSE} annotated_daa <- pathway_annotation( pathway = "KO", daa_results_df = daa_results, ko_to_kegg = TRUE ) head(annotated_daa) ``` ### Visualize pathway-level results ```{r errorbar, eval = FALSE} pathway_errorbar( abundance = kegg_pathway_abundance, daa_results_df = annotated_daa, Group = "Environment" ) ``` ```{r heatmap-pca, eval = FALSE} sig_pathways <- annotated_daa$feature[annotated_daa$p_adjust < 0.05] if (length(sig_pathways) > 0) { pathway_heatmap( abundance = kegg_pathway_abundance[sig_pathways, , drop = FALSE], metadata = metadata, group = "Environment" ) } pathway_pca( abundance = kegg_pathway_abundance, metadata = metadata, group = "Environment" ) ``` ## Taxa contribution workflow PICRUSt2 can output per-sequence contribution files that explain which taxa are driving predicted functional shifts. `ggpicrust2` now supports a full contribution workflow. ### Read PICRUSt2 contribution files ```{r contrib-readers, eval = FALSE} # For pred_metagenome_contrib.tsv contrib_data <- read_contrib_file("pred_metagenome_contrib.tsv") # For pred_metagenome_strat.tsv strat_data <- read_strat_file("pred_metagenome_strat.tsv") ``` ### Aggregate to a taxonomic level ```{r contrib-aggregate, eval = FALSE} taxa_contrib <- aggregate_taxa_contributions( contrib_data = contrib_data, taxonomy = your_taxonomy_table, tax_level = "Genus", top_n = 10, daa_results_df = daa_results ) head(taxa_contrib) ``` `aggregate_taxa_contributions()` accepts either: - normalized contribution data from `read_contrib_file()` - stratified abundance data from `read_strat_file()` Use `daa_results_df` or `pathway_ids` when you want to focus only on pathways that were significant in your pathway-level analysis. ### Visualize taxa-level drivers ```{r contrib-plots, eval = FALSE} taxa_contribution_bar( contrib_agg = taxa_contrib, metadata = metadata, group = "Environment", facet_by = "function" ) taxa_contribution_heatmap( contrib_agg = taxa_contrib, n_functions = 20 ) ``` This step is useful when pathway-level significance is not enough and you need to identify which taxa are contributing to the change. ## GSEA workflow Use GSEA when you want pathway-set level inference from KO or EC abundance rather than testing each pathway independently. ```{r gsea, eval = FALSE} gsea_results <- pathway_gsea( abundance = ko_abundance %>% column_to_rownames("#NAME"), metadata = metadata, group = "Environment", pathway_type = "KEGG", method = "camera" ) annotated_gsea <- gsea_pathway_annotation( gsea_results = gsea_results, pathway_type = "KEGG" ) visualize_gsea( gsea_results = annotated_gsea, plot_type = "barplot", n_pathways = 15 ) ``` For a method-by-method GSEA explanation, covariate adjustment, and comparison with DAA, see the `gsea_analysis` vignette. ## Summary The package is easiest to use when you choose the shortest path that matches your question: - use `ggpicrust2()` for a fast default pathway workflow - use the stepwise DAA functions when you need more control - use the taxa contribution workflow when you need taxon-level attribution - use `pathway_gsea()` when pathway-set enrichment is the primary question