Tutorial with Demo

NOTE: For better display of the taxa names and data parsing, we HIGHLY recommended the input tax_table() from the phyloseq object should not have any characters/symbols/spaces in front of the taxa names. For example, some reference database might have taxonomic level indicator in front of the taxa name, e.g. {g}__Bacteroides should be just Bacteroides under the Genus column. Please use gsub() or other text replacer to remove these characters. In addition, place make sure the column names for the taxonomic levels include Phylum, Class, Order, Family, Genus and Species.

General Note

For both Shiny applications, each sub-section has a question mark in the title which the user can click to get a description for the section:

Step 0. Required libraries

## Please load the following libraries
library(phyloseq)
library(metagMisc)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.3
#> -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
#> v ggplot2 3.3.6     v purrr   0.3.4
#> v tibble  3.1.8     v dplyr   1.0.9
#> v tidyr   1.2.0     v stringr 1.4.0
#> v readr   2.1.2     v forcats 0.5.1
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Warning: package 'tibble' was built under R version 4.1.3
#> Warning: package 'tidyr' was built under R version 4.1.3
#> Warning: package 'readr' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3
#> -- Conflicts ------------------------------------------ tidyverse_conflicts() --
#> x dplyr::filter()     masks stats::filter()
#> x dplyr::lag()        masks stats::lag()
#> x readr::parse_date() masks curl::parse_date()
#> x purrr::some()       masks metagMisc::some()

Step 1. Load demo data

The demo file includes part of the 16S rRNA processed data from Baxter et al., with 127 Cancer and 134 Control subjects.

# Loading the default 
data("CRC_Phyloseq")
CRC_Phyloseq = validatePhloseq(CRC_Phyloseq)

Step 2. Initiate C3NA

The goal of this step includes extracting a single phenotype object, then processing each of them individually with initiateC3NA with sparCC bootstrap, TOM similarity Matrix, and calculating module patterns information from a range of minimal numbers of taxa per module.

# Generate Cancer and Normal single phenotype phyloseq object
Cancer = phyloseq::subset_samples(physeq = CRC_Phyloseq, diagnosis == "Cancer")
Normal = phyloseq::subset_samples(physeq = CRC_Phyloseq, diagnosis == "Normal")
DT::datatable(as.matrix(phyloseq::tax_table(Normal)))

This is an example of the taxonomic table we recommended for the correct use of C3NA

The initiateC3NA() step involved with sparCC with bootstrap which is a computationally expensive procedure, especially for a recommended 1,000 iterations. The computation time can be as long as 6 days. For initial testing, please consider run at least 25 - 100 iterations which can be done under a few hours. Please check the Supplementary Results from our peper for more details on the stability of using lower number of iterations.

library(phyloseq) 
## This package loading is essential for avoiding a missing function error for taxa_are_rows
## Which is a deprecated function that cannot be loaded internally. 
Cancer_C3NA = initiateC3NA(phyloseqObj = Cancer, prevTrh = 0.1,
                           nCPUs = 12, nBootstrap = 1000,
                           nMinTotalCount = 1000, phenotype = "Normal",
                           minModuleSize = 3, maxModuleSize = 40,
                           seed = 100)
Normal_C3NA = initiateC3NA(phyloseqObj = Normal, prevTrh = 0.1,
                           nCPUs = 12, nBootstrap = 1000,
                           nMinTotalCount = 1000, phenotype = "Normal",
                           minModuleSize = 3, maxModuleSize = 40,
                           seed = 100)

Step 2.1 Load the initiateC3NA results with 1000 iterations

githubURL <- ("https://github.com/zhouLabNCSU/C3NA_ScriptsAndData/raw/main/RPackageTutorialData/Post-initiateC3NA/cancer_dada2_Cancer.rds")
Cancer_C3NA  <- readRDS(url(githubURL, method="libcurl"))
githubURL <- ("https://github.com/zhouLabNCSU/C3NA_ScriptsAndData/raw/main/RPackageTutorialData/Post-initiateC3NA/cancer_dada2_Normal.rds")
Normal_C3NA  <- readRDS(url(githubURL, method="libcurl"))

Step 3. Module Evalutaion Shiny

The shiny application will run on your default browser, e.g. Chrome, Firefox, etc.

# Shiny Evaluation
moduleEvals(C3NAObj = Cancer_C3NA)
moduleEvals(C3NAObj = Normal_C3NA)

Step 3.1 Run the Provided Script

Please replace the ‘oldC3NAObj’ and ‘newC3NAObj’ with the correct and preferred name for your study.

# Copy & Paste Generated Commands
Cancer_C3NA = getOptMods(C3NAObj = Cancer_C3NA,
                         selectedPatterns = c(3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,21),
                         nModules = 15)
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
Normal_C3NA = getOptMods(C3NAObj = Normal_C3NA,
                         selectedPatterns = c(3,4,5,6,7,8,9,10,11,12,13,14),
                         nModules = 20)
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%

Step 4. Compare two phenotypes

This steps will include the module preservation analysis.

CancerVsNormal_C3NA = comparePhenotypes(C3NAObj_Comparison =Cancer_C3NA, 
                                        C3NAObj_Reference = Normal_C3NA,
                                        corCutoff = 0.2, fdr = 0.05, unusefulTaxa = NA, 
                                        nBootstrap = 300, verbose = FALSE)
## Remove the rdaata generated from permutation
unlink("./permutedStats-actualModules.RData",recursive=TRUE)

Step 5. Adding the C3NA Influential Taxa

CancerVsNormal_C3NA = getInfluentialTaxa(CancerVsNormal_C3NA)

Step 6. Adding Differential Abundance Resutls

Please Visit the Differenital Abundance Guide under Articles for full tutorials on how to conduct DA analysis and transfer results to C3NA results.

## Extract the Pre-calculated DA Analysis
githubURL <- ("https://github.com/zhouLabNCSU/C3NA_ScriptsAndData/raw/main/RPackageTutorialData/Post-initiateC3NA/DA_CancerVsNormal_DADA2_Results.rds")
CancerVsNormal_DA  <- readRDS(url(githubURL, 
                                  method="libcurl"))
## Adding the DA method to the two phenotypes comparison C3NA
CancerVsNormal_C3NA = addDAResults(twoPhenoC3NAObj = CancerVsNormal_C3NA, 
                                   DAResults = CancerVsNormal_DA)
#> [1] "Differential Abundance Merge Results: "
#> [1] "Number of DA Methods found: 3which includes: "
#> [1] "ANCOMBC"  "ALDEx2"   "MaAsLin2"
#> [1] "# Intersect Taxa: 66"
#> [1] "# DA Unique Taxa: 5"
#> [1] "# C3NA Unique Taxa: 309"

Step 7. Shiny Investigation of the Results

compareTwoPhenoShiny(CancerVsNormal_C3NA)

Step 8. Extract Summarized Results

output = extractResults(CancerVsNormal_C3NA)

8.1 Preivew the results - Taxa Taxa Correlations (Edges)

DT::datatable(output[["taxaTaxaCorrelations"]][1:100,])

8.2 Preivew the results - Taxa (Nodes)

DT::datatable(output[["taxaTable"]][1:50,])

Session Info

print(sessionInfo())
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.9     purrr_0.3.4    
#>  [5] readr_2.1.2     tidyr_1.2.0     tibble_3.1.8    ggplot2_3.3.6  
#>  [9] tidyverse_1.3.2 metagMisc_0.0.4 phyloseq_1.36.0 DT_0.25        
#> [13] curl_4.3.2      C3NA_1.0.0     
#> 
#> loaded via a namespace (and not attached):
#>   [1] utf8_1.2.2             shinydashboard_0.7.2   tidyselect_1.1.2      
#>   [4] RSQLite_2.2.15         AnnotationDbi_1.54.1   htmlwidgets_1.5.4     
#>   [7] grid_4.1.0             Rtsne_0.16             SpiecEasi_1.1.2       
#>  [10] munsell_0.5.0          codetools_0.2-18       ragg_1.2.2            
#>  [13] preprocessCore_1.54.0  interp_1.1-3           withr_2.5.0           
#>  [16] colorspace_2.0-3       Biobase_2.52.0         highr_0.9             
#>  [19] knitr_1.40             rstudioapi_0.14        stats4_4.1.0          
#>  [22] shinyWidgets_0.7.3     huge_1.3.5             GenomeInfoDbData_1.2.6
#>  [25] bit64_4.0.5            pheatmap_1.0.12        rhdf5_2.36.0          
#>  [28] rprojroot_2.0.3        vctrs_0.4.1            generics_0.1.3        
#>  [31] xfun_0.32              fastcluster_1.2.3      R6_2.5.1              
#>  [34] doParallel_1.0.17      GenomeInfoDb_1.28.4    VGAM_1.1-7            
#>  [37] pulsar_0.3.8           bitops_1.0-7           rhdf5filters_1.4.0    
#>  [40] cachem_1.0.6           assertthat_0.2.1       promises_1.2.0.1      
#>  [43] scales_1.2.1           nnet_7.3-17            googlesheets4_1.0.1   
#>  [46] gtable_0.3.1           WGCNA_1.71             rlang_1.0.5           
#>  [49] systemfonts_1.0.4      splines_4.1.0          lazyeval_0.2.2        
#>  [52] gargle_1.2.1           impute_1.66.0          broom_1.0.1           
#>  [55] checkmate_2.1.0        yaml_2.3.5             reshape2_1.4.4        
#>  [58] modelr_0.1.9           crosstalk_1.2.0        backports_1.4.1       
#>  [61] httpuv_1.6.5           Hmisc_4.7-0            tools_4.1.0           
#>  [64] ellipsis_0.3.2         jquerylib_0.1.4        biomformat_1.20.0     
#>  [67] RColorBrewer_1.1-3     BiocGenerics_0.38.0    dynamicTreeCut_1.63-1 
#>  [70] Rcpp_1.0.9             plyr_1.8.7             base64enc_0.1-3       
#>  [73] visNetwork_2.1.0       zlibbioc_1.38.0        RCurl_1.98-1.8        
#>  [76] rpart_4.1-15           deldir_1.0-6           S4Vectors_0.30.2      
#>  [79] haven_2.5.0            cluster_2.1.2          fs_1.5.2              
#>  [82] magrittr_2.0.3         data.table_1.14.2      reprex_2.0.2          
#>  [85] googledrive_2.0.0      matrixStats_0.62.0     randomcoloR_1.1.0.1   
#>  [88] hms_1.1.2              shinyjs_2.1.0          mime_0.12             
#>  [91] reactable_0.3.0        evaluate_0.16          xtable_1.8-4          
#>  [94] jpeg_0.1-9             readxl_1.4.0           IRanges_2.26.0        
#>  [97] gridExtra_2.3          shape_1.4.6            compiler_4.1.0        
#> [100] V8_4.2.1               crayon_1.5.1           htmltools_0.5.3       
#> [103] mgcv_1.8-35            later_1.3.0            tzdb_0.3.0            
#> [106] Formula_1.2-4          lubridate_1.8.0        DBI_1.1.3             
#> [109] dbplyr_2.2.1           MASS_7.3-58.1          Matrix_1.5-1          
#> [112] ade4_1.7-19            permute_0.9-7          cli_3.3.0             
#> [115] parallel_4.1.0         igraph_1.3.4           pkgconfig_2.0.3       
#> [118] pkgdown_2.0.6          foreign_0.8-81         plotly_4.10.0         
#> [121] xml2_1.3.3             foreach_1.5.2          bslib_0.4.0           
#> [124] multtest_2.48.0        XVector_0.32.0         rvest_1.0.3           
#> [127] digest_0.6.29          vegan_2.6-2            Biostrings_2.60.2     
#> [130] rmarkdown_2.16         cellranger_1.1.0       htmlTable_2.4.1       
#> [133] shiny_1.7.2            lifecycle_1.0.2        nlme_3.1-152          
#> [136] jsonlite_1.8.0         Rhdf5lib_1.14.2        desc_1.4.2            
#> [139] viridisLite_0.4.1      fansi_1.0.3            pillar_1.8.1          
#> [142] lattice_0.20-44        KEGGREST_1.32.0        fastmap_1.1.0         
#> [145] httr_1.4.4             survival_3.2-11        GO.db_3.13.0          
#> [148] glue_1.6.2             png_0.1-7              iterators_1.0.14      
#> [151] glmnet_4.1-4           bit_4.0.4              stringi_1.7.8         
#> [154] sass_0.4.2             blob_1.2.3             textshaping_0.3.6     
#> [157] latticeExtra_0.6-30    memoise_2.0.1          ape_5.6-2

Kuncheng Song

3/16/2022