Welcome to the PolySTest vignette.
PolySTest is a comprehensive R package designed for the statistical testing of quantitative omics data, including genomics, proteomics, and metabolomics datasets. By integrating advanced statistical methods, PolySTest addresses the challenges of missing features and high-dimensionality with multiple comparisons normal in omics data.
This package facilitates the identification of differentially abundant features across different conditions or treatments, making it an essential tool for biomarker discovery and biological insight. Additionally, its robust visualization functions enable users with multiple views for further biological interpretations and comparison of the different statistical tests.
To install PolySTest and its dependencies, you can use Bioconductor’s BiocManager.
PolySTest requires quantitative omics data to be formatted as a SummarizedExperiment
object, which integrates the experimental data with its corresponding metadata. Here’s how to prepare your data:
Data Format: Your quantitative data should be in a matrix format, with features (e.g., genes, proteins) as rows and samples as columns. Missing values can be present, but they should be represented as NA. In case of high missingness, it can be worth to remove features with only minimal non-missing values.
Sample Metadata: Accompany your data with metadata that describes each sample, including conditions, replicates, and possible other experimental factors. This metadata will be crucial for defining comparisons in your statistical tests.
Normalization: To correct for technical variability and make your data comparable across samples, apply normalization methods suitable for your data type. For example, median normalization can adjust for differences in loading amounts or sequencing depth:
First, we load the necessary package and example data and reduce it to 200 lines to increase processing speed. The data set consists of protein abundances in liver samples from mice fed with four different diets. The data contains three replicates per diet.
library(PolySTest)
library(SummarizedExperiment)
# Load the data file "LiverAllProteins.csv" from the PolySTest package
filename <- system.file("extdata", "LiverAllProteins.csv",
package = "PolySTest")
dat <- read.csv(filename, row.names=1, stringsAsFactors = FALSE)
dat <- dat[1:200,] # Use a subset of the data for this example
The following line of code normalizes the data by subtracting the median of each column from the data. This is a simple example of normalization, and you should use the appropriate method for your data type.
Now, create a SummarizedExperiment Object. Integrate your normalized data and metadata into a SummarizedExperiment object. This structured format ensures that PolySTest can efficiently process your data:
sampleMetadata <- data.frame(Condition = rep(paste("Condition", 1:4), each=3),
Replicate = rep(1:3, each=4))
fulldata <- SummarizedExperiment(assays = list(quant = dat),
colData = sampleMetadata)
rowData(fulldata) <- rownames(dat)
metadata(fulldata) <- list(NumReps = 3, NumCond = 4)
assay(fulldata, "quant") <- dat
fulldata <- update_conditions_with_lcs(fulldata)
Before running statistical tests, it’s essential to define the comparisons between conditions or treatments of interest.
When selecting comparisons, consider the scientific questions you aim to answer. For instance, if you’re exploring the effect of a specific treatment, compare treated samples against controls. The choice of reference condition can impact the interpretation of your results.
PolySTest facilitates the creation of pairwise comparisons through the create_pairwise_comparisons
function. Here’s how to define comparisons between each condition and a reference condition:
PolySTest offers to conduct both paired and unpaired statistical tests:
Paired Tests: Use PolySTest_paired
when your data involves matched samples, such as before-and-after treatments on the same subjects. Paired tests account for the inherent correlations between matched samples, offering more sensitivity in detecting differences.
Unpaired Tests: Use PolySTest_unpaired
for independent groups of samples, such as comparing different treatment groups without matching.
PolySTest automatically performs false discovery rate (FDR) correction on p-values from the statistical tests limma, rank products, Miss Test, permutation test and t-test, and all tests but the t-test combined to the PolySTest FDR
fulldata_unpaired <- PolySTest_unpaired(fulldata, allComps)
#> Running limma test
#> limma completed
#> Running Miss_Test test
#> Running Miss test ...
#> ================================================================================
#> Miss_Test completed
#> ================================================================================
#> tests completed
#> Unifying q-values across tests ...
#> Calculating PolySTest FDRs ...
#> ------- Summary of Results --------
#> Number of differentially regulated features with FDR < 0.01:
#> | | PolySTest| limma| Miss_Test| rank_products| permutation_test| t_test|
#> |:----------------------|---------:|-----:|---------:|-------------:|----------------:|------:|
#> |TTA.Rep._vs_HF.Rep. | 34| 49| 7| 3| 0| 16|
#> |FO.Rep._vs_HF.Rep. | 1| 3| 2| 0| 0| 0|
#> |TTA.FO.Rep._vs_HF.Rep. | 33| 50| 4| 5| 0| 3|
PolySTest includes several visualizations to help interpret the results of your statistical tests. These visualizations not only highlight significant features but also provide insights into the overall distribution of the data and the relationships between different tests:
P-value Distributions: Understanding the distribution of p-values can help assess the tests’ sensitivity and the presence of potential biological signals in your data.
Volcano Plots: Volcano plots are a powerful way to visualize the trade-off between magnitude of change (fold change) and statistical significance (FDR), highlighting features of potential interest.
UpSet Plots: These plots offer a comprehensive view of the overlap between significant features identified by different statistical tests.
Expression Profiles and Heatmaps: These visualizations allow for the detailed examination of the expression patterns of significant features, facilitating the identification of potential biological pathways affected by the conditions under study.
We now plot the pvalue distributions of limma and Miss Test for the first comparison.
# Define comparisons to visualize from available ones
compNames <- metadata(fulldata_unpaired)$compNames
print(compNames)
#> [1] "TTA.Rep._vs_HF.Rep." "FO.Rep._vs_HF.Rep." "TTA.FO.Rep._vs_HF.Rep."
comps <- compNames[1]
# Plotting the results
plotPvalueDistr(fulldata_unpaired, comps, c("limma", "Miss Test"))
#> Plotting p-values
#> Plotting p-values finished
Here’s an example of generating a volcano plot, marking proteins that are have significant changes in the first comparison with an FDR smaller than 1%
# Select proteins with FDR < 0.01
sigProts <- which(rowData(fulldata_unpaired)[, paste0("FDR_PolySTest_",
comps[1])] < 0.01)
# Volcano plot
plotVolcano(fulldata_unpaired, compNames = comps, sel_prots = sigProts,
testNames = c("PolySTest", "limma", "Miss Test"))
#> Plotting volcano plots
#> Plotting volcano plots finished
Now let’s look into the overlap between the different statistical tests. An upset plot provides this information
PolySTest also provides a combined plot of expression changes and overlap between different tests. Reduce the number of features to maximally 20 to avoid overloaded plots.
plotExpression(fulldata_unpaired, sel_prots = sigProts)
#> plotting expression profiles
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
#> zero-length arrow is of indeterminate angle and so skipped
#> Making circos plot
#> Note: 1 point is out of plotting region in sector '1', track '1'.
#> Note: 1 point is out of plotting region in sector '1', track '1'.
#> Note: 1 point is out of plotting region in sector '2', track '1'.
#> Note: 1 point is out of plotting region in sector '2', track '1'.
#> Note: 1 point is out of plotting region in sector '3', track '1'.
#> Note: 1 point is out of plotting region in sector '3', track '1'.
#> plotting expression profiles finished
A further comparison between tests shows how many features they provide for different cutoffs for the false discovery rate.
#> Plotting number of regulated features finished
For the last visualization, we create a hierarchical map (heatmap) for the differentially regulated proteins from the sigProts
object.
For more information on PolySTest, please refer to the PolySTest paper:
Schwämmle V, Hagensen CE, Rogowska-Wrzesinska A, Jensen ON. PolySTest: Robust Statistical Testing of Proteomics Data with Missing Values Improves Detection of Biologically Relevant Features. Mol Cell Proteomics. 2020;19(8):1396-1408. doi:10.1074/mcp.RA119.001777
For more tools and information, please visit https://computproteomics.bmb.sdu.dk
With the significant features identified and visualized, the next steps usually involve further biological interpretation and validation of the findings. Here:
If you have suggestions for improvements, encounter any issues, or would like to contribute code or documentation, please visit our GitHub repository:
GitHub: https://github.com/computproteomics/PolySTest
Your input helps make PolySTest a more robust and user-friendly tool for the omics community."
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SummarizedExperiment_1.36.0 Biobase_2.66.0
#> [3] GenomicRanges_1.58.0 GenomeInfoDb_1.42.1
#> [5] IRanges_2.40.0 S4Vectors_0.44.0
#> [7] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
#> [9] matrixStats_1.4.1 PolySTest_1.0.2
#> [11] BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] bitops_1.0-9 gridExtra_2.3 fdrtool_1.2.18
#> [4] rlang_1.1.4 magrittr_2.0.3 compiler_4.4.2
#> [7] vctrs_0.6.5 reshape2_1.4.4 stringr_1.5.1
#> [10] pkgconfig_2.0.3 shape_1.4.6.1 crayon_1.5.3
#> [13] fastmap_1.2.0 XVector_0.46.0 labeling_0.4.3
#> [16] caTools_1.18.3 ca_0.71.1 utf8_1.2.4
#> [19] promises_1.3.2 rmarkdown_2.29 UCSC.utils_1.2.0
#> [22] UpSetR_1.4.0 purrr_1.0.2 xfun_0.49
#> [25] zlibbioc_1.52.0 cachem_1.1.0 jsonlite_1.8.9
#> [28] later_1.4.1 DelayedArray_0.32.0 parallel_4.4.2
#> [31] R6_2.5.1 bslib_0.8.0 stringi_1.8.4
#> [34] RColorBrewer_1.1-3 limma_3.62.1 jquerylib_0.1.4
#> [37] Rcpp_1.0.13-1 assertthat_0.2.1 iterators_1.0.14
#> [40] knitr_1.49 httpuv_1.6.15 Matrix_1.7-1
#> [43] splines_4.4.2 tidyselect_1.2.1 qvalue_2.38.0
#> [46] abind_1.4-8 yaml_2.3.10 viridis_0.6.5
#> [49] TSP_1.2-4 gplots_3.2.0 codetools_0.2-20
#> [52] lattice_0.22-6 tibble_3.2.1 plyr_1.8.9
#> [55] withr_3.0.2 shiny_1.9.1 evaluate_1.0.1
#> [58] heatmaply_1.5.0 circlize_0.4.16 pillar_1.9.0
#> [61] BiocManager_1.30.25 KernSmooth_2.23-24 foreach_1.5.2
#> [64] plotly_4.10.4 generics_0.1.3 ggplot2_3.5.1
#> [67] munsell_0.5.1 scales_1.3.0 xtable_1.8-4
#> [70] gtools_3.9.5 glue_1.8.0 lazyeval_0.2.2
#> [73] tools_4.4.2 dendextend_1.19.0 data.table_1.16.2
#> [76] webshot_0.5.5 registry_0.5-1 grid_4.4.2
#> [79] tidyr_1.3.1 crosstalk_1.2.1 seriation_1.5.6
#> [82] colorspace_2.1-1 GenomeInfoDbData_1.2.13 cli_3.6.3
#> [85] fansi_1.0.6 S4Arrays_1.6.0 viridisLite_0.4.2
#> [88] dplyr_1.1.4 gtable_0.3.6 sass_0.4.9
#> [91] digest_0.6.37 SparseArray_1.6.0 farver_2.1.2
#> [94] htmlwidgets_1.6.4 htmltools_0.5.8.1 lifecycle_1.0.4
#> [97] httr_1.4.7 mime_0.12 GlobalOptions_0.1.2
#> [100] statmod_1.5.0