Omics data visualization in R and Python

Adjustments of p-values for multiple comparisons

Metabolites and lipids univariate statistical analysis in R

PreviousMulti sample comparisons in R NextEffect size computation and interpretation

Last updated 3 months ago

Adjustments of p-values for multiple comparisons

Metabolites and lipids univariate statistical analysis in R

In manuscripts presenting the statistical analysis of lipidomics or metabolomics data, you will often find the expression "p-value adjusted for multiple comparisons". Examples of such manuscripts are presented below:

M. Osetrova et al. Lipidome atlas of the adult human brain. DOI:
J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI:
A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI:
D. Hornburg et al. Dynamic lipidome alterations associated with human health, disease and ageing. DOI:
R. Jirásko et al. Altered Plasma, Urine, and Tissue Profiles of Sulfatides and Sphingomyelins in Patients with Renal Cell Carcinoma. DOI:
D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI:

Here, we will briefly explain to you what it means.

While looking for differences between two biological groups, we usually perform hypothesis testing on many features (lipids, metabolites). For our exemplary data set used throughout the Gitbook - concentrations were measured for 127 lipids. Hence, we would compute 127 statistical tests. Repeating, for example, a t-test analysis increases the chance of obtaining false positive results. Namely, the test results would suggest differences in means between two groups with p-value < 0.05, while in reality, the null hypothesis about no differences should be retained. This situation is also referred to as a 'type I error'. Frequently, the results of such tests are at the edge of acceptance, e.g., the p-value ranges between 0.03 and 0.05 - the threshold for acceptance/rejection of the null hypothesis. This is why we "adjust p-values for multiple comparisons". To prevent reporting results, which potentially are false positives. In the effect of the adjustment, a mathematical correction is performed which leads to an increase of p-values, causing the results at the edge of acceptance (p-values close to 0.05) to be finally rejected. The corrected p-values are above the assumed acceptance threshold and the null hypothesis is retained. A number of methods for adjusting p-values exist. Here, we will show you examples of two of them - the Bonferroni correction, considered a very conservative method and a very popular - the Benjamini-Hochberg (BH) method also referred to as 'false discovery rate', or 'fdr'.

At this step, we would strongly encourage you to watch a YouTube video prepared by Josh Starmer, who created StatQuest - a YouTube channel aiming to explain frequently used statistical methods and tests. StatQuest originates from the University of North Carolina at Chapel Hill. In the video, Josh explains in simple words the concept of the false discovery rate and how the Benjamini-Hochberg works. You will also realize that the actual math behind the BH method is simple.

Several methods exist in R for implementing p-value corrections. We will correct p-values through the adjust_pvalue() function from the rstatix package, which can be applied right after the t_test() in the pipeline. First, we will select the method of the Bonferroni correction. Take a look at this code:

# Adjusting p-values for multiple comparisons.
# Calling library:
library(rstatix)
library(tidyverse)

# Reading the documentation:
?adjust_pvalue()

# Example - t-test:
t.test.bonferroni.signif <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label,
         ref.group = "N",
         p.adjust.method = 'none') %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance(p.col = "p.adj") %>%
  filter(p.adj < 0.05) %>%
  arrange(p.adj)

Here, we select from the wide 'data' tibble all numeric variables and the factor variable stored in the Label column. We create a long tibble consisting of columns Label, Lipids, and Concentrations. Next, we group values by lipid species, and we perform t-tests on all numeric variables, comparing mean plasma concentrations of lipids in patients with pancreatitis (PAN) or pancreatic cancer (PDAC) to the reference group - healthy controls (N). The argument p.adjust.method is set to 'none', meaning we will adjust the p-values in the next step through adjust_pvalue(). In the adjust_pvalue(), the argument method is set to 'bonferroni'. We additionally add to the final tibble a column with asterisk annotations representing the adjusted p-values (add_significance()), filter the results to keep only those that remain significant after the correction, and arrange them by the 'p.adj' column entries. This way, we retain 41 significant outcomes.

The math behind this correction is also very simple:

p.adj= p*n

The corrected p-value (p.adj) is simply a raw p-value (p) from the t-test multiplied by the total number of comparisons - t-tests performed (n). 127 lipids are compared for N vs T and N vs PAN = 127x2 = 254 comparisons. Now, you can realize how strict the Bonferroni correction is.

We can quickly check how many of the raw p-values, which were below the threshold of 0.05 become insignificant after correction using this code:

# Checking the Bonferroni correction influence:
t.test.bonferroni.no.signif <- 
  data %>%
  select(`Label`,
         where(is.numeric)) %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label,
         ref.group = "N",
         p.adjust.method = "none") %>%
  adjust_pvalue(method = "bonferroni") %>%
  filter(p < 0.05 & p.adj >= 0.05) %>%
  arrange(p.adj)

This way, we find that 87 lipids were filtered out using the Bonferroni correction, with p-values ranging from 0.000251 to 0.049. Finally, we can check the number of insignificant outcomes:

# Checking the Bonferroni correction influence:
t.test.no.signif <- 
  data %>%
  select(`Label`,
         where(is.numeric)) %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label,
         ref.group = "N",
         p.adjust.method = "none") %>%
  filter(p >= 0.05) %>%
  arrange(p.adj)

We obtained 126 insignificant outcomes, together making 254 tests and only 32% of all significant results were retained after the Bonferroni correction.

Now, let's perform similar computations using the Benjamini-Hochberg method:

# Using the Benjamini-Hochberg approach for adjusting p-values:
t.test.BH.signif <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label,
         ref.group = "N",
         p.adjust.method = 'none') %>%
  adjust_pvalue(method = "BH") %>%
  filter(p.adj < 0.05) %>%
  arrange(p.adj)

# Checking the Benjamini-Hochberg approach influence:
t.test.BH.no.signif <- 
  data %>%
  select(`Label`,
         where(is.numeric)) %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label,
         ref.group = "N",
         p.adjust.method = "none") %>%
  adjust_pvalue(method = "BH") %>%
  filter(p < 0.05 & p.adj >= 0.05) %>%
  arrange(p.adj)

We obtained 97 significant differences after the BH correction and 31 outcomes that were filtered using the BH correction, with 126 insignificant outcomes giving 254 comparisons. In this case, nearly 76% of results were retained. The BH correction filtered out positive t-test results with p-values in the range of 0.02 to 0.049.

The Bonferroni correction retained only the differences with the lowest p-values when compared to the BH correction. However, there is a chance that such a conservative correction will filter out a significant number of significant differences, potentially leading to a dramatic increase in the number of type II errors - failure to detect actual differences.

Hence, we will continue applying the classic Benjamini-Hochberg correction to remove potential false positives in the next parts of this Gitbook.