Two sample comparisons in R

Metabolites and lipids univariate statistical analysis in R

Practical applications of two sample comparisons (examples)

Two-sample comparisons, such as the parametric t-test and its variants, along with the non-parametric Mann-Whitney U test, are widely used to identify discriminating features in lipidomics and metabolomics datasets. These tests are so fundamental that nearly every -omics data analysis software includes options for hypothesis testing.

Check out the selected examples of two sample comparisons in lipidomics and metabolomics.

1) Examples of t-test applications:

D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI: https://doi.org/10.1038/s41467-021-27765-9 (the authors employ Welch's t-test variant to identify altered lipids in pancreatic cancer patients compared to healthy individuals).
E. Rysman et al. De novo Lipogenesis Protects Cancer Cells from Free Radicals and Chemotherapeutics by Promoting Membrane Lipid Saturation. DOI: https://doi.org/10.1158/0008-5472.CAN-09-3871 (the classic Student's t-test application for comparing two experimental groups in multiple examples throughout the manuscript).
V. de Laat et al. Intrinsic temperature increase drives lipid metabolism towards ferroptosis evasion and chemotherapy resistance in pancreatic cancer. DOI: https://doi.org/10.1038/s41467-024-52978-z - Fig. 1a & d (an excellent example of paired t-test application is the measurement of PDAC tumor and adjacent healthy tissue temperatures in 11 patients; since temperature is measured twice in the same subject, the paired t-test is used; additionally, lipid profiling is conducted on both tissues, and the paired t-test is applied to compare PUFA-containing ether lipid species levels).
M. Lange et al. AdipoAtlas: A reference lipidome for human white adipose tissue. DOI: https://doi.org/10.1016/j.xcrm.2021.100407 (the application of classic Student's t-test for identifying differentially regulated lipids).
Y. R. J. Jaspers et al. Lipidomic biomarkers in plasma correlate with disease severity in adrenoleukodystrophy. DOI: https://doi.org/10.1038/s43856-024-00605-9 (based on the distribution of lipid concentrations, the authors use either Welch's t-test or the Mann-Whitney U test to compare experimental groups).

2) Examples of the Mann-Whitney U test:

A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI: https://doi.org/10.1186/s13075-023-03204-6 (the application of the Mann-Whitney U test for comparing lipid levels between two experimental groups).
R. Jirásko et al. Altered Plasma, Urine, and Tissue Profiles of Sulfatides and Sphingomyelins in Patients with Renal Cell Carcinoma. DOI: https://doi.org/10.3390/cancers14194622 (the application of the Mann-Whitney U test for comparing sphingolipid levels between kidney cancer patients and healthy controls in plasma and urine; further, the application of the Wilcoxon signed-rank test for comparing sphingolipid levels in kidney cancer tumors vs healthy adjacent tissue).
Y. R. J. Jaspers et al. Lipidomic biomarkers in plasma correlate with disease severity in adrenoleukodystrophy. DOI: https://doi.org/10.1038/s43856-024-00605-9 (based on the distribution of lipid concentrations, the authors use either Welch's t-test or the Mann-Whitney U test to compare experimental groups).
M. Osetrova et al. Lipidome atlas of the adult human brain. DOI: https://doi.org/10.1038/s41467-024-48734-y (authors use the Mann-Whitney U test for estimating the significance level of correlation coefficients).
J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: https://doi.org/10.1007/s00216-022-04490-w (the application of the Mann-Whitney U test for comparing lipid levels between patients with cancer vs healthy controls).

Two sample comparisons via rstatix library

Among many solutions, the rstatix library provides one of the most user-friendly pipelines for hypothesis testing for beginners. The rstatix package can be used with all tools from the tidyverse collection. All basic statistical tests for two sample comparisons, including t-test, Welch's t-test, and Mann-Whitney U test can be applied to the long tibbles, and the testing is performed automatically for all variables in the data set. The outputs are data frames with the results of statistical tests.

First, we need to call the library. If rstatix was not installed earlier, do it now using install.packages():

# If you have not installed the rstatix yet:
install.packages("rstatix")

# Calling the library:
library(rstatix)
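
The code in the following sections also uses tidyverse functions (the pipe, pivot_longer(), ggplot()), so the tidyverse needs to be loaded too. If the lipidomics data set is not at hand, the pipelines can be tried on simulated data first. Below is a minimal sketch under that assumption: the tibble toy.data, its two lipid columns, and all values are hypothetical and only mimic the layout of the real data (one row per sample, one column per lipid).

```r
# Loading the libraries used throughout this tutorial:
library(tidyverse)
library(rstatix)

# Simulated (hypothetical) wide data: one row per sample, one column per lipid.
set.seed(1)
toy.data <- tibble(
  `Sample Name` = paste0("S", 1:40),
  Label = rep(c("N", "T"), each = 20),
  `SM 34:1;O2` = c(rlnorm(20, meanlog = 4.0, sdlog = 0.3),
                   rlnorm(20, meanlog = 3.7, sdlog = 0.3)),
  `SM 41:1;O2` = c(rlnorm(20, meanlog = 2.0, sdlog = 0.4),
                   rlnorm(20, meanlog = 1.5, sdlog = 0.4))
)

# Long format: one row per sample-lipid pair, as used by the rstatix pipelines.
toy.long <-
  toy.data %>%
  pivot_longer(cols = starts_with("SM"),
               names_to = "Lipids",
               values_to = "Concentrations")

# One test per lipid (one row of results for every lipid):
toy.res <-
  toy.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label)

print(toy.res)
```

The same pattern (wide tibble, long tibble, group_by(), test function) is used for all tests on this page.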

Below, you will find information on how to perform all three statistical tests in R:

classic t-test
Welch's t-test
Mann-Whitney U test

Also, the tests for paired samples:

paired t-test,
Wilcoxon signed-rank test.

Glimpse at data before the hypothesis testing

The assumptions of the t-test, particularly normality and similarity of variances, are frequently checked before it is performed. It is important to remember that normality refers to the entire population, not the sample data. The sample here is understood as our collected set of patients representing a population. Normal populations can be characterized by their mean and standard deviation. Hence, if we use the t-test for comparisons, we assume that the mean value of our sample is representative. This may be difficult to claim for skewed distributions with a small number of samples analyzed. We can look at the distribution of our samples to check if the mean value is a good measure of central tendency. The normality assumption can be evaluated using statistical tests (e.g., Shapiro-Wilk) or graphically, by plotting histograms, density plots, or Q-Q plots. The graphical investigation can give you a much better idea about the distribution of your samples.

Let's start with the Shapiro-Wilk test:

# Shapiro-Wilk test in R (rstatix):
# Creating a long tibble:
data.long <- 
  data %>%
  pivot_longer(cols = `CE 16:1` : `SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")
               
# Performing the Shapiro-Wilk test:               
distribution <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  shapiro_test(Concentrations) %>%
  mutate(Result = if_else(p < 0.05, "non-normal", "normal"))
  
# Print results in console:
print(distribution)

The output:

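The Shapiro-Wilk test is a test of normality. Its null hypothesis is that the sample comes from a normally distributed population. If the p-value is below our selected threshold (0.05), we reject the null hypothesis and conclude that the data deviate from a normal distribution. A p-value above the threshold does not prove normality; it only means that no significant deviation was detected.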
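
Q-Q plots, mentioned above as one of the graphical options, can be drawn with ggplot2. The sketch below uses simulated, right-skewed concentrations (the data frame toy.qq and its values are hypothetical) and is only meant to illustrate the idea: points following the reference line suggest normality, while a systematic curvature indicates skewness.

```r
library(ggplot2)

# Simulated (hypothetical) right-skewed concentrations for two groups:
set.seed(42)
toy.qq <- data.frame(
  Label = rep(c("N", "T"), each = 50),
  Concentrations = c(rlnorm(50, meanlog = 2.0, sdlog = 0.5),
                     rlnorm(50, meanlog = 2.3, sdlog = 0.5))
)

# Q-Q plot per group: sample quantiles vs theoretical normal quantiles.
qq.plot <-
  ggplot(toy.qq, aes(sample = Concentrations, color = Label)) +
  geom_qq() +          # points: observed vs theoretical quantiles
  geom_qq_line() +     # reference line through the first and third quartiles
  facet_wrap(~ Label, scales = "free") +
  theme_bw()

print(qq.plot)
```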

Next, density plots can be used to analyze the distribution of our samples and how representative the mean is. Here, we will show you an example of SM species. You can check the rest of the lipids/metabolites, e.g. class by class to speed up the visualization:

# Density plots to visualize the distribution of samples:
# Computing mean values for all SM species for all biological groups:
means <-
  data %>%
  select(`Label`,
         starts_with("SM")) %>%
  pivot_longer(cols = `SM 32:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations))
  
# Creating density plots with mean values as vlines:
data %>%
  select(`Label`,
         starts_with("SM")) %>%
  pivot_longer(cols = `SM 32:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  ggplot(aes(x = Concentrations, color = Label)) +
  geom_density() +
  geom_vline(data = means, 
             aes(xintercept = mean, color = Label),
             linewidth = 1) +
  scale_color_manual(values = c("royalblue", "orange", "red2")) +
  facet_wrap(~ Lipids, scales = "free") +
  theme_bw()

The output:

Based on the Shapiro-Wilk test results and the density plot visualizations, we see that the distributions of our samples for SM species are in most cases right-skewed, and the mean values are not the best measures of central tendency. These are not optimal conditions for using the t-test. In such a case, one could consider using the non-parametric Mann-Whitney U test. However, the t-test is fairly robust to skewness when sample sizes are larger, so in our case, with N = 97, T = 109, and PAN = 21, we could still apply it. Additionally, we check if the variances are similar using, e.g., Bartlett's test:

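# Bartlett's test to test the similarity of variances.
# The p-value is extracted for every lipid (one value per column):
similarity.variances <- 
  data %>% 
  select(-`Sample Name`,
         -`Label`) %>%
  sapply(function(x) bartlett.test(x, g = data$Label)$p.value)

# Number of lipids with p-value < 0.05:
sum(similarity.variances < 0.05)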

Bartlett's test null hypothesis is that population variances are equal. After performing Bartlett's test, we obtained a p-value below the 0.05 threshold in multiple cases, meaning that we reject the null hypothesis. Unequal variances between biological groups occur frequently in lipidomics and metabolomics data. Therefore, a better solution than the classic t-test could be Welch's t-test.
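
Bartlett's test is itself sensitive to departures from normality, so for right-skewed data it may be worth cross-checking with Levene's test, which is less affected by skewness. A minimal sketch, assuming simulated data (toy.var and its values are hypothetical) and using levene_test() from rstatix, which centers on the median by default:

```r
library(rstatix)

# Simulated (hypothetical) skewed data with clearly different spread in N and T:
set.seed(7)
toy.var <- data.frame(
  Label = factor(rep(c("N", "T"), each = 40)),
  Concentrations = c(rlnorm(40, meanlog = 2, sdlog = 0.3),
                     rlnorm(40, meanlog = 2, sdlog = 0.8))
)

# Bartlett's test (base R), p-value only:
bartlett.p <- bartlett.test(Concentrations ~ Label, data = toy.var)$p.value

# Levene's test (rstatix), centered on the median by default:
levene.res <- levene_test(toy.var, Concentrations ~ Label)

print(bartlett.p)
print(levene.res)
```

If both tests point in the same direction, the conclusion about unequal variances is less likely to be an artifact of skewness.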

Hence, for the two sample comparisons we could apply:

1) Welch's t-test: considering the robustness of the t-test to certain violations of normality and substantial sample sizes; also - for many variables, unequal variances between groups;

2) Mann-Whitney U test: considering the right-skewed distributions of samples and that sample means are not the best representative values.

3) We could also perform both tests to compare their outcomes, namely - find if both tests indicate similar differences in lipid levels in plasma samples between healthy controls and patients with PDAC, or healthy controls and PAN patients.

4) Alternatively, one could apply the log-transformation, and check if the mean of logs is more representative, as log-transformation could correct the right skewness to a certain extent.
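
The log-transformation option can be checked quickly by running the Shapiro-Wilk test before and after transforming the data. The sketch below assumes simulated log-normal concentrations (toy.skewed is hypothetical); with real data, remember that the logarithm requires strictly positive values.

```r
library(dplyr)
library(rstatix)

# Simulated (hypothetical) right-skewed concentrations:
set.seed(123)
toy.skewed <- tibble(Concentrations = rlnorm(100, meanlog = 3, sdlog = 1))

# Shapiro-Wilk test on raw and log10-transformed values:
normality <-
  toy.skewed %>%
  mutate(Log.concentrations = log10(Concentrations)) %>%
  shapiro_test(Concentrations, Log.concentrations)

print(normality)
```

A much larger p-value for the log-transformed variable suggests that the mean of logs is a more representative summary than the mean of raw concentrations.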

You will find more about selecting a statistical test for hypothesis testing here:

t-test in R

The classic t-test is computed using the function t_test(). Read carefully the documentation regarding this function first:

# Consulting the documentation:
?t_test()

Here, we will focus on the most classic application of the t-test in the case of lipidomics or metabolomics data - comparing outcomes between two biological groups.

The null hypothesis of the t-test in this case states no difference between the means of the two groups being compared.

Next, we need to prepare our data. To present the t-test first for two groups, we will remove the group of patients with pancreatitis (PAN) and create the long tibble within one pipeline:

# Preparing data for the t-test:
data.N.T <- 
  data %>%
  filter(Label != "PAN") %>%
  pivot_longer(cols = `CE 16:1` : `SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")

To perform a t-test, we need to group the entries in the long tibble by the Lipids column. The t_test() function, according to the documentation, requires the following formula:

numeric variable ~ grouping variable

In our case, the numeric variable is the column Concentrations, while the grouping variable is stored in the Label column.

In the function t_test(), we will additionally set var.equal to TRUE (the classic t-test assumes that variances of two groups being compared are equal). Finally, p.adjust.method should be 'none' - we will explain this part later.

So, the final code for the t-test:

# Hypothesis testing in R with rstatix. 
# t-test
t.test.res <- 
  data.N.T %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label, 
         var.equal = T, 
         p.adjust.method = 'none')
         
# Print results:
print(t.test.res)

We obtain this tibble with results for all variables:

Among other arguments of the t_test() function we can mention:

alternative - here, we select if a two-sided ('two.sided' by default) or one-sided test ('greater' or 'less') should be performed. For metabolomics and lipidomics data - if you do not expect any particular direction of alterations, keep the default setting, i.e., the two-sided test. In many cases, scientists are interested in any potential difference in mean concentrations between two groups, no matter whether it is greater than 0 (mean concentration group 1 > mean concentration group 2) or less than 0 (mean concentration group 1 < mean concentration group 2).
mu - additional parameter used for forming a null hypothesis (by default, is equal to 0). With a default 0, we can formulate the null hypothesis: The true difference in means between group N and group T equals 0. We can change the mu, for example, if we want to test if an average concentration or a parameter differs from a specific value. Assume we measure the concentration of creatinine in plasma (in mg/dL) in a selected group of patients. We can use it to test if the average concentration in a selected group is different than 1.35 mg/dL (mu = 1.35). If we select the two-sided test, the null hypothesis (no difference) can be rejected for the average concentration higher or lower than 1.35 mg/dL. If the one-sided test is used, then we need to specify if we want to test the average concentration being greater than 1.35 mg/dL or less than 1.35 mg/dL.
comparisons - a list of vectors containing two entries, which specify the comparisons to be performed.
paired - for paired experimental setups (paired samples) should be changed to TRUE (T).
detailed - if set to TRUE, detailed t-test statistics are returned.
conf.level - confidence level of the interval.
ref.group - here we can specify a reference group against which all comparisons should be performed. For instance, if we want to compare N to T and N to PAN, we could set ref.group = "N":

# Using ref.group to compare both groups of patients (T, PAN) to controls (N):
# Creating a long tibble:
data.long <- 
  data %>%
  pivot_longer(cols = `CE 16:1` : `SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")

# Performing a t-test:
t.test.res <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label, 
         var.equal = T, 
         p.adjust.method = 'none',
         ref.group = "N")

print(t.test.res)

We obtain:

The tibble produced by the t_test() function can be filtered to keep only results with p-value < 0.05 (reject the null hypothesis), or arranged by p-value in ascending or descending order. See the examples below:

# Keeping only significant t-test results:
t.test.res <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label, 
         var.equal = T, 
         p.adjust.method = 'none',
         ref.group = "N") %>%
  filter(p < 0.05)

# Arranging results from lowest to highest p-value:
t.test.res <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label, 
         var.equal = T, 
         p.adjust.method = 'none',
         ref.group = "N") %>%
  arrange(p)
  
# Print results:
print(t.test.res)

# Arranging results by descending order of p-values:
t.test.res <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label, 
         var.equal = T, 
         p.adjust.method = 'none',
         ref.group = "N") %>%
  arrange(desc(p))
  
# Printing t-test results:
print(t.test.res)
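
The mu argument listed above is easiest to understand in a one-sample setting. The sketch below follows the creatinine example from the argument description; the data are simulated (the data frame creatinine and its values are hypothetical), and the formula with 1 on the right-hand side requests a one-sample test.

```r
library(rstatix)

# Simulated (hypothetical) plasma creatinine concentrations (mg/dL) in one group:
set.seed(10)
creatinine <- data.frame(Creatinine = rnorm(30, mean = 1.5, sd = 0.3))

# Two-sided one-sample t-test: does the mean differ from 1.35 mg/dL?
one.sample.res <- t_test(creatinine, Creatinine ~ 1, mu = 1.35)

# One-sided one-sample t-test: is the mean greater than 1.35 mg/dL?
one.sample.greater <- t_test(creatinine, Creatinine ~ 1, mu = 1.35,
                             alternative = "greater")

print(one.sample.res)
print(one.sample.greater)
```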

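t_test() can also be used to test a single selected variable. Note that var.equal is not set in the call below, so its default (FALSE) applies and Welch's t-test is computed: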

# Testing one variable only:
t_test(data, `SM 41:1;O2` ~ Label, ref.group = "N", p.adjust.method = 'none')

The output (in console):

> t_test(data, `SM 41:1;O2` ~ Label, ref.group = "N", p.adjust.method = 'none')
# A tibble: 2 × 10
  .y.        group1 group2    n1    n2 statistic    df        p    p.adj p.adj.signif
* <chr>      <chr>  <chr>  <int> <int>     <dbl> <dbl>    <dbl>    <dbl> <chr>       
1 SM 41:1;O2 N      PAN       97    21    -0.372  26.9 7.13e- 1 7.13e- 1 ns          
2 SM 41:1;O2 N      T         97   109    10.7   191.  3.01e-21 3.01e-21 ****
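
The non-integer degrees of freedom in the console output above (e.g., 26.9) are typical for Welch's t-test, which t_test() computes when var.equal is not set. The sketch below, on simulated data (toy.two is hypothetical), contrasts the degrees of freedom of the classic and Welch's variants:

```r
library(rstatix)

# Simulated (hypothetical) groups with unequal sizes and unequal variances:
set.seed(5)
toy.two <- data.frame(
  Label = rep(c("N", "T"), times = c(30, 10)),
  Concentrations = c(rnorm(30, mean = 10, sd = 1),
                     rnorm(10, mean = 11, sd = 4))
)

# Classic Student's t-test (equal variances assumed):
student.res <- t_test(toy.two, Concentrations ~ Label, var.equal = TRUE)

# Welch's t-test (var.equal = FALSE is the default):
welch.res <- t_test(toy.two, Concentrations ~ Label)

student.res$df  # n1 + n2 - 2 = 38
welch.res$df    # Welch-Satterthwaite approximation, smaller than 38 here
```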

Welch's t-test in R

To perform Welch's t-test, we only need to set the var.equal argument to FALSE (F), which is also the default of t_test(). Welch's t-test does not assume that the variances of the two groups being compared are equal. Here is the code:

# Computing Welch's t-test in R:
Welch.t.test.res <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label, 
         var.equal = F,           # Remember to change var.equal to FALSE (F).
         p.adjust.method = 'none',
         ref.group = "N") %>%
  filter(p < 0.05)

print(Welch.t.test.res)

And we obtain:

Mann-Whitney U test in R

The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) can be used for comparing two samples from skewed distributions, as its power may be superior to the t-test in such cases. In the rstatix library, the Mann-Whitney U test is performed via the wilcox_test() function, which is built and used similarly to t_test():

# Mann-Whitney test in R using rstatix library.
# Reading the documentation:
?wilcox_test()

# Performing Mann-Whitney test:
Mann.Whitney.res <- 
  data.long %>%
  group_by(Lipids) %>%
  wilcox_test(Concentrations ~ Label, 
         p.adjust.method = 'none',
         ref.group = "N") 
         
# Printing the results in the console:
print(Mann.Whitney.res)

We obtain:

The wilcox_test() function requires the data and a formula (the same as for t_test()):

numeric variable ~ grouping variable

The arguments comparisons, ref.group, paired, alternative, mu, and detailed are used in the same way as for t_test(). The new argument exact, which can be set to TRUE (NULL by default), enables computing the exact p-value instead of an approximate one. For substantial data sets, we do not advise setting exact to TRUE, as it may be computationally intensive.
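
As suggested earlier, Welch's t-test and the Mann-Whitney U test can both be run and their outcomes compared lipid by lipid. The following is a minimal sketch on simulated data (the tibble toy.long, the lipid names, and the column names p.welch, p.mw, and agreement are hypothetical); with the real data, data.long filtered to two groups would take the place of toy.long.

```r
library(dplyr)
library(tidyr)
library(rstatix)

# Simulated (hypothetical) data: Lipid A is shifted in T, Lipid B is not.
set.seed(2)
toy.long <-
  tibble(Label = rep(c("N", "T"), each = 30),
         `Lipid A` = c(rlnorm(30, 2.0, 0.5), rlnorm(30, 2.6, 0.5)),
         `Lipid B` = c(rlnorm(30, 2.0, 0.5), rlnorm(30, 2.0, 0.5))) %>%
  pivot_longer(cols = -Label,
               names_to = "Lipids",
               values_to = "Concentrations")

# Welch's t-test per lipid:
welch <-
  toy.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  as_tibble() %>%
  select(Lipids, p.welch = p)

# Mann-Whitney U test per lipid:
mw <-
  toy.long %>%
  group_by(Lipids) %>%
  wilcox_test(Concentrations ~ Label) %>%
  as_tibble() %>%
  select(Lipids, p.mw = p)

# Side-by-side p-values; 'agreement' is TRUE if both tests lead to the same decision:
comparison <-
  inner_join(welch, mw, by = "Lipids") %>%
  mutate(agreement = (p.welch < 0.05) == (p.mw < 0.05))

print(comparison)
```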

Paired t-test in R

The paired t-test has to satisfy the following assumptions:

subjects involved in the study should be independent,
each pair of measurements in condition 1 and condition 2 are obtained from the same subject, e.g. two tissue specimens collected from one patient - a piece of a tumor vs a piece of healthy tissue, or blood samples collected at two different time points - e.g., before and after symptoms appear, etc.
the differences between the pairs of values (condition 1 vs condition 2) should be normally distributed.
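
The third assumption concerns the differences because the paired t-test is equivalent to a one-sample t-test on the within-subject differences (tested against 0). A minimal sketch illustrating this on simulated data (paired.wide, paired.long, and all values are hypothetical):

```r
library(rstatix)

# Simulated (hypothetical) paired measurements for 12 subjects:
set.seed(3)
subject.level <- rnorm(12, mean = 5, sd = 1)
paired.wide <- data.frame(
  Subject = 1:12,
  Healthy = subject.level + rnorm(12, sd = 0.3),
  Tumor   = subject.level + 0.5 + rnorm(12, sd = 0.3)
)

# Long format; subjects appear in the same order within N and T:
paired.long <- data.frame(
  Subject = rep(paired.wide$Subject, times = 2),
  Label = rep(c("N", "T"), each = 12),
  Concentrations = c(paired.wide$Healthy, paired.wide$Tumor)
)

# Paired t-test on the long data:
paired.res <- t_test(paired.long, Concentrations ~ Label, paired = TRUE)

# One-sample t-test on the differences (N - T) against 0:
paired.wide$Difference <- paired.wide$Healthy - paired.wide$Tumor
diff.res <- t_test(paired.wide, Difference ~ 1, mu = 0)

# Normality of the differences (the third assumption):
shapiro.diff <- shapiro_test(paired.wide, Difference)

print(paired.res)
print(diff.res)
print(shapiro.diff)
```

Both calls return the same test statistic, degrees of freedom, and p-value, which is why only the distribution of the differences matters.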

Here, we will again use the data set published by R. Jirásko et al. in Cancers. We load data in R as 'data.paired' and change the column Tissue part into a factor. The study design fulfills the first two conditions. Below, we test the third condition:

# Loading data in R:
data.paired <- readxl::read_xlsx(file.choose())

# Changing `Tissue part` into a factor
data.paired$`Tissue part` <- as.factor(data.paired$`Tissue part`)

# Separating all numeric results for healthy, adjacent tissues (N):
data.paired.N <-
  data.paired %>%
  filter(`Tissue part` == "N") %>%
  select(where(is.numeric), -Age) %>%
  as.data.frame()

# Separating all numeric results for tumor tissues (T):
data.paired.T <-
  data.paired %>%
  filter(`Tissue part` == "T") %>%
  select(where(is.numeric), -Age) %>%
  as.data.frame()

# Computing difference (rows of N and T must follow the same patient order):
difference <- as_tibble(data.paired.N - data.paired.T)

# Selecting differences for SM species:
difference.SM <-
  difference %>%
  select(starts_with("SM"))
  
# Changing wide 'difference.SM' tibble into a long one
# (all columns are SM species, so all of them are pivoted):
diff.long <- 
  difference.SM %>%
  pivot_longer(cols = everything(),
               names_to = "Lipids",
               values_to = "Differences")
               
# Creating density plots to visualize the distributions:
ggplot(diff.long, aes(x = Differences)) +
  geom_density(fill = 'red2') +
  theme_bw() +
  facet_wrap(~ Lipids, scales = 'free')

We obtain the following plot:

Looking at the density plots above, we would conclude that this condition is not satisfied in most cases, as the distributions of the differences between pairs are skewed. We could continue with the non-parametric Wilcoxon signed-rank test. However, we will first show you how to compute a paired t-test in R. We again use the t_test() function from the rstatix package, and this time we set the paired argument to TRUE (T):

# Paired t-test in R:
# Creating a long tibble from 'data.paired' (the N and T samples of each
# subject must appear in the same order within both groups):
data.long <-
  data.paired %>%
  select(where(is.numeric), `Sample code`, `Tissue part`, -Age) %>%
  pivot_longer(cols = `SHex2Cer 32:1;O2`:`SHexCer 44:3;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  # Rename grouping variable to 'Label':
  rename(Label = `Tissue part`)

# Paired t-test:
t.test.paired.res <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label,
         paired = T,
         p.adjust.method = 'none')

# Printing the results:
print(t.test.paired.res)

The output:

Wilcoxon signed-rank test in R

The Wilcoxon signed-rank test is a non-parametric test used for comparing paired samples. It can be computed using the wilcox_test() function. The argument paired must be set to TRUE:

# Wilcoxon signed-rank test in R:
# Creating a long tibble from 'data.paired':
data.long <-
  data.paired %>%
  select(where(is.numeric), `Sample code`, `Tissue part`, -Age) %>%
  pivot_longer(cols = `SHex2Cer 32:1;O2`:`SHexCer 44:3;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  # Rename grouping variable to 'Label':
  rename(Label = `Tissue part`)

# Computing Wilcoxon signed-rank test:
Wilcox.test.res <- 
  data.long %>%
  group_by(Lipids) %>%
  wilcox_test(Concentrations ~ Label,
              paired = T,
              p.adjust.method = 'none')
       
print(Wilcox.test.res)

The output:

NOTE! If missing values (NA, NaN) are present in the data frame, the functions from the rstatix library will drop the rows containing missing values while performing two-sample hypothesis testing.
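
Because rows are dropped silently, it can be useful to count the missing values per group before testing, and for paired designs to keep only subjects with both measurements so that the pairs stay aligned. A minimal sketch with plain dplyr, assuming a hypothetical long tibble toy.na with a Subject identifier column:

```r
library(dplyr)

# Hypothetical long tibble with two missing values:
toy.na <- tibble(
  Subject = rep(1:4, times = 2),
  Label = rep(c("N", "T"), each = 4),
  Concentrations = c(1.2, NA, 0.9, 1.4, 2.1, 1.8, NA, 2.5)
)

# Counting missing values per group before testing:
na.summary <-
  toy.na %>%
  group_by(Label) %>%
  summarise(n.total = n(),
            n.missing = sum(is.na(Concentrations)),
            .groups = "drop")

# Paired designs: keeping only subjects with both measurements present,
# sorted so that subjects appear in the same order within N and T:
complete.pairs <-
  toy.na %>%
  group_by(Subject) %>%
  filter(!any(is.na(Concentrations))) %>%
  ungroup() %>%
  arrange(Label, Subject)

print(na.summary)
print(complete.pairs)
```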
