Multi-sample comparisons in R
Multi-sample comparisons involve parametric ANOVA and non-parametric Kruskal-Wallis tests for comparing more than two samples simultaneously. If pairwise comparisons are necessary, then post hoc tests are performed.
Practical applications of multi-sample comparisons (examples)
Multi-sample comparisons are typically used, among other applications, to analyze lipid or metabolite levels across three or more unrelated experimental groups. See the selected examples below:
1) ANOVA with post hoc tests:
V. de Laat et al. Intrinsic temperature increase drives lipid metabolism towards ferroptosis evasion and chemotherapy resistance in pancreatic cancer. DOI: https://doi.org/10.1038/s41467-024-52978-z - Fig. 1a & d (one-way ANOVA with Šidák's multiple comparisons test is used several times in this manuscript, e.g., in Fig. 1h to compare the proliferation of PANC-1 or BxPC3 cells, measured by BrdU incorporation, under three different treatments).
WC. Wang et al. Metabolomics facilitates differential diagnosis in common inherited retinal degenerations by exploring their profiles of serum metabolites. DOI: https://doi.org/10.1038/s41467-024-47911-3 (ANOVA with Tukey's honestly significant difference (HSD) post hoc test is used, e.g., to identify the 40 metabolites with the most significant differences among the seven inherited retinal degeneration (IRD) subgroups and the control group - Fig. 2).
E. Rysman et al. De novo Lipogenesis Protects Cancer Cells from Free Radicals and Chemotherapeutics by Promoting Membrane Lipid Saturation. DOI: https://doi.org/10.1158/0008-5472.CAN-09-3871 (ANOVA with Tukey's multiple comparison test was used to compare cell counts under four different treatments, see Fig. 4A & B).
2) Kruskal-Wallis with post hoc tests:
J. Wu et al. Lipidomic signatures align with inflammatory patterns and outcomes in critical illness. DOI: https://doi.org/10.1038/s41467-022-34420-4 (the authors use the Kruskal-Wallis test with Dunn's post hoc analysis, as seen in Fig. 1e, to compare circulating total lipid class concentrations between healthy controls and trauma patients; more examples in the manuscript).
R. Jirásko et al. Altered Plasma, Urine, and Tissue Profiles of Sulfatides and Sphingomyelins in Patients with Renal Cell Carcinoma. DOI: https://doi.org/10.3390/cancers14194622 (plasma and urine sphingolipid levels are compared among healthy controls and patients with early- and advanced-stage kidney cancer using the Kruskal-Wallis test with Conover's post hoc test).
W. Xiao et al. Lipid metabolism of plasma-derived small extracellular vesicles in COVID-19 convalescent patients. DOI: https://doi.org/10.1038/s41598-023-43189-5 (the authors use the Kruskal-Wallis test for comparing total lipids, total DG, and total PC levels across four experimental groups, no post hoc test applied - Fig. 2C - E; more examples in the manuscript).
ANOVA test in R
The ANOVA test for comparing more than two unpaired samples is computed with the anova_test() function from the rstatix package. Besides the classic independent-measures ANOVA, the function can also compute:
repeated measures ANOVA,
mixed ANOVA or split-plot ANOVA,
ANCOVA test.
However, we will not discuss these additional tests here. We encourage you to consult the documentation of anova_test() for further information:
Before ANOVA, the data can be inspected in the same way as before the t-test, because the null hypothesis of ANOVA also concerns means (it states that all group means are equal). If sample distributions are skewed and means are not representative values, the Kruskal-Wallis test can be performed instead. See Glimpse at data before the hypothesis testing in the Two sample comparisons in R subchapter for more information about data inspection.
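The inspection step can be sketched with rstatix as follows. This is a minimal example, assuming the rstatix and dplyr packages are installed; it uses R's built-in PlantGrowth data (outcome `weight`, three groups in `group`) as a stand-in for the lipid concentrations, and the object names are illustrative only.

```r
library(rstatix)
library(dplyr)

# Stand-in data: PlantGrowth ships with R (30 plants, groups ctrl, trt1, trt2)
data("PlantGrowth")

# Shapiro-Wilk normality test computed separately within each group
normality.res <- PlantGrowth %>%
  group_by(group) %>%
  shapiro_test(weight)
print(normality.res)

# Levene's test for homogeneity of variances across all groups
variance.res <- PlantGrowth %>%
  levene_test(weight ~ group)
print(variance.res)
```

Small p-values in the first table argue against normality within a group; a small p-value in the second argues against equal variances.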
As before, the ANOVA test requires a formula:
numeric variable ~ grouping variable
For a single lipid, we use the following code:
The output in the console:
The detailed test results contain the sums of squares (SSn, SSd), the degrees of freedom (DFn, DFd), the F-value, the p-value, which is well below the 0.05 threshold, and the eta squared effect size of 0.36, which is considered large. The asterisk marks a p-value below the 0.05 threshold (the null hypothesis is rejected in favor of the alternative hypothesis).
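A minimal sketch of the single-variable call, assuming rstatix is installed and using the built-in PlantGrowth data in place of a single lipid column. With `detailed = TRUE`, anova_test() also reports the sums of squares; its `ges` column is the generalized eta squared, which for a one-way between-subjects design equals eta squared.

```r
library(rstatix)

data("PlantGrowth")

# One-way independent-measures ANOVA: numeric variable ~ grouping variable
anova.res <- anova_test(PlantGrowth, weight ~ group, detailed = TRUE)

# get_anova_table() extracts the ANOVA table as a data frame
anova.tab <- get_anova_table(anova.res)
print(anova.tab)
```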
To test all numeric variables, we first convert the data into a long tibble and then group the entries by lipid species. We store the resulting tidy tibble with results as "ANOVA.test.res":
The output:
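As a self-contained illustration of the long-tibble workflow, the sketch below assumes rstatix, dplyr, and tidyr are installed. The built-in iris data stand in for the lipid table: its four numeric columns play the role of lipid species and `Species` plays the role of the `Label` column. The column names `Variable` and `Value` are illustrative.

```r
library(rstatix)
library(dplyr)
library(tidyr)

# Wide -> long: one row per (sample, variable) pair
long.data <- iris %>%
  pivot_longer(cols = -Species,
               names_to = "Variable",
               values_to = "Value")

# One ANOVA per variable: group by variable name, then test Value ~ Species
ANOVA.test.res <- long.data %>%
  group_by(Variable) %>%
  anova_test(Value ~ Species, detailed = TRUE)
print(ANOVA.test.res)
```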

Kruskal-Wallis test in R
The Kruskal-Wallis test relies on assumptions similar to those of the Mann-Whitney U test; in fact, it extends the Mann-Whitney U test to three or more groups. It is a non-parametric test that compares mean ranks. The Kruskal-Wallis test is performed using the kruskal_test() function from the rstatix package. The function requires the data frame with our concentrations and the formula:
numeric variable ~ grouping variable
Here are examples of code:
The output in the R console:
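For a self-contained version of the single-variable call, here is a minimal sketch assuming rstatix is installed, again with the built-in PlantGrowth data standing in for one lipid column.

```r
library(rstatix)

data("PlantGrowth")

# Kruskal-Wallis rank-based test across the three groups
kw.res <- kruskal_test(PlantGrowth, weight ~ group)
print(kw.res)
```

The result is a one-row tibble with the sample size (`n`), the H statistic, its degrees of freedom, and the p-value.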
To apply the Kruskal-Wallis test to all numeric variables, we need a long tibble. After obtaining the long tibble (via pivot_longer()), we group the entries by lipid species (column Lipids). The new column Concentrations, which stores all numeric values, is then used in the formula together with the Label column to compute the Kruskal-Wallis test. See the example below:
The output:
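The same grouped pattern can be sketched on the built-in iris data, assuming rstatix, dplyr, and tidyr are installed; `Variable` and `Value` are illustrative stand-ins for the Lipids and Concentrations columns, and `Species` for Label.

```r
library(rstatix)
library(dplyr)
library(tidyr)

# Wide -> long, keeping Species as the grouping (Label-like) column
long.data <- iris %>%
  pivot_longer(cols = -Species,
               names_to = "Variable",
               values_to = "Value")

# One Kruskal-Wallis test per variable
KW.test.res <- long.data %>%
  group_by(Variable) %>%
  kruskal_test(Value ~ Species)
print(KW.test.res)
```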

Post hoc tests in R
Using the rstatix library, we can perform basic pairwise tests, including:
Tukey Honestly Significant Difference (Tukey HSD) (normally distributed data with equal variances),
Games-Howell test (if groups are characterized by different variances - homogeneity of variances is violated),
Dunn test (nonparametric post hoc test).
The Tukey HSD and Games-Howell tests are computed following ANOVA (parametric analysis), while the Dunn test is performed following the Kruskal-Wallis test (non-parametric analysis).
Pairwise comparisons reveal which pairs of groups differ. Below, you will find examples of code:
We obtain:
The output:
The output:
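A self-contained sketch of the three single-variable post hoc calls, assuming rstatix and dplyr are installed and using the built-in PlantGrowth data as a stand-in. The Holm adjustment for the Dunn test is chosen here for illustration.

```r
library(rstatix)
library(dplyr)

data("PlantGrowth")

# Tukey HSD: after ANOVA, assumes normality and equal variances
tukey.res <- PlantGrowth %>% tukey_hsd(weight ~ group)

# Games-Howell: after ANOVA when variances differ between groups
gh.res <- PlantGrowth %>% games_howell_test(weight ~ group)

# Dunn test: after Kruskal-Wallis; Holm-adjusted p-values
dunn.res <- PlantGrowth %>% dunn_test(weight ~ group, p.adjust.method = "holm")

print(tukey.res)
print(gh.res)
print(dunn.res)
```

Each table has one row per pair of groups (three pairs for three groups), with the adjusted p-value in `p.adj`.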
All three tests can be computed for multiple variables through long tibbles:
We obtain the following tibble:
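For Tukey HSD on many variables, a minimal sketch on the built-in iris data (assuming rstatix, dplyr, and tidyr are installed; `Variable` and `Value` are illustrative column names) looks like this:

```r
library(rstatix)
library(dplyr)
library(tidyr)

long.data <- iris %>%
  pivot_longer(cols = -Species, names_to = "Variable", values_to = "Value")

# Tukey HSD for every variable: one row per variable and pair of groups
Tukey.res <- long.data %>%
  group_by(Variable) %>%
  tukey_hsd(Value ~ Species)
print(Tukey.res)
```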

The code below allows for computing the Games-Howell test for multiple variables simultaneously:
The results:
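A comparable self-contained sketch for Games-Howell, under the same assumptions (built-in iris data as a stand-in, illustrative column names):

```r
library(rstatix)
library(dplyr)
library(tidyr)

long.data <- iris %>%
  pivot_longer(cols = -Species, names_to = "Variable", values_to = "Value")

# Games-Howell test for every variable (does not assume equal variances)
GH.res <- long.data %>%
  group_by(Variable) %>%
  games_howell_test(Value ~ Species)
print(GH.res)
```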

Similarly for the Dunn post hoc test, we use:
The results:
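For the Dunn test, the grouped call can be sketched the same way, again assuming rstatix, dplyr, and tidyr and using iris as a stand-in; the Holm adjustment is an illustrative choice.

```r
library(rstatix)
library(dplyr)
library(tidyr)

long.data <- iris %>%
  pivot_longer(cols = -Species, names_to = "Variable", values_to = "Value")

# Dunn test for every variable, with Holm-adjusted p-values
Dunn.res <- long.data %>%
  group_by(Variable) %>%
  dunn_test(Value ~ Species, p.adjust.method = "holm")
print(Dunn.res)
```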

Pairwise comparisons with PMCMRplus in R
If you need other types of pairwise comparison tests, the PMCMRplus R package offers a wide selection. Below, you will find the vignette of this package:
Here, we will show you one simple example of how the PMCMRplus package can be used. Take a look at the code below:
We obtain in the console:
Next, using the model object, we can compute, for instance, Fisher's Least Significant Difference (LSD) test, which assumes that all samples are normally distributed and that variances are similar across groups:
We obtain in the console:
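A self-contained sketch of the model-then-LSD workflow, assuming PMCMRplus is installed and using the built-in PlantGrowth data instead of the SM 41:1;O2 concentrations. The model is fitted with base R's aov(), and lsdTest() is applied to the fitted aov object.

```r
library(PMCMRplus)

data("PlantGrowth")

# Fit the one-way ANOVA model and show its ANOVA table
model <- aov(weight ~ group, data = PlantGrowth)
summary(model)

# Fisher's LSD all-pairs test on the fitted model
lsd.res <- lsdTest(model)
summary(lsd.res)
```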
Based on ANOVA, we find that the mean concentration of SM 41:1;O2 differs significantly in at least one of the three groups (N, PAN, T). Hence, we reject the null hypothesis of ANOVA in favor of the alternative hypothesis. Using Fisher's LSD test, we detect significant differences between controls (N) and patients with PDAC (T), and also between patients with pancreatitis (PAN) and patients with PDAC (T). No differences in SM 41:1;O2 levels were found between the healthy controls (N) and patients with pancreatitis (PAN).
From the earlier inspection of sample distributions, we already know that for many lipid species the mean was not the most representative value of the biological groups: the distributions were skewed, and for most lipids the variances differed between the compared groups. In this case, the non-parametric Kruskal-Wallis test could be applied:
We obtain in the console:
Next, we can use the non-parametric Conover test for pairwise comparisons:
And we obtain:
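A self-contained sketch of the non-parametric route, assuming PMCMRplus is installed and with the built-in PlantGrowth data as a stand-in: base R's kruskal.test() followed by kwAllPairsConoverTest() with the package's default p-value adjustment.

```r
library(PMCMRplus)

data("PlantGrowth")

# Base R Kruskal-Wallis test
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
print(kw)

# Conover's all-pairs comparisons following Kruskal-Wallis
conover.res <- kwAllPairsConoverTest(weight ~ group, data = PlantGrowth)
summary(conover.res)
```

PlantGrowth contains a tied value, so a warning about ties may be printed; the test still runs.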
The results of this test are similar to those of ANOVA. Based on the Kruskal-Wallis test, we know that at least one sample comes from a different population than the others (a simplified statement of the alternative hypothesis). From the Conover all-pairs test, we know that differences occur between healthy controls (N) and patients with PDAC (T), and between patients with pancreatitis (PAN) and patients with PDAC (T). No differences between controls (N) and patients with pancreatitis (PAN) were found.
PMCMRplus comparisons for multiple variables (advanced)
Here, we will show you how to loop the PMCMRplus functions over the numeric columns of your tibble and obtain a tibble with the outcomes of a selected post hoc test. This material is advanced and may be complicated for beginners. If you focus only on basic analysis and visualization of lipidomics and metabolomics data, you can skip this subchapter.
As you know from above, PMCMRplus is a comprehensive solution for pairwise testing in R. The package also contains the function toTidy(), which builds a tidy data frame from the output of many PMCMRplus functions. Using a for() loop, we can perform the post hoc test for every numeric column and, with toTidy(), store the test results in a list. See the code below:
Our post hoc test results are now stored as a list of lists named 'results.LSD'. We will change it into a tibble with all post hoc results:
And we obtain this tibble with all post hoc results:

We need to add lipid species to this tibble:
The final output:
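The loop, the conversion to a tibble, and the labeling by variable can be sketched in one self-contained block, assuming PMCMRplus and dplyr are installed. The built-in iris data stand in for the lipid table, and lsdTest() is called here with its vector interface (`x`, `g`) rather than on an aov model. `bind_rows(.id = ...)` turns the list names into a column, which combines the "convert to tibble" and "add lipid species" steps.

```r
library(PMCMRplus)
library(dplyr)

# Stand-in data: the numeric columns of iris play the role of lipid species
num.cols <- names(iris)[sapply(iris, is.numeric)]

# Run Fisher's LSD for every numeric column and keep the tidied output
results.LSD <- list()
for (col in num.cols) {
  lsd <- lsdTest(x = iris[[col]], g = iris$Species)
  results.LSD[[col]] <- toTidy(lsd)
}

# Stack the per-variable data frames; .id stores the list names as a column
LSD.tbl <- bind_rows(results.LSD, .id = "Variable")
print(LSD.tbl)
```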

We proceed similarly for Games-Howell and Tukey HSD:
The output:

The output:
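For Games-Howell and Tukey HSD, the loop can be wrapped in a small function. This is a sketch assuming PMCMRplus and dplyr are installed; `run_posthoc()` is a hypothetical helper, not part of PMCMRplus, and iris again stands in for the lipid data.

```r
library(PMCMRplus)
library(dplyr)

num.cols <- names(iris)[sapply(iris, is.numeric)]

# Hypothetical helper: apply one PMCMRplus all-pairs test to each column
run_posthoc <- function(test.fun, data, cols, group) {
  res <- lapply(cols, function(col) toTidy(test.fun(data[[col]], data[[group]])))
  names(res) <- cols
  bind_rows(res, .id = "Variable")
}

GH.tbl <- run_posthoc(gamesHowellTest, iris, num.cols, "Species")
Tukey.tbl <- run_posthoc(tukeyTest, iris, num.cols, "Species")
print(GH.tbl)
print(Tukey.tbl)
```

Passing the test function as an argument avoids copying the loop for each post hoc test.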

We also proceed similarly for the non-parametric post hoc tests used after the Kruskal-Wallis test, i.e., the Dunn, Conover, and Nemenyi post hoc tests:
The output:

The output:

The output:
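The same pattern can be sketched for the rank-based tests, assuming PMCMRplus and dplyr are installed; `run_posthoc()` is a hypothetical helper and iris is a stand-in dataset. Warnings about ties may appear because iris contains repeated values.

```r
library(PMCMRplus)
library(dplyr)

num.cols <- names(iris)[sapply(iris, is.numeric)]

# Hypothetical helper: apply one PMCMRplus all-pairs test to each column
run_posthoc <- function(test.fun, data, cols, group) {
  res <- lapply(cols, function(col) toTidy(test.fun(data[[col]], data[[group]])))
  names(res) <- cols
  bind_rows(res, .id = "Variable")
}

# Post hoc tests following Kruskal-Wallis (package default adjustments)
Dunn.tbl <- run_posthoc(kwAllPairsDunnTest, iris, num.cols, "Species")
Conover.tbl <- run_posthoc(kwAllPairsConoverTest, iris, num.cols, "Species")
Nemenyi.tbl <- run_posthoc(kwAllPairsNemenyiTest, iris, num.cols, "Species")
print(Dunn.tbl)
print(Conover.tbl)
print(Nemenyi.tbl)
```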

NOTE! If missing values (NA, NaN) are present in the data frame, the functions from the rstatix library drop the rows containing them when performing multi-sample hypothesis tests.