Results of tests as annotations in the charts
Metabolites and lipids univariate statistical analysis in R
Adding annotations for statistical significance above bar charts, box plots, and dot plots has become standard practice in lipidomics and metabolomics articles. Check out the following manuscripts:
R. Jirásko et al. Altered Plasma, Urine, and Tissue Profiles of Sulfatides and Sphingomyelins in Patients with Renal Cell Carcinoma. DOI: https://doi.org/10.3390/cancers14194622 - Fig. 2C & D, Fig. 3, Fig. 4B - D.
E. Rysman et al. De novo Lipogenesis Protects Cancer Cells from Free Radicals and Chemotherapeutics by Promoting Membrane Lipid Saturation. DOI: https://doi.org/10.1158/0008-5472.CAN-09-3871 - Fig. 1A & B, Fig. 3A, Fig. 4A & B, Fig. 6A, C - F.
V. de Laat et al. Intrinsic temperature increase drives lipid metabolism towards ferroptosis evasion and chemotherapy resistance in pancreatic cancer. DOI: https://doi.org/10.1038/s41467-024-52978-z - Fig. 1a, b, d, e, h, i, j, and more.
F. Torta et al. Concordant inter-laboratory derived concentrations of ceramides in human plasma reference materials via authentic standards. DOI: https://doi.org/10.1038/s41467-024-52087-x - Fig. 3.
J. Wu et al. Lipidomic signatures align with inflammatory patterns and outcomes in critical illness. DOI: https://doi.org/10.1038/s41467-022-34420-4 - Fig. 1e, Fig. 2f, Fig. 5b & e.
O. Peterka & A. Maccelli et al. HILIC/MS quantitation of low-abundant phospholipids and sphingolipids in human plasma and serum: Dysregulation in pancreatic cancer. DOI: https://doi.org/10.1016/j.aca.2023.342144 - Fig. 3E.
H. Tsugawa et al. A lipidome landscape of aging in mice. DOI: https://doi.org/10.1038/s43587-024-00610-6 - e.g., Fig. 8.
Annotations of statistical significance can be added to ggpubr and ggplot2 charts with the stat_pvalue_manual() function from the ggpubr library. Using this function is straightforward if hypothesis testing is performed with functions from the rstatix library. These functions return tibbles with the column names that stat_pvalue_manual() expects (see the 'data' argument in the documentation of this function). Additionally, we need to define the type of label: a p-value, an adjusted p-value, asterisks corresponding to p-value ranges, or another column. The code blocks below show how stat_pvalue_manual() works.
First, we will begin with t-test annotations. We create a long tibble and compute t-tests for all features.
```r
# Adding annotations from hypothesis testing to ggpubr / ggplot2 charts.
library(tidyverse)
library(rstatix)

# First, create a long tibble:
data.long <-
  data %>%
  select(-`Sample Name`) %>%
  filter(Label != "PAN") %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  droplevels() # We ensure the "PAN" label is removed from the grouping variable

# Next, compute a t-test between N and T, adjust p-values, and add asterisks for p.adj:
t.test <-
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") # The asterisks will correspond to the adjusted p-value ranges!

# This is an example of the tibble that stat_pvalue_manual() needs - take a look at it:
print(t.test)
```
We obtain the following tibble:

Now that the tibble with results is ready, one more operation is needed. To add annotations, we need to define their x and y positions in the plot. The rstatix library provides the add_xy_position() function, which does this for you.
Let's begin with a simple plot for one selected lipid, e.g., SM 41:1;O2. We filter the 't.test' tibble by rows to keep the t-test results only for SM 41:1;O2. Next, we add x and y positions. In add_xy_position(), we specify scales = "free" so that the y-position of each annotation is computed from the data of its own lipid.
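A minimal sketch of this step, assuming simulated concentrations in place of the real 'data.long' tibble (all values are made up for illustration, and only two lipids are simulated):

```r
library(dplyr)
library(rstatix)

set.seed(1)
# Simulated stand-in for 'data.long' (hypothetical values, two lipids, two groups)
data.long <- tibble(
  Label = factor(rep(rep(c("N", "T"), each = 15), times = 2)),
  Lipids = rep(c("CE 16:1", "SM 41:1;O2"), each = 30),
  Concentrations = rnorm(60, mean = rep(c(5, 7, 10, 14), each = 15), sd = 1.5)
)

# t-tests for all lipids, BH adjustment, asterisks for adjusted p-values
t.test <- data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj")

# Keep the results for one lipid and add the x and y positions of its annotation
t.test <- t.test %>%
  filter(Lipids == "SM 41:1;O2") %>%
  add_xy_position(x = "Label", scales = "free")

print(t.test)
```

The printed tibble should now contain the additional columns y.position, groups, xmin, and xmax next to the test results.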
We obtain in the R console:
Now, let's create our first chart with annotations, beginning with a raw p-value. The updated 't.test' tibble now contains information for only one lipid, SM 41:1;O2. In stat_pvalue_manual(), we make sure the ggplot2 aesthetics are not inherited (inherit.aes = FALSE) and that the label is the raw p-value, taken from the 'p' column of the 't.test' tibble (label = "p").
Let's create box plots for this lipid with annotations:
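As a hedged illustration, the sketch below builds such a box plot from simulated concentrations for a single lipid. The 'data.SM' tibble and its values are assumptions made for this example, and theme_bw() is only a styling choice:

```r
library(dplyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(1)
# Simulated concentrations of one lipid in two groups (hypothetical values)
data.SM <- tibble(
  Label = factor(rep(c("N", "T"), each = 15)),
  Lipids = "SM 41:1;O2",
  Concentrations = c(rnorm(15, mean = 10, sd = 1.5), rnorm(15, mean = 14, sd = 1.5))
)

# t-test with x and y positions for the annotation
t.test <- data.SM %>%
  t_test(Concentrations ~ Label) %>%
  add_xy_position(x = "Label")

# Box plots with the raw p-value as the label
p <- ggplot(data.SM, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_boxplot() +
  stat_pvalue_manual(t.test, label = "p", inherit.aes = FALSE) +
  theme_bw()

print(p)
```

Because fill = Label is set globally in ggplot(), the annotation layer must not inherit the aesthetics: the 't.test' tibble has no Label column.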
In effect, we obtain these box plots:

We can of course easily change the label type to the adjusted p-value or asterisks denoting statistical significance:
We obtain:

Similar plots can be easily constructed with the ggpubr library. In this case, you do not need to specify inherit.aes = FALSE:
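A possible ggpubr version is sketched below, again on simulated data (the values are hypothetical). It uses ggboxplot() and labels the comparison with asterisks for the adjusted p-value:

```r
library(dplyr)
library(rstatix)
library(ggpubr)

set.seed(1)
# Simulated stand-in for 'data.long' (hypothetical values)
data.long <- tibble(
  Label = factor(rep(rep(c("N", "T"), each = 15), times = 2)),
  Lipids = rep(c("CE 16:1", "SM 41:1;O2"), each = 30),
  Concentrations = rnorm(60, mean = rep(c(5, 7, 10, 14), each = 15), sd = 1.5)
)

# Test all lipids, adjust p-values, then keep one lipid and add positions
t.test <- data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") %>%
  filter(Lipids == "SM 41:1;O2") %>%
  add_xy_position(x = "Label", scales = "free")

# ggpubr box plots; no inherit.aes argument is needed here
p <- ggboxplot(filter(data.long, Lipids == "SM 41:1;O2"),
               x = "Label", y = "Concentrations", fill = "Label") +
  stat_pvalue_manual(t.test, label = "p.adj.signif")

print(p)
```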
We obtain:

Now, let's move to multi-panel plots created, e.g., via facet_grid(). Here, it is crucial that you specify scales = "free" in the add_xy_position(). Otherwise, the add_xy_position() function will add non-matching y-positions. Take a look at the example below:
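The sketch below illustrates the multi-panel case on simulated data with three lipids on very different concentration scales. The values, the row-wise facet layout, and the extra space added at the top of the y-axis are assumptions made for this example:

```r
library(dplyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(1)
# Simulated stand-in for 'data.long' (hypothetical values, three lipids)
data.long <- tibble(
  Label = factor(rep(rep(c("N", "T"), each = 15), times = 3)),
  Lipids = rep(c("CE 16:1", "SM 41:1;O2", "SM 42:1;O2"), each = 30),
  Concentrations = rnorm(90, mean = rep(c(5, 7, 10, 14, 40, 55), each = 15), sd = 1.5)
)

# One t-test per lipid; scales = "free" gives each panel its own y-position
t.test <- data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") %>%
  add_xy_position(x = "Label", scales = "free")

# One panel per lipid, each with a free y-axis
p <- ggplot(data.long, aes(x = Label, y = Concentrations)) +
  geom_boxplot(aes(fill = Label)) +
  facet_grid(rows = vars(Lipids), scales = "free_y") +
  stat_pvalue_manual(t.test, label = "p.adj.signif", inherit.aes = FALSE) +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.15))) +
  theme_bw()

print(p)
```

The 't.test' tibble keeps the Lipids column, which is how each annotation is assigned to its panel.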
We obtain:

We would proceed similarly to place annotations above box plots plotted in one x-y plane (using facet_grid()). See the code below:
The output:

Using long tibble with data, we can recreate these plots through the ggpubr, too:
The output:

Suppose you want to change the annotation labels' height above box plots. The most efficient way is to check the current y.position in your tibble with statistical test results and modify it. Take a look at this example:
We obtain in the R console:
Comparing the current position of the labels with these values, we see that the labels could be slightly lifted. We can use the mutate() function from the dplyr package to change these values:
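As a small sketch of this adjustment, the example below uses a hand-written tibble with the columns that stat_pvalue_manual() reads. The y.position values and the 5% lift are hypothetical:

```r
library(dplyr)

# Hypothetical test results with label positions (values are made up)
t.test <- tibble(
  Lipids = c("CE 16:1", "SM 41:1;O2", "SM 42:1;O2"),
  group1 = "N",
  group2 = "T",
  p.adj.signif = c("**", "***", "****"),
  y.position = c(8.2, 17.6, 58.4),
  xmin = 1,
  xmax = 2
)

# Lift every label by 5% of its current height
t.test <- t.test %>%
  mutate(y.position = y.position * 1.05)

print(t.test$y.position)
```

A relative lift keeps the spacing comparable between panels with different scales; adding a fixed number of units works as well when all lipids share one scale.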
We obtain in the R console:
And the updated plot:
The plot:

Now, let's use the omnibus test results to create annotations - e.g., from the ANOVA. In the first step, we compute the test for all variables, adjust p-values for multiple comparisons, and add significance symbols:
According to the documentation, anova_test() returns an object of class anova_test - a data frame containing the ANOVA table for the basic type of ANOVA test. We change it into a tibble to simplify further steps:
We round all p-values in the new 'ANOVA' tibble using p_round():
Now, if you check the output tibble from the last line of code - it still does not contain the following columns: group1, group2, xmin, xmax, and y.position. These columns are necessary to create an annotation by stat_pvalue_manual() and are created by add_xy_position(). Using mutate(), we can easily add all of them. If you want to use add_xy_position(), first, you must add group1 and group2 via mutate().
The group1 should contain N and group2 - T. The xmin and xmax define the annotation position on the x-axis. The y.position is related to the label height above box plots, bar charts, dot plots, etc. The y.position can be estimated based on the maximum concentration measured for each lipid, e.g., the label can be placed at a 20% higher value than the highest value of concentration measured. It is all achieved through the following code lines:
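The ANOVA steps described above could look roughly like the sketch below. It runs on simulated data with three groups (N, PAN, T); the concentrations, the helper tibble 'y.max', and the choice of xmin = 1 and xmax = 3 (a bracket spanning from the first to the third box) are assumptions made for this illustration:

```r
library(dplyr)
library(rstatix)

set.seed(1)
# Simulated stand-in for the long tibble with three groups (hypothetical values)
data.long <- tibble(
  Label = factor(rep(rep(c("N", "PAN", "T"), each = 12), times = 2)),
  Lipids = rep(c("CE 16:1", "SM 41:1;O2"), each = 36),
  Concentrations = rnorm(72, mean = rep(c(5, 6, 8, 10, 12, 15), each = 12), sd = 1.5)
)

# One-way ANOVA per lipid, BH adjustment, asterisks, conversion to a tibble, rounding
ANOVA <- data.long %>%
  group_by(Lipids) %>%
  anova_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") %>%
  as_tibble() %>%
  p_round(digits = 3)

# Label height: 20% above the highest concentration measured for each lipid
y.max <- data.long %>%
  group_by(Lipids) %>%
  summarise(y.position = 1.2 * max(Concentrations))

# Add the columns that stat_pvalue_manual() needs
ANOVA <- ANOVA %>%
  mutate(group1 = "N", group2 = "T", xmin = 1, xmax = 3) %>%
  left_join(y.max, by = "Lipids")

print(ANOVA)
```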
Now, having all the necessary columns in the ANOVA tibble, we can create our box plots with annotations:
Finally, we can create a chart with asterisks corresponding to ranges of adjusted p-values from the ANOVA:

Or with adjusted p-values from the ANOVA test:
The plot:

In the case of the Kruskal-Wallis test, after computing the p-values, you can immediately add missing columns through mutate():
And the final plot:

We can also use the results of a post hoc test as annotations. Let's compute the Tukey HSD post hoc test as an example (the code would be similar for the Dunn post hoc test):
In the next step, from the tibble containing post hoc results for all lipids, we select rows concerning the lipids we would like to plot:
Using add_xy_position(), we can add to the tibble with post hoc results columns that will be used by stat_pvalue_manual() for plotting labels (labels' positions):
Next, let's create the plot with labels:
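The post hoc steps above can be sketched as follows, assuming simulated data with three groups. The concentrations, the 'selected' vector, and the row-wise facet layout are hypothetical choices for this example:

```r
library(dplyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(1)
# Simulated stand-in for the long tibble with three groups (hypothetical values)
data.long <- tibble(
  Label = factor(rep(rep(c("N", "PAN", "T"), each = 12), times = 3)),
  Lipids = rep(c("CE 16:1", "SM 41:1;O2", "SM 42:1;O2"), each = 36),
  Concentrations = rnorm(108,
                         mean = rep(c(5, 6, 8, 10, 12, 15, 40, 48, 55), each = 12),
                         sd = 1.5)
)

# Tukey HSD post hoc test for every lipid
tukey.hsd <- data.long %>%
  group_by(Lipids) %>%
  tukey_hsd(Concentrations ~ Label)

# Keep the lipids to be plotted and add label positions
selected <- c("SM 41:1;O2", "SM 42:1;O2")
tukey.hsd.selected <- tukey.hsd %>%
  filter(Lipids %in% selected) %>%
  add_xy_position(x = "Label", scales = "free")

# Box plots with asterisks for the adjusted p-values of each pairwise comparison
p <- ggplot(filter(data.long, Lipids %in% selected),
            aes(x = Label, y = Concentrations)) +
  geom_boxplot(aes(fill = Label)) +
  facet_grid(rows = vars(Lipids), scales = "free_y") +
  stat_pvalue_manual(tukey.hsd.selected, label = "p.adj.signif", inherit.aes = FALSE) +
  theme_bw()

print(p)
```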
Our plot:

However, the labels overlap. We can fix this by adjusting the y-positions in the 'tukey.hsd.selected' tibble. Let's take a look at the tibble first:

The y-positions of labels annotating significant outcomes from the Tukey HSD test are too close (red frames). The labels for every lipid are arranged in the same order - look at group1 and group2 columns (highlighted in green, orange, and violet frames). This allows us to use the seq() function to select particular rows from the tibble in the y.position column:
Using this short line of code, we select every third row, starting from row no. 3 and continuing until row no. 12, in the y.position column. In effect, we select the single cells corresponding to the labels connecting PAN and T. See the output from the R console after running this line of code:
We can modify these entries, e.g., shift them 6-7 units higher:
We can also gently shift the N vs T labels, e.g., by two units up. We will now start modifications to every third entry from row no. 2:
NOTE: If you would like to apply changes to all entries, e.g., in the 'tukey.hsd' tibble, set the argument to = nrow(<tibble_name>) in seq().
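A short sketch of this indexing, assuming a hand-written 12-row tibble in which the three comparisons repeat in the same order for each of four lipids. The lipid names and y.position values are placeholders, and the index vectors 'pan.t' and 'n.t' are hypothetical helper names:

```r
library(dplyr)

# Hypothetical post hoc results: 4 lipids x 3 comparisons, always in the same order
tukey.hsd.selected <- tibble(
  Lipids = rep(c("Lipid 1", "Lipid 2", "Lipid 3", "Lipid 4"), each = 3),
  group1 = rep(c("N", "N", "PAN"), times = 4),
  group2 = rep(c("PAN", "T", "T"), times = 4),
  y.position = rep(c(20, 21, 22), times = 4)
)

# Rows 3, 6, 9, 12: labels connecting PAN and T
pan.t <- seq(from = 3, to = nrow(tukey.hsd.selected), by = 3)
print(tukey.hsd.selected$y.position[pan.t])

# Shift the PAN vs T labels 6 units up
tukey.hsd.selected$y.position[pan.t] <- tukey.hsd.selected$y.position[pan.t] + 6

# Rows 2, 5, 8, 11: labels connecting N and T, shifted 2 units up
n.t <- seq(from = 2, to = nrow(tukey.hsd.selected), by = 3)
tukey.hsd.selected$y.position[n.t] <- tukey.hsd.selected$y.position[n.t] + 2

print(tukey.hsd.selected)
```

This approach relies on the comparisons appearing in an identical order for every lipid, so it is worth checking the group1 and group2 columns before applying the shifts.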
If you would like to select y.positions of your preference - you can also substitute the y.position values from add_xy_position() using:
If we now run the plot code, we obtain these beautiful box plots:

Or we can also plot annotations above bar charts, as in the example below:

Publication-ready box plots with annotations from the ggstatsplot library
Elegant, publication-ready box plots with detailed annotations from hypothesis testing can be produced using the functions of the ggstatsplot library. Check the subchapter Basic plotting in R - Box plots to recall how to use this library.
Adding to ggplot2 charts annotations based on PMCMRplus package pairwise-comparisons (advanced)
Statistical annotations computed with the PMCMRplus package can also be added to ggplot2 charts using stat_pvalue_manual(). To do so, we add the columns xmin, xmax, y.position, and significance symbols to the tibble with post hoc results obtained from looped PMCMRplus functions. Take a look at the code blocks below. As an example, we will use the 'posthoc.Conover' tibble computed in the subchapter Multi-sample comparisons in R.
The glimpse:

Now, let's filter out the results of the posthoc testing for lipids that we would like to plot:
And finally, we create the plot:
The plot:

Gentle corrections of y-positions (y.position column in 'posthoc.Conover' tibble):
And we obtain these beautiful box plots with the results of the Conover posthoc test labeled above them:

Or, in this case, we can use adjusted box plots for skewed distributions:
The plot:
