Results of tests as annotations in the charts

Metabolites and lipids univariate statistical analysis in R

Adding annotations for statistical significance above bar charts, box plots, and dot plots has become standard practice in lipidomics and metabolomics articles.

Annotations corresponding to statistical significance can be added to ggpubr and ggplot2 charts through the stat_pvalue_manual() function from the ggpubr library. Applying this function is straightforward if hypothesis testing is performed using functions from the rstatix library. These functions create tibbles with the column names that stat_pvalue_manual() expects (see the 'data' argument in the documentation of this function). Additionally, we need to define the type of label: a p-value, an adjusted p-value, asterisks corresponding to p-value ranges, or another column. Take a look at the code blocks below to understand the stat_pvalue_manual() function.

First, we will begin with t-test annotations. We create a long tibble and compute t-tests for all features.

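# Adding annotations from hypothesis testing to ggpubr / ggplot2 charts.
library(tidyverse) # dplyr, tidyr, ggplot2
library(rstatix)   # t_test(), adjust_pvalue(), add_significance()
library(ggpubr)    # stat_pvalue_manual()

# First, create a long tibble: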
data.long <- 
  data %>%
  select(-`Sample Name`) %>%
  filter(Label != "PAN") %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  droplevels() # Drop the unused "PAN" level from the grouping variable

# Next, compute a t-test between N and T, adjust p-values, and add asterisks for p.adj:
t.test <-
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") # The asterisks will correspond to the adjusted p-value ranges!

print(t.test)

# This is an example of the tibble that stat_pvalue_manual() expects; take a look at it.

We obtain the following tibble:

The tibble with t-test results and column names expected by the stat_pvalue_manual().

Now that our tibble with results is ready, we need to perform one more operation. To add annotations, we need to define their x and y positions in the plot. The rstatix library provides the add_xy_position() function, which can do this for you.

Let's begin with a simple plot for one selected lipid, e.g., SM 41:1;O2. We filter the 't.test' tibble by rows to keep the t-test results only for SM 41:1;O2. Next, we add x and y positions. In add_xy_position(), we specify scales = "free" so that the y-positions are computed separately for each lipid rather than from a shared y-axis range.
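A minimal sketch of this step, assuming a small simulated data set ('toy.long', made-up values) in place of the chapter's 'data.long'. In this sketch the positions are added to the full grouped result and the row for SM 41:1;O2 is selected afterwards; with scales = "free" the positions are computed per lipid, so the selected row should carry the values relevant to that lipid.

```r
library(dplyr)
library(tidyr)
library(rstatix)

set.seed(42)
# Simulated stand-in for 'data.long' (hypothetical values; two lipids, groups N and T)
toy.long <- tibble(
  Label = factor(rep(c("N", "T"), each = 10)),
  `SM 41:1;O2` = c(rnorm(10, mean = 20, sd = 2), rnorm(10, mean = 14, sd = 2)),
  `CE 16:1`    = c(rnorm(10, mean = 50, sd = 5), rnorm(10, mean = 65, sd = 5))
) %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

# t-test per lipid, BH adjustment, significance symbols, then x/y positions
stat.test <- toy.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") %>%
  add_xy_position(x = "Label", scales = "free")

# Keep only the row for the lipid we want to plot
stat.one <- stat.test %>%
  filter(Lipids == "SM 41:1;O2")

print(stat.one)
```

The printed tibble should now include the xmin, xmax, and y.position columns next to group1 and group2.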

We obtain in the R console:

Now, let's create our first chart with annotations, starting with a raw p-value. The updated 't.test' tibble now contains information for only one lipid, SM 41:1;O2. In stat_pvalue_manual(), we make sure the ggplot2 aesthetics are not inherited (inherit.aes = FALSE) and that the label is the raw p-value, taken from the 'p' column of the 't.test' tibble (label = "p").

Let's create box plots for this lipid with annotations:
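As an illustration, here is a hedged sketch of such a plot. The data frame 'one.lipid' is simulated (made-up values) and stands in for the SM 41:1;O2 rows of the chapter's long tibble; the object names are hypothetical.

```r
library(dplyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(42)
# Simulated concentrations of one lipid in two groups (hypothetical values)
one.lipid <- tibble(
  Label = factor(rep(c("N", "T"), each = 10)),
  Concentrations = c(rnorm(10, mean = 20, sd = 2), rnorm(10, mean = 14, sd = 2))
)

# t-test result with the position columns needed by stat_pvalue_manual()
stat.one <- one.lipid %>%
  t_test(Concentrations ~ Label) %>%
  add_xy_position(x = "Label")

# 'fill = Label' is a global aesthetic that does not exist in 'stat.one',
# so the annotation layer must not inherit it
p.raw <- ggplot(one.lipid, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_boxplot() +
  stat_pvalue_manual(stat.one, label = "p", inherit.aes = FALSE) +
  theme_bw()

print(p.raw)
```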

In effect, we obtain these box plots:

Box plots with a raw p-value annotated above through stat_pvalue_manual() from the ggpubr.

We can of course easily change the label type to the adjusted p-value or asterisks denoting statistical significance:
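Switching the label only requires pointing the label argument at another column of the results tibble. A sketch under the same assumptions as before (simulated 'toy.long' data with made-up values, hypothetical object names):

```r
library(dplyr)
library(tidyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(42)
# Simulated stand-in for 'data.long' (hypothetical values)
toy.long <- tibble(
  Label = factor(rep(c("N", "T"), each = 10)),
  `SM 41:1;O2` = c(rnorm(10, 20, 2), rnorm(10, 14, 2)),
  `CE 16:1`    = c(rnorm(10, 50, 5), rnorm(10, 65, 5))
) %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

stat.one <- toy.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") %>%   # creates the 'p.adj.signif' column
  add_xy_position(x = "Label", scales = "free") %>%
  filter(Lipids == "SM 41:1;O2")

base.plot <- ggplot(filter(toy.long, Lipids == "SM 41:1;O2"),
                    aes(x = Label, y = Concentrations, fill = Label)) +
  geom_boxplot() +
  theme_bw()

# Left panel: adjusted p-value; right panel: asterisks for adjusted p-value ranges
p.adj.plot    <- base.plot +
  stat_pvalue_manual(stat.one, label = "p.adj", inherit.aes = FALSE)
p.signif.plot <- base.plot +
  stat_pvalue_manual(stat.one, label = "p.adj.signif", inherit.aes = FALSE)

both <- ggarrange(p.adj.plot, p.signif.plot, ncol = 2, common.legend = TRUE)
print(both)
```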

We obtain:

Box plots for SM 41:1;O2 with t-test results added as annotations: in the left panel - Benjamini-Hochberg adjusted p-value, in the right panel with asterisks corresponding to the adjusted p-value ranges.

Similar plots can be easily constructed with the plotting functions of the ggpubr library. In this case, you do not need to specify inherit.aes = FALSE:
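A possible ggpubr version, sketched with simulated data (made-up values, hypothetical object names). ggboxplot() is used for the box plots, and the annotation layer is added without inherit.aes:

```r
library(dplyr)
library(rstatix)
library(ggpubr)

set.seed(42)
# Simulated concentrations of one lipid in two groups (hypothetical values)
one.lipid <- tibble(
  Label = factor(rep(c("N", "T"), each = 10)),
  Concentrations = c(rnorm(10, mean = 20, sd = 2), rnorm(10, mean = 14, sd = 2))
)

stat.one <- one.lipid %>%
  t_test(Concentrations ~ Label) %>%
  add_xy_position(x = "Label")

# ggpubr box plot with the raw p-value annotated above
p.ggpubr <- ggboxplot(one.lipid, x = "Label", y = "Concentrations", fill = "Label") +
  stat_pvalue_manual(stat.one, label = "p")

print(p.ggpubr)
```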

We obtain:

The ggpubr box plots with annotations from hypothesis testing added through stat_pvalue_manual().

Now, let's move to multi-panel plots created, e.g., via facet_grid(). Here, it is crucial that you specify scales = "free" in the add_xy_position(). Otherwise, the add_xy_position() function will add non-matching y-positions. Take a look at the example below:
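A sketch of a multi-panel plot with free y-scales, assuming simulated data (made-up values) for two lipids with very different concentration ranges. The panels are stacked in rows here so that facet_grid() can free the y-axis per panel; the layout is a choice made for this example.

```r
library(dplyr)
library(tidyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(42)
# Simulated stand-in for 'data.long' (hypothetical values)
toy.long <- tibble(
  Label = factor(rep(c("N", "T"), each = 10)),
  `SM 41:1;O2` = c(rnorm(10, 20, 2), rnorm(10, 14, 2)),
  `CE 16:1`    = c(rnorm(10, 50, 5), rnorm(10, 65, 5))
) %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

# scales = "free": y-positions are derived from each lipid's own range
stat.test <- toy.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") %>%
  add_xy_position(x = "Label", scales = "free")

# The 'Lipids' column in 'stat.test' routes each annotation to its panel
p.facets <- ggplot(toy.long, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_boxplot() +
  facet_grid(Lipids ~ ., scales = "free_y") +
  stat_pvalue_manual(stat.test, label = "p.adj", inherit.aes = FALSE) +
  theme_bw()

print(p.facets)
```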

We obtain:

The multi-panel ggplot2 box plots with annotations from the t-test added using stat_pvalue_manual() with free scales.

We proceed similarly if we want to place annotations above box plots plotted in one x-y plane (using facet_grid()). See the code below:

The output:

The ggplot2 box plots plotted in one x-y plane through facet_grid(). Raw p-values were added through stat_pvalue_manual().

Using the long tibble with data, we can recreate these plots with ggpubr, too:

The output:

The ggpubr box plots plotted in one x-y plane through facet_grid(). Raw p-values were added through stat_pvalue_manual().

Suppose you want to change the annotation labels' height above box plots. The most efficient way is to check the current y.position in your tibble with statistical test results and modify it. Take a look at this example:

We obtain in the R console:

Comparing the current position of the labels with these values, we see that the labels could be slightly lifted. We can use the mutate() function from the dplyr package to change these values:
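A short sketch of this adjustment, assuming simulated data (made-up values) and hypothetical object names. The labels are lifted by three units, as in the example described here:

```r
library(dplyr)
library(tidyr)
library(rstatix)

set.seed(42)
# Simulated stand-in for 'data.long' (hypothetical values)
toy.long <- tibble(
  Label = factor(rep(c("N", "T"), each = 10)),
  `SM 41:1;O2` = c(rnorm(10, 20, 2), rnorm(10, 14, 2)),
  `CE 16:1`    = c(rnorm(10, 50, 5), rnorm(10, 65, 5))
) %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

stat.test <- toy.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  add_xy_position(x = "Label", scales = "free")

# Inspect the current label heights
print(stat.test$y.position)

# Lift every label by three units (same units as the y-axis)
stat.lifted <- stat.test %>%
  mutate(y.position = y.position + 3)

print(stat.lifted$y.position)
```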

We obtain in the R console:

And the updated plot:

The plot:

Adjusting the y.position through mutate - lifting the annotations up by three units.

Now, let's use the omnibus test results to create annotations - e.g., from the ANOVA. In the first step, we compute the test for all variables, adjust p-values for multiple comparisons, and add significance symbols:

According to the documentation, anova_test() returns an object of class anova_test - a data frame containing the ANOVA table for the basic type of ANOVA test. We change it into a tibble to simplify further steps:

We round all p-values in the new 'ANOVA' tibble using p_round():

Now, if you check the output tibble from the last line of code, it still does not contain the following columns: group1, group2, xmin, xmax, and y.position. These columns are necessary to create an annotation with stat_pvalue_manual(); for pairwise tests, the position columns (xmin, xmax, and y.position) are created by add_xy_position(). Using mutate(), we can easily add all of them. If you prefer to use add_xy_position(), you must first add group1 and group2 via mutate().

The group1 column should contain N, and group2 should contain T. The xmin and xmax columns define the annotation position on the x-axis. The y.position column sets the label height above box plots, bar charts, dot plots, etc. The y.position can be estimated from the maximum concentration measured for each lipid, e.g., the label can be placed 20% above the highest measured concentration. All of this is achieved through the following code lines:
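The steps above can be sketched as follows, assuming simulated three-group data (made-up values). The helper tibble 'y.max' and the left_join() are choices made for this sketch; the values 1 and 3 for xmin and xmax assume that N, PAN, and T appear in this order on the x-axis.

```r
library(dplyr)
library(tidyr)
library(rstatix)

set.seed(42)
# Simulated stand-in for the long tibble with three groups (hypothetical values)
toy3.long <- tibble(
  Label = factor(rep(c("N", "PAN", "T"), each = 8)),
  `SM 41:1;O2` = c(rnorm(8, 20, 2), rnorm(8, 17, 2), rnorm(8, 14, 2)),
  `CE 16:1`    = c(rnorm(8, 50, 5), rnorm(8, 55, 5), rnorm(8, 65, 5))
) %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

# One-way ANOVA per lipid, converted to a tibble, then adjusted p-values and symbols
ANOVA <- toy3.long %>%
  group_by(Lipids) %>%
  anova_test(Concentrations ~ Label) %>%
  as_tibble() %>%
  adjust_pvalue(p.col = "p", method = "BH") %>%
  add_significance(p.col = "p.adj")

# Label height: 20% above the highest concentration measured for each lipid
y.max <- toy3.long %>%
  group_by(Lipids) %>%
  summarise(y.position = 1.2 * max(Concentrations))

# Add the columns that stat_pvalue_manual() needs
ANOVA <- ANOVA %>%
  left_join(y.max, by = "Lipids") %>%
  mutate(group1 = "N", group2 = "T",  # bracket spans from N to T
         xmin = 1, xmax = 3)          # positions of N and T on the x-axis

print(ANOVA)
```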

Now, having all the necessary columns in the ANOVA tibble, we can create our box plots with annotations:

Finally, we can create a chart with asterisks corresponding to ranges of adjusted p-values from the ANOVA:

Box plots presenting concentrations of selected lipids with asterisk annotations corresponding to adjusted p-value ranges from the ANOVA test across all three groups - N, PAN, and T.

Or with adjusted p-values from the ANOVA test:

The plot:

Box plots presenting concentrations of selected lipids with adjusted p-values from the ANOVA test across all three groups - N, PAN, and T.

In the case of the Kruskal-Wallis test, after computing the p-values, you can immediately add missing columns through mutate():
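A hedged sketch of the Kruskal-Wallis variant with simulated three-group data (made-up values, hypothetical object names). The label height is again set 20% above each lipid's maximum, and the panel layout is a choice made for this example:

```r
library(dplyr)
library(tidyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(42)
# Simulated stand-in for the long tibble with three groups (hypothetical values)
toy3.long <- tibble(
  Label = factor(rep(c("N", "PAN", "T"), each = 8)),
  `SM 41:1;O2` = c(rnorm(8, 20, 2), rnorm(8, 17, 2), rnorm(8, 14, 2)),
  `CE 16:1`    = c(rnorm(8, 50, 5), rnorm(8, 55, 5), rnorm(8, 65, 5))
) %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

# Highest concentration per lipid, used to place the labels
y.max <- toy3.long %>%
  group_by(Lipids) %>%
  summarise(y.max = max(Concentrations))

KW <- toy3.long %>%
  group_by(Lipids) %>%
  kruskal_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance(p.col = "p.adj") %>%
  mutate(group1 = "N", group2 = "T", xmin = 1, xmax = 3,
         y.position = 1.2 * y.max$y.max[match(Lipids, y.max$Lipids)])

p.kw <- ggplot(toy3.long, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_boxplot() +
  facet_grid(Lipids ~ ., scales = "free_y") +
  stat_pvalue_manual(KW, label = "p.adj", inherit.aes = FALSE) +
  theme_bw()

print(p.kw)
```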

And the final plot:

Box plots presenting concentrations of selected lipids with adjusted p-values from the Kruskal-Wallis test across all three groups - N, PAN, and T.

We can also use the results of a post hoc test as annotations. Let's compute the Tukey HSD post hoc test as an example (the code would be similar for the Dunn post hoc test):

In the next step, from the tibble containing post hoc results for all lipids, we select rows concerning the lipids we would like to plot:

Using add_xy_position(), we can add to the tibble with post hoc results columns that will be used by stat_pvalue_manual() for plotting labels (labels' positions):
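A sketch of the post hoc steps with simulated three-group data (made-up values). In this sketch the positions are added to the full grouped result with scales = "free" and the lipids to plot are selected afterwards; the object names follow the text, but the data are hypothetical.

```r
library(dplyr)
library(tidyr)
library(rstatix)

set.seed(42)
# Simulated stand-in for the long tibble with three groups (hypothetical values)
toy3.long <- tibble(
  Label = factor(rep(c("N", "PAN", "T"), each = 8)),
  `SM 41:1;O2` = c(rnorm(8, 20, 2), rnorm(8, 17, 2), rnorm(8, 14, 2)),
  `CE 16:1`    = c(rnorm(8, 50, 5), rnorm(8, 55, 5), rnorm(8, 65, 5))
) %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

# Tukey HSD per lipid; the result already contains group1, group2, and p.adj
tukey.hsd <- toy3.long %>%
  group_by(Lipids) %>%
  tukey_hsd(Concentrations ~ Label) %>%
  add_xy_position(x = "Label", scales = "free")  # adds xmin, xmax, y.position

# Select the lipids to plot (both toy lipids here)
tukey.hsd.selected <- tukey.hsd %>%
  filter(Lipids %in% c("SM 41:1;O2", "CE 16:1"))

print(tukey.hsd.selected)
```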

Next, let's create the plot with labels:

Our plot:

Adding post-hoc test results as annotations above box plots.

However, the labels overlap. We can fix this by adjusting the y-positions in the 'tukey.hsd.selected' tibble. Let's take a look at the tibble first:

A glimpse at the 'tukey.hsd.selected' tibble.

The y-positions of labels annotating significant outcomes from the Tukey HSD test are too close (red frames). The labels for every lipid are arranged in the same order - look at group1 and group2 columns (highlighted in green, orange, and violet frames). This allows us to use the seq() function to select particular rows from the tibble in the y.position column:

Using this short line of code, we select every third row in the y.position column, starting from row no. 3 and continuing until row no. 12. In effect, we select the single cells corresponding to the labels connecting PAN and T. See the output from the R console after running this line of code:

We can modify these entries, e.g., shift them 6-7 positions higher:

We can also gently shift the N vs T labels, e.g., by two units up. We will now start modifications to every third entry from row no. 2:
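The row selection with seq() can be tried on a small hand-made tibble. The tibble below is hypothetical (four placeholder lipids, made-up y-positions) and only mimics the row order described above: N vs PAN, N vs T, PAN vs T for every lipid. The shifts of six and two units follow the text.

```r
library(dplyr)

# Hypothetical tibble mimicking 'tukey.hsd.selected': 4 lipids x 3 comparisons
tukey.toy <- tibble(
  Lipids     = rep(c("Lipid A", "Lipid B", "Lipid C", "Lipid D"), each = 3),
  group1     = rep(c("N", "N", "PAN"), times = 4),
  group2     = rep(c("PAN", "T", "T"), times = 4),
  y.position = rep(c(30, 31, 32), times = 4)
)

# Every third row starting from row 3: the PAN vs T labels
idx.pan.t <- seq(from = 3, to = nrow(tukey.toy), by = 3)
print(tukey.toy$y.position[idx.pan.t])

# Shift the PAN vs T labels six units up
tukey.toy$y.position[idx.pan.t] <- tukey.toy$y.position[idx.pan.t] + 6

# Every third row starting from row 2: the N vs T labels, shifted two units up
idx.n.t <- seq(from = 2, to = nrow(tukey.toy), by = 3)
tukey.toy$y.position[idx.n.t] <- tukey.toy$y.position[idx.n.t] + 2

print(tukey.toy)
```

Using to = nrow(tukey.toy) makes the selection cover the whole tibble regardless of how many lipids it contains.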

NOTE: If you would like to apply changes to all entries, e.g., in the 'tukey.hsd' tibble, set the argument to = nrow(<tibble_name>) in seq().

If you would like to select y.positions of your preference - you can also substitute the y.position values from add_xy_position() using:

If we now run the plot code, we obtain these beautiful box plots:

The ggpubr box plots with post hoc test results added through stat_pvalue_manual().

Or we can also plot annotations above bar charts, as in the example below:

The ggplot2 bar plots with post hoc test results added through stat_pvalue_manual().

Publication-ready box plots with annotations from the ggstatsplot library

Elegant and publication-ready box plots with detailed annotations from hypothesis testing can be produced using the functions of the ggstatsplot library. Check the subchapter Basic plotting in R - Box plots to recall how to use this library.

Adding annotations based on PMCMRplus package pairwise comparisons to ggplot2 charts (advanced)

If we add the columns xmin, xmax, y.position, and significance symbols to the tibble with post hoc results obtained from looped PMCMRplus package functions, we can use stat_pvalue_manual() to add these results as statistical annotations to ggplot2 charts. Take a look at the code blocks below. As an example, we will use the 'posthoc.Conover' tibble computed in the subchapter Multi-sample comparisons in R.
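The general idea, that any tibble with the right columns can feed stat_pvalue_manual(), can be sketched without PMCMRplus. In the example below, base R's pairwise.wilcox.test() stands in for the Conover test purely to obtain pairwise adjusted p-values without an extra dependency; it is not the test used in this chapter. The data are simulated (made-up values), and all object names are hypothetical.

```r
library(dplyr)
library(rstatix)
library(ggplot2)
library(ggpubr)

set.seed(42)
# Simulated concentrations of one lipid in three groups (hypothetical values)
toy <- tibble(
  Label = factor(rep(c("N", "PAN", "T"), each = 8)),
  Concentrations = c(rnorm(8, 20, 2), rnorm(8, 17, 2), rnorm(8, 14, 2))
)

# Stand-in pairwise test; p.value is a lower-triangular matrix of adjusted p-values
pw <- pairwise.wilcox.test(toy$Concentrations, toy$Label, p.adjust.method = "BH")

# Reshape the matrix into one row per comparison and add the plotting columns
posthoc.toy <- as.data.frame(as.table(pw$p.value)) %>%
  as_tibble() %>%
  filter(!is.na(Freq)) %>%
  transmute(group1 = as.character(Var2),
            group2 = as.character(Var1),
            p.adj  = Freq) %>%
  add_significance(p.col = "p.adj") %>%
  mutate(xmin = match(group1, levels(toy$Label)),   # x-axis position of group1
         xmax = match(group2, levels(toy$Label)),   # x-axis position of group2
         y.position = max(toy$Concentrations) * c(1.05, 1.15, 1.25))

p.posthoc <- ggplot(toy, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_boxplot() +
  stat_pvalue_manual(posthoc.toy, label = "p.adj.signif", inherit.aes = FALSE) +
  theme_bw()

print(p.posthoc)
```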

The glimpse:

The 'posthoc.Conover' tibble ready for plotting annotations in the ggplot2 charts.

Now, let's select the post hoc results for the lipids that we would like to plot:

And finally, we create the plot:

The plot:

The ggplot2 box plots with Conover posthoc test results (PMCMRplus package) before correcting the label heights.

Gentle corrections of y-positions (y.position column in 'posthoc.Conover' tibble):

And we obtain these beautiful box plots with the results of the Conover posthoc test labeled above them:

The ggplot2 Tukey box plots with Conover posthoc test results (PMCMRplus package).

Or, in this case, we can use adjusted box plots for skewed distributions:

The plot:

The ggplot2 adjusted box plots for skewed distributions with Conover posthoc test results (PMCMRplus package).
