Volcano plots

Metabolites and lipids univariate statistical analysis in R

Volcano plots are broadly used to present outcomes from hypothesis testing on OMICs data. The interpretation of volcano plots is very straightforward - a volcano plot is nothing else but a subtype of scatter plot presenting statistical significance (p-values) versus a magnitude of change - most frequently fold change. The values in the left and right upper corners contain the lipids or metabolites with the most pronounced differences between the groups. Fold change is usually computed as a ratio of both groups' means. For heavily skewed data, you can also compute the ratio of medians and use a nonparametric test for comparing groups: Mann-Whitney U test for unpaired samples or Wilcoxon rank sum for paired study designs. It is important to understand that the volcano plot allows only presenting outcomes from the two group comparisons (e.g., t-test, Welch t-test, or Mann-Whitney U test).

R offers multiple libraries for producing volcano plots. Here, we will show you a great Bioconductor package called EnhancedVolcano for creating publication-ready volcano plots.

Practical applications of volcano plots in lipidomics and metabolomics (examples)

Volcano plots are one of the most commonly used methods for visualizing differentiating lipids and metabolites between two experimental groups. Explore the following manuscripts that utilize volcano plots:

Volcano plots in EnhancedVolcano

Let's begin with the installation of EnhancedVolcano. This produce is slightly different as EnhancedVolcano is a Bioconductor package:

# Installation of EnhancedVolcano (highlight and run all lines):
# Works with R 4.3
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("EnhancedVolcano")

# Browsing package vignette:
browseVignettes("EnhancedVolcano")

# Consulting the documentation:
?EnhancedVolcano()

The EnhancedVolcano() function requires, as a minimum:

  • a table with test statistics (argument toptable),

  • a column in toptable with names of lipids/metabolites (argument lab),

  • a column in toptable containing log2 fold change (argument x),

  • and a column in toptable with raw or adjusted p-values from a statistical test (argument y).

Let's create the most basic volcano plot. First, we will compute Welch's t-test and then adjust the p-values using the Benjamini-Hochberg approach. Next, we will obtain mean-based fold change. Finally, we will produce a basic volcano plot through EnhancedVolcano(). Using xlab, ylab, and title, we will add x and y-axis labels and plot title and subtitle. Look at the code block below:

The plot:

Basic volcano plot generated using EnhancedVolcano package.

Through the caption argument, you can also specify a plot caption. Using axisLabSize, titleLabSize, subtitleLabSize, captionLabSize - the size of the axis titles, plot title, subtitle, and caption can be changed.

However, our axes present -log10(Welch p-value) and log2(fold change). Using the bquote() function, we can add math notations to our plot in the x-y axes labels. Just copy the bquote() expressions provided by us to xlab and ylab:

The modified labels:

Using bquotes() to create math annotations in x-y axes labels.

More about using bquote() for creating math annotations can be found in the documentation:

And on r-bloggers:

Introduction to bquote() in R.

The basic rule is that strings are always in quotes ("..."), wrapped with a tilde separator: "Hello"~. Subscripts are in [...]: log[2], and power of is denoted by ^, e.g. 10^4. If you want to put a variable in the label - use .(). More about bquote() in the sources mentioned above.

Let's customize this plot. The EnhancedVolcano() contains numerous arguments allowing for flexible modifications.

First, let's change the cut-off points first. Via pCutoff a p-value cut-off point can be changed, and a horizontal line is drawn. Authors of the package also take into consideration adjusted p-values. If as y unadjusted p-values are delivered, via pCutoffCol adjusted p-values can be supplied. In this case, the volcano plot relies on -log10(unadjusted p-value), but the p-value cut-offs are set using pCutoffCol. For fold change, the cut-off lines are based on FCcutoff, and vertical lines are added at positive and negative cut-offs. The look of the lines can be changed via:

  • cutoffLineType - we can select: 'blank', 'solid', 'dashed', 'dotted', 'dotdash', 'longdash', 'twodash'.

  • cutoffLineCol - used to change the color of FC and p-value cut-off lines.

  • cutoffLineWidth - used to change the width of cut-off lines.

Let's use these arguments:

Our volcano after changes:

Changing p-value/FC cut-off points and the look of cut-off lines.

Changing the cut-off points created a new group in the volcano: variables significant according to p-value and fold change (red points). For this group, EnhancedVolcano() automatically added lipid names as labels. However, these labels overlap, making it difficult to understand which points are actually annotated. We can modify them through:

  • labSize - changes the size of all labels,

  • labCol - changes the color of all labels,

  • and labFace - changes the font face of all labels.

The labels can be drawn in boxes - you can set boxedLabels to TRUE.

To avoid ambiguous annotations, connections can be drawn between points and labels. To do so, set drawConnections to TRUE. You can change the look of connectors using:

  • widthConnectors - changes the line width of connectors,

  • min.segment.length - is used to specify the minimum length of connector line segments,

  • directionConnectors - direction for drawing connectors: can be set to 'both', 'x', 'y',

  • typeConnectors - used to set if the arrowhead of connectors should be open or filled,

  • endsConnectors - used to define which end of the connector should have the arrowhead: can be set to 'last', 'first', 'both',

  • lengthConnectors - changes the arrowhead size (length),

  • colConnectors - changes the color of connectors and connector line segments.

If you want to remove arrowheads, set arrowheads to FALSE. This way, you will only keep straight lines. To avoid overlaps in labels, you can define max.overlaps (for Inf - presents all labels, for 0 - presents no labels). We select some of these options and use them to improve the readability of our volcano plot:

The plot:

Changing the look and readability of labels of the EnhancedVolcano().

NOTE: Find appropriate settings for your volcano plot to present labels. Remember, it is good to label as many variables as possible, but ambiguous and overlapping annotations are not useful for readers of your manuscript. Therefore, first, focus on labeling the most important variables - in the left and right upper corners. Keep the annotations only if it's possible to understand to what point they correspond easily - as in the example above. A plot crowded with ambiguous annotations should be appropriately adjusted and corrected. If it is impossible to do this easily in R or Python, then you can use any software for processing graphics.

Now, let's change the size, colors, and shapes of points corresponding to all measured lipids. For this purpose, we can use:

  • pointSize argument for changing the size of points; you can specify a number or a vector of sizes,

  • col - by default, it corresponds to groups proposed in the legend by authors: NS, Log2FC, p-value, and p-value and Log2FC; using col, four new colors can be selected,

  • colCustom can be used to override the default color scheme through a vector containing new colors; remember that the order of new colors in the vector used to override default settings must match the order of variables in the toptable,

  • colAlpha - is used to change the opacity of points,

  • colGradient - if it is active, then the discrete scale of colors is substituted by a continuous scale of colors; define, e.g., c('red2', 'royalblue'), to see the effect,

  • colGradientBreaks - breakpoints for the colors specified in colGradient,

  • colGradientLabels - labels for breakpoints,

  • colGradientLimits - limits for the color scheme proposed in colGradient,

  • shape - defines shape for the plotted points; for the default setting of four groups - expects a maximum of four different shapes,

  • shapeCustom - works similarly to colCustom, and it can be used to override the default setting of shapes; remember that the order of new shapes in the vector must match the order of variables in the toptable!

Let's use these arguments to improve the look of our plot. First, we will increase the size of points and the default four colors:

We obtain:

Changing the size and colors of points to highlight statistical significance.

We can change the groups and select our criteria for colors and shapes, e.g., let's begin by coloring red2 significant upregulations in T and royalblue significant downregulations in T. We must create a vector containing colors in the same order as our variables in the toptable. The column names in this vector should be the new groups we want to introduce. Such a vector can be delivered to colCustom to adjust colors according to preferences.

Let's create three groups with three distinct colors: "Significant upregulation" (red2), "Significant downregulation" (royalblue), and "Insignificant results" (gray).

We will use ifelse() function to create the new vector with colors.

First, we need to interpret the outcomes for every variable in the 'Welch.test' tibble, i.e., every p.adj and every fold change, and define if, for a particular variable, we observe significant up- or downregulation or there is no trend at all:

Half-way there! Now, as we know, if a variable is significantly up- or downregulated and we have this result stored as a vector, we can set colors for every entry (here: red2 /royalblue). All insignificant outcomes will be colored gray. We now create the vector with color names, which will be delivered to the EnhancedVolcano() function. We will again use the nested ifelse():

For every entry in new.criteria vector named "Significant upregulation" - we set the "red2", for "Significant downregulation - "royalblue", for none of these outcomes, we apply "grey". The outcomes from the nested ifelse() are stored as 'new.colors' vector.

Now, we have our vector with colors, which can be delivered to the colCustom. However, in the vector new.colors, above every color, we need to put a column name matching new groups. In our case: above every red2 - "Significant upregulation", above every royalblue - "Significant downregulation", and for all grey - "Insignificant". A simple way to do this can be found below:

A glimpse at the vector before delivering it to colCustom:

The vector new.colors containing grouping information with corresponding colors.

And the updated plot code:

The updated plot:

Customizing colors in the EnhancedVolcano() through colCustom argument and a vector filled with colors.

We can proceed similarly with shapes, but let's change the shape depending on the lipid class. First, we need to extract every lipid class from the Lipids column. Here, we will use a tailored stringr solution to tackle this issue. The library stringr is a part of the tidyverse collection:

Now, for every lipid class in the 'Lipid.class' tibble, we must select a number corresponding to the ggplot2 type of shape. Since this requires numerous nested ifelse() functions, we will rather use a more elegant dplyr option: mutate() with case_when(). We can use it because our 'Lipid.class' is a tibble, and we will add a new column there called 'new.shapes':

This column can be extracted to create a vector with new shapes for shapeCustom. We need to transpose it and specify that it is a vector:

Now, we need to match our shapes (numbers) with lipid class names (groups):

We are ready to modify our volcano plot:

The plot:

Setting point shapes according to lipid classes.

Specify colors or color codes in the mutate() function with case_when() if you want to color by lipid classes instead of changing the points' shape. Create a vector with new.colors by selecting the appropriate column from 'Lipid.class', transpose it, and match colors with lipid class names:

The new plot:

Changing the color of points according to lipid class.

Finally, we can modify the plot background, axes, and legend:

  • hline - you can add horizontal lines passing through y-axis,

  • hlineType - change horizontal line type,

  • hlineCol - change horizontal line color,

  • hlineWidth - change horizontal line width,

  • vline - add vertical lines passing through x-axis,

  • vlineType - change vertical line type,

  • vlineCol - change vertical line color,

  • vlineWidth - change vertical line width,

  • gridlines.major - logical - whether draw major gridlines,

  • gridlines.minor - logical - whether draw minor gridlines,

  • border - add border for x and y axes ('partial') or around the entire chart ('full'),

  • borderWidth - border width,

  • borderColour - border color,

  • xlim - x-axis limits,

  • ylim - y-axis limits,

  • legendLabels - plot legend text labels,

  • legendPosition - used to set the position of the legend ('bottom', 'top', 'left', 'right'),

  • legendLabSize - size of plot legend text,

  • legendIconSize - size of legend symbols.

The plot:

Customized EnhancedVolcano volcano plot.

Or without any h- and vlines in the background:

The plot:

Customized EnhancedVolcano volcano plot - proposal no. 2.

More volcano plot designs and details about EnhancedVolcano() function can be found in the meticulous vignette prepared by the authors. We encourage you to read it:

Introduction to EnhancedVolcano package.

Creating interactive volcano plots through plotly (advanced)

Using plotly, interactive volcano plots can be generated to facilitate, e.g., data exploratory analysis. In the first step, we again compute the Welch t-test and mean-based fold changes:

Now, before we use plotly to create a volcano plot - we will define four groups:

  • variables with log2(fold change) > 0.3 - Fold change,

  • variables with log2(fold change) > 0.3 and p.adj < 0.05 - Significant & Fold Change,

  • variables with log2(fold change) < 0.3 and p.adj < 0.05 - Significant,

  • and finally - variables with log2(fold change) < 0.03 and p.adj > 0.05 - Not significant.

In effect, we obtain this beautiful interactive volcano chart:

The interactive volcano plot created in R using plotly library.

Last updated