Omics data visualization in R and Python

Dot plots with ggplot2 and tidyplots

Metabolites and lipids descriptive statistical analysis in R

A dot plot can serve as an alternative to a bar chart or a box plot, or it can be combined with either of them. When a dot plot is used on its own, adding a mean or median together with the standard deviation or interquartile range makes it more informative and attractive. In the ggplot2 workflow shown in this chapter, these statistics are computed separately before plotting.
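The ggplot2 code later in this chapter overlays mean ± SD. As a minimal sketch of the median/IQR variant, the block below computes the median and the 25th/75th percentiles per group and draws them over the individual dots. The tibble `toy`, its simulated values, and the group names are hypothetical and serve only as an illustration.

```r
library(dplyr)
library(ggplot2)

# Simulated example data (hypothetical): three groups, 10 samples each.
set.seed(1)
toy <- tibble(
  Label = rep(c("N", "PAN", "T"), each = 10),
  Concentrations = c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 8))
)

# Median and interquartile range (25th and 75th percentiles) per group:
median_iqr <- toy %>%
  group_by(Label) %>%
  summarise(median = median(Concentrations),
            q1 = quantile(Concentrations, 0.25),
            q3 = quantile(Concentrations, 0.75),
            .groups = "drop")

# Individual dots, the IQR as an error bar, and the median as a diamond:
p_median <- ggplot(toy, aes(x = Label, y = Concentrations)) +
  geom_point(position = position_jitter(width = 0.2, height = 0),
             shape = 21, size = 2, fill = "grey70") +
  geom_errorbar(data = median_iqr,
                aes(x = Label, ymin = q1, ymax = q3),
                width = 0.4, inherit.aes = FALSE) +
  geom_point(data = median_iqr,
             aes(x = Label, y = median),
             shape = 23, size = 4, fill = "red",
             inherit.aes = FALSE)

p_median
```

Here the error bar spans the middle 50% of the samples rather than ± 1 SD, which is less sensitive to outliers in skewed concentration data.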

Dot plots are highly informative: each sample is represented by an individual dot, which makes it difficult to conceal potential issues with the collected data. For this reason, jittered dots are often overlaid on box plots to give deeper insight into the data, as in the following publications:

O. Vvedenskaya et al. Nonalcoholic fatty liver disease stratification by liver lipidomics. DOI: - Fig. 1 (dot plots in combination with box plots were used here to visualize sums of lipid species concentrations in each detected lipid class, and compare them across four studied groups).
R. Tabassum et al. Genetic architecture of human plasma lipidome and its link to cardiovascular disease. DOI: - Fig. 2b, c; Fig. 5 (upper part of panel), Fig. 6b, and more (many variations of dot plots, combined with box plots and other plot types, were used here to visualize lipidomics data).
S. Salihovic et al. Identification and validation of a blood-based diagnostic lipidomic signature of pediatric inflammatory bowel disease. DOI: - Fig. 4a (a dot plot was used to present the log-transformed, unit-variance-scaled distributions of LacCer(d18:1/16:0) and PC(18:1/22:6) in individuals with IBD compared to symptomatic controls in the validation cohort; a box plot in the background provides additional information within one visualization).
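A minimal sketch of the box plot plus jittered dots combination described above, using simulated data (the data frame `toy` and its values are hypothetical). Setting `outlier.shape = NA` in `geom_boxplot()` hides the box plot's own outlier points, because every sample, outliers included, is already drawn by the jitter layer; without it, outliers would appear twice.

```r
library(ggplot2)

# Simulated example data (hypothetical): three groups, 15 samples each.
set.seed(2)
toy <- data.frame(
  Label = rep(c("N", "PAN", "T"), each = 15),
  Concentrations = c(rnorm(15, 5), rnorm(15, 6), rnorm(15, 8))
)

p_box <- ggplot(toy, aes(x = Label, y = Concentrations)) +
  # Box plot without its own outlier points (the dots show every sample):
  geom_boxplot(outlier.shape = NA, width = 0.6) +
  # One jittered dot per sample; height = 0 keeps the y values unchanged:
  geom_jitter(width = 0.15, height = 0,
              shape = 21, size = 2, fill = "orange")

p_box
```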

However, classic dot plots are also often used, e.g.:

R. Jirásko et al. Altered Plasma, Urine, and Tissue Profiles of Sulfatides and Sphingomyelins in Patients with Renal Cell Carcinoma. DOI: - Fig. 1C (classic dot plots were used to present Age and BMI distributions across patients and controls in three sample types).
R. C. Prior, A. Silva & T. Vangansewinkel et al. PMP22 duplication dysregulates lipid homeostasis and plasma membrane organization in developing human Schwann cells. DOI: - Fig. 5B-C, and more (the authors utilize classic dot plots to present the results of free cholesterol staining and flow cytometry-based experiments).
A. Talebi et al. Pharmacological induction of membrane lipid poly-unsaturation sensitizes melanoma to ROS inducers and overcomes acquired resistance to targeted therapy. DOI: - Fig. 1b, e, f (the authors employ different variants of dot plots for presenting, e.g., gene expression data).
F. Torta et al. Concordant inter-laboratory derived concentrations of ceramides in human plasma reference materials via authentic standards. DOI: - Fig. 1, Fig. 3, Fig. 4, Fig. 5, Fig. 6 (the authors broadly use classic dot plots for presenting inter-laboratory derived concentrations of ceramides in human plasma, e.g., consensus concentrations of ceramides in NIST SRM 1950 across all participating laboratories, and more).

In ggplot2, geom_point() generates a layer with one point per observation. We jitter the points and adjust their appearance (shape, size, and color).
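Two details of position_jitter() matter for measured data. Unless `height` is set, it also shifts points vertically (by default up to 40% of the resolution of the data), and unless `seed` is given, the offsets depend on the current random-number state. The sketch below, with hypothetical simulated data in `toy` and an arbitrary seed, sets `height = 0` and `seed = 123`, then uses `layer_data()` to confirm that the drawn y values equal the original concentrations.

```r
library(ggplot2)

# Simulated example data (hypothetical):
set.seed(3)
toy <- data.frame(
  Label = rep(c("N", "PAN", "T"), each = 8),
  Concentrations = round(c(rnorm(8, 5), rnorm(8, 6), rnorm(8, 8)), 2)
)

# height = 0: no vertical jitter, so concentrations are drawn exactly;
# seed = 123: the same horizontal offsets every time the script is run.
jitter_pos <- position_jitter(width = 0.2, height = 0, seed = 123)

p_jit <- ggplot(toy, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_point(position = jitter_pos, shape = 21, size = 2, color = "black")

# Coordinates that ggplot2 actually draws for the point layer:
drawn <- layer_data(p_jit, 1)
offsets <- as.numeric(drawn$x) - round(as.numeric(drawn$x))
range(offsets)  # horizontal offsets stay within +/- 0.2
```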

The points representing the mean (or median) values are also created with geom_point(), but here the aesthetic mappings must change. To this second geom_point(), we supply a tibble containing the mean/median values, the standard deviations/interquartile ranges, the group labels (Label), and the lipid species (Lipids). We then customize the points representing the mean/median values, i.e., their shape, size, color, fill, etc.
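The summary tibble has to use the same column names as the long data (here `Label` and `Lipids`), because ggplot2 places the summary layers on the x-axis and in the facets by those columns. A minimal sketch with a hypothetical two-lipid wide tibble `wide` (simulated values): `.groups = "drop"` returns an ungrouped tibble with one row per group-lipid combination.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide tibble: one row per sample, one column per lipid.
set.seed(4)
wide <- tibble(
  Label = rep(c("N", "PAN", "T"), each = 5),
  `SM 40:1;O2` = rnorm(15, 10),
  `SM 41:1;O2` = rnorm(15, 4)
)

mean_sd <- wide %>%
  pivot_longer(cols = -Label,              # every column except Label
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Label, Lipids) %>%
  summarise(mean = mean(Concentrations),
            sd = sd(Concentrations),
            .groups = "drop")              # ungrouped result

mean_sd  # 3 groups x 2 lipids = 6 rows
```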

Next, we add error bars with geom_errorbar(). Because we do not want this layer to inherit the aesthetics of the parent plot, we set inherit.aes to FALSE. You can learn more about the consequences of setting inherit.aes to FALSE here:

We again indicate that we use the tibble containing the mean values and standard deviations (data = mean_sd, ...). Since the parent plot's aesthetics are not inherited, we must define x (taken from Label), ymin (the lower end of the error bar, mean - sd), and ymax (the upper end, mean + sd). If any of these three aesthetics is missing, running the code results in an error.
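To see why inherit.aes = FALSE is needed, the sketch below (hypothetical simulated data in `long` and `summ`) adds the same error-bar layer twice. With the default inherit.aes = TRUE, the layer also inherits y = Concentrations from the parent plot, a column that does not exist in the summary data, so building the plot fails. With inherit.aes = FALSE, only the aesthetics given in the layer are used.

```r
library(ggplot2)

# Simulated long data and its per-group summary (hypothetical):
set.seed(5)
long <- data.frame(Label = rep(c("N", "T"), each = 6),
                   Concentrations = c(rnorm(6, 5), rnorm(6, 8)))
summ <- data.frame(Label = c("N", "T"),
                   mean = as.numeric(tapply(long$Concentrations, long$Label, mean)),
                   sd = as.numeric(tapply(long$Concentrations, long$Label, sd)))

base <- ggplot(long, aes(x = Label, y = Concentrations)) + geom_point()

# Default inherit.aes = TRUE: y = Concentrations is looked up in `summ`
# and not found, so ggplot_build() raises an error.
bad <- base + geom_errorbar(data = summ,
                            aes(ymin = mean - sd, ymax = mean + sd))
err <- tryCatch(ggplot_build(bad), error = function(e) e)
conditionMessage(err)

# inherit.aes = FALSE: x, ymin, and ymax are all defined explicitly.
good <- base + geom_errorbar(data = summ,
                             aes(x = Label, ymin = mean - sd, ymax = mean + sd),
                             width = 0.3, inherit.aes = FALSE)
good
```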

Finally, we use the trick shown in the subchapter about box plots to draw all dot plots in one x-y plane (with facet_grid() and theme()).
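The approach in this chapter computes mean ± SD in a separate tibble. As an alternative that is not used in the chapter, ggplot2's `stat_summary()` can compute these statistics from the plotted data itself. The sketch below uses a hypothetical helper `mean_pm_sd()` that returns the `y`, `ymin`, and `ymax` columns that `fun.data` expects, applied to simulated data in `long`.

```r
library(ggplot2)

# Simulated example data (hypothetical):
set.seed(6)
long <- data.frame(Label = rep(c("N", "PAN", "T"), each = 10),
                   Concentrations = c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 8)))

# Hypothetical helper: mean and mean +/- 1 SD in the format that
# stat_summary(fun.data = ...) expects (columns y, ymin, ymax).
mean_pm_sd <- function(v) {
  m <- mean(v)
  s <- sd(v)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p_summary <- ggplot(long, aes(x = Label, y = Concentrations)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1)) +
  # Error bars (mean +/- SD) computed per x group:
  stat_summary(fun.data = mean_pm_sd, geom = "errorbar", width = 0.4) +
  # Mean values as red diamonds:
  stat_summary(fun = mean, geom = "point", shape = 23, size = 4, fill = "red")

# Statistics computed by ggplot2 for the error-bar layer:
bars <- layer_data(p_summary, 2)
bars[, c("x", "y", "ymin", "ymax")]
```

The separate-tibble approach keeps the numbers available for tables and reports, while `stat_summary()` keeps everything inside one plotting call.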

The final block of code:

# Loading the required packages (dplyr, tidyr, and ggplot2 are part of the tidyverse):
library(tidyverse)

# `data`: the wide tibble of lipid concentrations (column `Label` plus one column per lipid).
# Computing additional statistics to be plotted with our dot plot:
mean_sd <- 
  data %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Label, Lipids) %>%
  summarise(mean = mean(Concentrations),
            sd = sd(Concentrations),
            .groups = "drop")   # Return an ungrouped tibble.

# Creating a dot plot using ggplot2:
data %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  ggplot(aes(x = Label, 
             y = Concentrations, 
             fill = Label)) +
  # One jittered point per sample; height = 0 keeps the concentrations unchanged:
  geom_point(position = position_jitter(width = 0.2, height = 0),
             shape = 21,
             size = 2,
             color = 'black') +
  scale_fill_manual(values = c('royalblue', 'orange', 'red2')) +
  # Error bars (mean +/- SD) taken from `mean_sd`:
  geom_errorbar(data = mean_sd, 
                aes(x = Label, 
                    ymin = mean - sd, 
                    ymax = mean + sd), 
                width = 0.4,
                linewidth = 0.75,
                inherit.aes = FALSE) +
  # Mean values drawn as red diamonds:
  geom_point(data = mean_sd, 
             aes(x = Label, 
                 y = mean),
             shape = 23,
             size = 4,
             color = 'black',
             fill = 'red') +
  # One panel per lipid species, with strip labels placed below the x-axis:
  facet_grid(. ~ Lipids, switch = "x") + 
  theme(strip.placement = "outside",
        strip.background = element_blank(),
        panel.spacing.x = unit(0, 'cm'))

The final plot:

Dot plots with tidyplots (level: basic) - alternative to ggplot2

The tidyplots R library enables the generation of publication-ready plots with only a few lines of code, which makes it a handy alternative to ggplot2. Look at the code block below:

# Loading the required packages:
library(tidyverse)
library(tidyplots)

# Selecting lipids or metabolites of interest from the tibble `data`.
# A long tibble is created in the next step:
data.long <- 
  data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations')

# Using the long tibble `data.long` to create a dot plot
# (x-axis: lipid species; colors: experimental groups):
dot.plot <- 
  data.long %>%
  tidyplot(x = Lipids, y = Concentrations, color = Label) %>%
  add_data_points_jitter(alpha = 0.2) %>%
  add_mean_dot() %>%
  add_sd_errorbar() %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2")) %>%    
  adjust_title("SM concentrations in serum of N, PAN, and T") %>%
  adjust_x_axis_title("Lipid species") %>%
  adjust_y_axis_title("Concentration") %>%
  adjust_legend_title("Experimental group") %>%
  adjust_caption("Serum lipid concentrations measured via SFC/MS") %>%
  adjust_size(width = 100, height = 60)
  
# Display the plot by running the following line in the console:
dot.plot
  
# Look how simple this code is!
# 1) Create a tidyplot object,
# 2) Add all the layers you want (tidyplots performs computations for you!),
# 3) Adjust them according to your preference!
# You work with very intuitive functions and produce a beautiful plot!

# Export a high-quality, publication-ready dot plot.
# Install the package used to preview a plot (only needed once):
# install.packages("ggimage")

# Load the library:
library(ggimage)

## Generate a preview and optimize the plot presentation (your dot plot):
ggpreview(plot = dot.plot,   # The object that you want to preview.
          width = 300,       # Width in px.
          height = 180,      # Height in px.
          units = "px",      # Unit of width and height.
          dpi = 300,         # Resolution (dots per inch).
          scale = 6)         # Scaling factor; you may need a different value.


## Save the plot in the working directory using ggsave() (ggplot2 package, part of the tidyverse):
ggsave(plot = dot.plot,          # The R object to be saved.
       device = "jpeg",          # File format.
       filename = "dot_plot.jpeg",
       width = 300,
       height = 180,
       units = "px",
       dpi = 300,
       scale = 6)
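Why does `scale = 6` matter? In ggsave(), width and height given in px are first converted to inches by dividing by `dpi` and are then multiplied by `scale`. The hypothetical helper below (not part of ggplot2) reproduces this arithmetic for the settings above: the saved image is 1800 × 1080 px, i.e. 6 × 3.6 in at 300 dpi. Font sizes in ggplot2 are defined in points, so on this larger canvas the text keeps a sensible size; with scale = 1, the canvas would be only 1 × 0.6 in and the text would dominate the plot.

```r
# Hypothetical helper (not part of ggplot2): final size of an image written by
# ggsave(..., units = "px", dpi = dpi, scale = scale).
export_size <- function(width, height, dpi, scale) {
  inches <- c(width = width, height = height) / dpi * scale  # px -> in, then scaled
  list(pixels = inches * dpi, inches = inches)
}

export_size(300, 180, dpi = 300, scale = 6)
# pixels: 1800 x 1080; inches: 6 x 3.6
```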

The obtained (publication-ready!) plot: