Omics data visualization in R and Python

Dot plots with ggplot2 and tidyplots

Metabolites and lipids descriptive statistical analysis in R


A dot plot can serve as an alternative to a bar chart or a box plot, or it can be combined with either of them. When dot plots are used on their own, adding a mean or median value together with the standard deviation or interquartile range makes them more informative and attractive (these statistics must be computed separately).
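The code later in this subchapter computes the mean and standard deviation. As a minimal sketch of the median/interquartile-range variant mentioned above, the same summary can be written as follows. The tibble `toy`, its values, and the column names `q1` and `q3` are assumptions made for illustration only; they are not part of the book's data set.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values, not the book's data set):
toy <- tibble(
  Label = rep(c("N", "PAN", "T"), each = 4),
  `SM 41:1;O2` = c(10, 12, 14, 16, 8, 9, 10, 11, 5, 6, 7, 8)
)

# Median and quartiles per group and lipid, computed separately from the plot:
median_iqr <-
  toy %>%
  pivot_longer(cols = -Label,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Label, Lipids) %>%
  summarise(median = median(Concentrations),
            q1 = quantile(Concentrations, 0.25, names = FALSE),
            q3 = quantile(Concentrations, 0.75, names = FALSE),
            .groups = "drop")

median_iqr
```

In a plot, `q1` and `q3` could then be mapped to `ymin` and `ymax` of an error bar layer, and `median` to `y` of the summary point.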

Dot plots are highly informative, as each sample is represented by an individual dot, making it difficult to conceal any potential issues with the collected data. For this reason, they are often added as jitter to enhance box plots, offering deeper insight into specific data, as presented in the following manuscripts:

  • O. Vvedenskaya et al. Nonalcoholic fatty liver disease stratification by liver lipidomics. DOI: https://doi.org/10.1016/j.jlr.2021.100104 - Fig. 1 (dot plots in combination with box plots were used here to visualize sums of lipid species concentrations in each detected lipid class, and to compare them across four studied groups).

  • R. Tabassum et al. Genetic architecture of human plasma lipidome and its link to cardiovascular disease. DOI: https://doi.org/10.1038/s41467-019-11954-8 - Fig. 2b, c; Fig. 5 (upper part of panel), Fig. 6b, and more (many variations of dot plots, with and without box plots, were used here to visualize lipidomics data).

  • S. Salihovic et al. Identification and validation of a blood-based diagnostic lipidomic signature of pediatric inflammatory bowel disease. DOI: https://doi.org/10.1038/s41467-024-48763-7 - Fig. 4a (a dot plot was used to present the log-transformed, unit variance scaled distribution of LacCer(d18:1/16:0) and PC(18:1/22:6) in individuals with IBD compared to symptomatic controls in the validation cohort; a box plot is drawn in the background to provide more information within one visualization).

However, classic dot plots are also often used, e.g.:

  • R. Jirásko et al. Altered Plasma, Urine, and Tissue Profiles of Sulfatides and Sphingomyelins in Patients with Renal Cell Carcinoma. DOI: https://doi.org/10.3390/cancers14194622 - Fig. 1C (classic dot plots were used to present age and BMI distributions across patients and controls in three sample types).

  • R. C. Prior, A. Silva & T. Vangansewinkel et al. PMP22 duplication dysregulates lipid homeostasis and plasma membrane organization in developing human Schwann cells. DOI: https://doi.org/10.1093/brain/awae158 - Fig. 5 B - C, and more (the authors utilize classic dot plots to present the results of free cholesterol staining and flow cytometry-based experiments).

  • A. Talebi et al. Pharmacological induction of membrane lipid poly-unsaturation sensitizes melanoma to ROS inducers and overcomes acquired resistance to targeted therapy. DOI: https://doi.org/10.1186/s13046-023-02664-7 - Fig. 1b, e, f (the authors employ different variants of dot plots for presenting, e.g., gene expression data).

  • F. Torta et al. Concordant inter-laboratory derived concentrations of ceramides in human plasma reference materials via authentic standards. DOI: https://doi.org/10.1038/s41467-024-52087-x - Fig. 1, Fig. 3, Fig. 4, Fig. 5, Fig. 6 (the authors broadly use classic dot plots for presenting inter-laboratory derived concentrations of ceramides in human plasma, e.g., consensus concentrations of ceramides in NIST SRM 1950 across all participating laboratories, and more).

In ggplot2, geom_point() is used to generate a layer with points corresponding to all observations. We jitter the points and adjust their look (shape, size, and color).
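As a small illustration of the jittering step, the sketch below uses a toy data frame (hypothetical values, not the book's data set). Two optional refinements are shown that the chapter's code does not use: `height = 0`, so that the points are shifted only horizontally and the plotted concentrations stay unchanged, and `seed`, so that the random shifts are the same each time the plot is drawn. Both are arguments of `position_jitter()`.

```r
library(ggplot2)

# Toy data (hypothetical values, not the book's data set):
toy <- data.frame(
  Label = rep(c("N", "PAN", "T"), each = 4),
  Concentrations = c(10, 12, 14, 16, 8, 9, 10, 11, 5, 6, 7, 8)
)

jitter_plot <-
  ggplot(toy, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_point(position = position_jitter(width = 0.2,  # Horizontal spread.
                                        height = 0,   # No vertical shift.
                                        seed = 123),  # Reproducible jitter.
             shape = 21,        # Filled circle with an outline.
             size = 2,
             color = "black")   # Outline color; the fill comes from Label.

jitter_plot
```

Without `height = 0`, `position_jitter()` also adds a small vertical shift by default, which is usually negligible for continuous data but does move the points away from their measured values.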

The points representing the mean/median values are also created by geom_point(), but the aesthetic mappings must be changed here. To the second geom_point(), we supply a tibble containing the mean/median values, the standard deviations/interquartile ranges, the group labels (Label), and the lipid species (Lipids). We then customize the points representing the mean/median values, i.e., their shape, size, color, fill, etc.

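Next, we add error bars through geom_errorbar(). As we do not want to inherit the aesthetics from the parent plot, we set inherit.aes to FALSE. You can learn more about the consequences of setting inherit.aes to FALSE in ggplot2: Elegant Graphics for Data Analysis (3e), chapter 18, Programming with ggplot2 (annotations and the inherit.aes = FALSE application).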

We again indicate that the layer uses the tibble containing mean values and standard deviations (data = mean_sd, ...). Because the parent plot's aesthetics are not inherited, we need to define them ourselves: x, which is taken from Label; ymin (the lower end of the error bar), that is mean - sd; and ymax (the upper end of the error bar), that is mean + sd. If any of these three aesthetics is not specified, running the code will result in an error.
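The role of inherit.aes can be seen in isolation in the sketch below, which uses toy data (hypothetical values, not the book's data set). The summary tibble has no Concentrations column, so a layer that inherited `y = Concentrations` from the parent plot would typically fail when ggplot2 tries to evaluate that mapping in `mean_sd`. Switching inheritance off and giving the layer its own complete mapping avoids this.

```r
library(dplyr)
library(ggplot2)

# Toy data (hypothetical values, not the book's data set):
toy <- tibble(
  Label = rep(c("N", "PAN", "T"), each = 4),
  Concentrations = c(10, 12, 14, 16, 8, 9, 10, 11, 5, 6, 7, 8)
)

# Summary tibble: it contains Label, mean, and sd, but no Concentrations.
mean_sd <-
  toy %>%
  group_by(Label) %>%
  summarise(mean = mean(Concentrations),
            sd = sd(Concentrations),
            .groups = "drop")

p <-
  ggplot(toy, aes(x = Label, y = Concentrations, fill = Label)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1),
             shape = 21,
             size = 2) +
  # This layer uses its own data and its own complete set of aesthetics:
  geom_errorbar(data = mean_sd,
                aes(x = Label,
                    ymin = mean - sd,
                    ymax = mean + sd),
                width = 0.4,
                inherit.aes = FALSE)

p
```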

Finally, we use the trick we showed you in the subchapter about box plots to plot all dot plots in one x-y plane (through facet_grid() and theme()).

The final block of code:

# Loading the packages used below (dplyr, tidyr, ggplot2, and the pipe):
library(tidyverse)

# Computing additional statistics to be plotted with our dot plot:
mean_sd <- 
  data %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Label, Lipids) %>%
  summarise(mean = mean(Concentrations),
            sd = sd(Concentrations),
            .groups = "drop")   # Return an ungrouped tibble (no grouping message).

# Creating a dot plot using ggplot2:
data %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  ggplot(aes(x = Label, 
             y = Concentrations, 
             fill = Label))+
  geom_point(position = position_jitter(width = 0.2),
             shape = 21,
             size = 2,
             color = 'black') +
  scale_fill_manual(values = c('royalblue', 'orange','red2')) +
  geom_errorbar(data = mean_sd, 
                aes(x = Label, 
                    ymin = mean - sd, 
                    ymax = mean + sd), 
                width = 0.4,
                linewidth = 0.75,
                inherit.aes = FALSE) +
  geom_point(data = mean_sd, 
             aes(x = Label, 
                 y = mean),
             shape = 23,
             size = 4,
             color = 'black',
             fill = 'red') +
  facet_grid(. ~ Lipids, switch = "x") + 
  theme(strip.placement = "outside",
        strip.background = element_blank(),
        panel.spacing.x = unit(0, 'cm'))

The final plot:

Dot plots with tidyplots (level: basic) - alternative to ggplot2

The tidyplots R library enables the generation of publication-ready plots with only a few lines of code. In this case, it is also a handy alternative to ggplot2. Look at the code block below:

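# Activate libraries (tidyverse provides the pipe, select(), and pivot_longer()):
library(tidyverse)
library(tidyplots)

# Selecting lipids or metabolites of interest from the tibble `data`.
# A long tibble is created in the next step: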
data.long <- 
  data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations')
               
# Use of the long tibble `data.long` for creating a dot plot
dot.plot <- 
  data.long %>%
  tidyplot(x = Lipids, y = Concentrations, color = Label) %>%
  add_data_points_jitter(alpha = 0.2) %>%
  add_mean_dot() %>%
  add_sd_errorbar() %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2")) %>%    
  adjust_title("SM concentrations in serum of N, PAN, and T") %>%
  adjust_x_axis_title("Lipid species") %>%
  adjust_y_axis_title("Concentration") %>%
  adjust_legend_title("Experimental group") %>%
  adjust_caption("Serum lipid concentrations measured via SFC/MS") %>%
  adjust_size(width = 100, height = 60)
  
# Take a look at the plot by typing in the console (execute):
dot.plot
  
# Look how simple this code is!
# 1) Create a tidyplot object,
# 2) Add all the layers you want (tidyplots performs computations for you!),
# 3) Adjust them according to your preference!
# You work with very intuitive functions and produce a beautiful plot!

# Export a high-quality publication-ready dot plot.
# Install package for creating a preview of a plot:
install.packages("ggimage")

# Activate library:
library(ggimage)

## Generate a preview and optimize the plot presentation (your dot plot):
ggpreview(plot = dot.plot,   # The object that you want to preview.
          width = 300,       # Width in px.
          height = 180,      # Height in px.
          units = "px",      # Unit of size: px.
          dpi = 300,         # Sharpness.
          scale = 6)         # You may need to use a different scale.


## Save the plot in the working directory using ggsave (ggplot2 package - tidyverse):
ggsave(plot = dot.plot,    # The R object to be saved.        
       device = "jpeg",           # Format.
       filename = "dot_plot.jpeg",
       width = 300,
       height = 180,
       units = "px",
       dpi = 300,
       scale = 6)
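To see what the scale argument does to the saved file, the sketch below estimates the pixel dimensions from width, height, units, dpi, and scale. The helper `output_size_px()` is hypothetical (it is not part of ggplot2 or ggimage); it only mirrors, under the assumption that ggsave() converts the requested size to inches, multiplies it by scale, and renders at the given dpi, the arithmetic behind the call above.

```r
# Hypothetical helper (not part of ggplot2): estimates the pixel size of a
# file written by ggsave() for the given width, height, units, dpi, and scale.
output_size_px <- function(width, height,
                           units = c("in", "cm", "mm", "px"),
                           dpi = 300,
                           scale = 1) {
  units <- match.arg(units)
  # Number of units per inch; for "px" this is the dpi itself.
  per_inch <- c(`in` = 1, cm = 2.54, mm = 25.4, px = dpi)[[units]]
  # Size in inches after applying the scale factor:
  size_in <- c(width = width, height = height) / per_inch * scale
  # Size in pixels at the requested resolution:
  round(size_in * dpi)
}

# The settings used in the ggsave() call above:
saved_size <- output_size_px(width = 300, height = 180,
                             units = "px", dpi = 300, scale = 6)
saved_size
```

Under these assumptions, the settings above should give a file of 1800 x 1080 px. A larger scale enlarges the canvas while text and points keep their absolute size, so they appear smaller relative to the whole plot.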

The obtained (publication-ready!) plot:

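Links and captions referenced on this page:

  • O. Vvedenskaya et al.: https://doi.org/10.1016/j.jlr.2021.100104
  • R. Tabassum et al.: https://doi.org/10.1038/s41467-019-11954-8
  • S. Salihovic et al.: https://doi.org/10.1038/s41467-024-48763-7
  • R. Jirásko et al.: https://doi.org/10.3390/cancers14194622
  • R. C. Prior, A. Silva & T. Vangansewinkel et al.: https://doi.org/10.1093/brain/awae158
  • A. Talebi et al.: https://doi.org/10.1186/s13046-023-02664-7
  • F. Torta et al.: https://doi.org/10.1038/s41467-024-52087-x
  • ggplot2: Elegant Graphics for Data Analysis (3e), chapter 18, Programming with ggplot2: annotations and the inherit.aes = FALSE application.
  • Figure caption (ggplot2 section): Dot plots created using the ggplot2 package.
  • Figure caption (tidyplots section): Publication-ready dot plot created using the tidyplots R library.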