Dot plots with ggplot2 and tidyplots
Metabolites and lipids descriptive statistical analysis in R
A dot plot can serve as an alternative to bar charts or box plots, or it can be combined with either of them. When dot plots are used on their own, adding the mean or median together with the standard deviation or interquartile range makes them more informative and attractive (these statistics must be computed separately).
Dot plots are highly informative, as each sample is represented by an individual dot, making it difficult to conceal any potential issues with the collected data. For this reason, they are often added as jitter to enhance box plots, offering deeper insight into specific data, as presented in the following manuscripts:
O. Vvedenskaya et al. Nonalcoholic fatty liver disease stratification by liver lipidomics. DOI: - Fig. 1 (dot plots in combination with box plots were used here to visualize sums of lipid species concentrations in each detected lipid class, and compare them across four studied groups).
R. Tabassum et al. Genetic architecture of human plasma lipidome and its link to cardiovascular disease. DOI: - Fig. 2b, c; Fig. 5 (upper part of panel), Fig. 6b, and more (many variations of dot plots, often combined with box plots, were used here to visualize lipidomics data).
S. Salihovic et al. Identification and validation of a blood-based diagnostic lipidomic signature of pediatric inflammatory bowel disease. DOI: - Fig. 4a (a dot plot was used to present the log-transformed, unit-variance-scaled distribution of LacCer(d18:1/16:0) and PC(18:1/22:6) in individuals with IBD compared to symptomatic controls in the validation cohort; a box plot is drawn in the background to provide more information within one visualization).
However, classic dot plots are also often used, e.g.:
R. Jirásko et al. Altered Plasma, Urine, and Tissue Profiles of Sulfatides and Sphingomyelins in Patients with Renal Cell Carcinoma. DOI: - Fig. 1C (classic dot plots were used to present Age and BMI distributions across patients and controls in three sample types).
R. C. Prior, A. Silva & T. Vangansewinkel et al. PMP22 duplication dysregulates lipid homeostasis and plasma membrane organization in developing human Schwann cells. DOI: - Fig. 5 B - C, and more (the authors utilize classic dot plots to present the results of free cholesterol staining and flow cytometry-based experiments).
A. Talebi et al. Pharmacological induction of membrane lipid poly-unsaturation sensitizes melanoma to ROS inducers and overcomes acquired resistance to targeted therapy. DOI: - Fig. 1b, e, f (the authors employ different variants of dot plots for presenting, e.g., gene expression data).
F. Torta et al. Concordant inter-laboratory derived concentrations of ceramides in human plasma reference materials via authentic standards. DOI: - Fig. 1, Fig. 3, Fig. 4, Fig. 5, Fig. 6 (the authors broadly use classic dot plots for presenting inter-laboratory derived concentrations of ceramides in human plasma, e.g., consensus concentrations of ceramides in NIST SRM 1950 across all participating laboratories, and more).
In ggplot2, geom_point() generates a layer with one point for every observation. We jitter the points and adjust their appearance (shape, size, and color).
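The jittered point layer described above can be sketched as follows. This is a minimal example under stated assumptions: the tibble `data_long`, its `Concentration` column, the group names, and the lipid name are hypothetical stand-ins for your own long-format data; only `Label` and `Lipids` follow the column names used in this chapter.

```r
library(ggplot2)
library(tibble)

set.seed(1)

# Hypothetical long-format data: one row per sample (simulated values)
data_long <- tibble(
  Label = rep(c("Control", "Case"), each = 10),
  Lipids = "Lipid A",
  Concentration = rlnorm(20, meanlog = 1, sdlog = 0.3)
)

p <- ggplot(data_long, aes(x = Label, y = Concentration)) +
  # Jitter horizontally only, so the measured values are not distorted
  geom_point(
    position = position_jitter(width = 0.15, height = 0, seed = 1),
    shape = 21, size = 2.5, color = "black", fill = "grey70", alpha = 0.7
  )

p
```

Setting `height = 0` keeps every dot at its true y value, and the `seed` argument makes the jitter reproducible between runs.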
The points representing the mean/median values are also created by geom_point(), but the aesthetic mappings must be changed here. To the second geom_point(), we supply a tibble with mean/median values, standard deviations/interquartile ranges, labels (Label), and lipid species (Lipids). We customize the points representing mean/median values, i.e., their shape, size, color, fill, etc.
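As a sketch of this step, the summary tibble can be computed with dplyr and passed to a second geom_point() layer. The data are simulated, and `data_long`, `Concentration`, and the summary column names `mean` and `sd` are assumptions made for illustration; `mean_sd`, `Label`, and `Lipids` follow the names used in the text.

```r
library(ggplot2)
library(dplyr)
library(tibble)

set.seed(1)

# Hypothetical long-format data (simulated values)
data_long <- tibble(
  Label = rep(c("Control", "Case"), each = 10),
  Lipids = "Lipid A",
  Concentration = rlnorm(20, meanlog = 1, sdlog = 0.3)
)

# Statistics computed separately: one row per group and lipid species
mean_sd <- data_long %>%
  group_by(Label, Lipids) %>%
  summarise(
    mean = mean(Concentration),
    sd = sd(Concentration),
    .groups = "drop"
  )

p <- ggplot(data_long, aes(x = Label, y = Concentration)) +
  geom_point(
    position = position_jitter(width = 0.15, height = 0, seed = 1),
    color = "grey50"
  ) +
  # Second layer: its own data and a y mapping that points at the mean column
  geom_point(
    data = mean_sd, aes(x = Label, y = mean),
    shape = 23, size = 4, color = "black", fill = "red2"
  )

p
```

For medians and interquartile ranges, the same pattern applies with `median()` and `quantile()` inside `summarise()`.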
Next, we add error bars through geom_errorbar(). As we do not want to inherit the aesthetics from the parent plot, we set inherit.aes to FALSE. You can learn more about the consequences of setting inherit.aes to FALSE here:
We again indicate that we will use the tibble containing mean values and standard deviations (data = mean_sd, ...). Because the parent plot's aesthetics are not inherited, we need to define x again, which is taken from Label, as well as ymin (the lower end of the error bar), that is mean - sd, and ymax (the upper end of the error bar), that is mean + sd. If you do not specify these three aesthetics, running the code will result in an error.
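A minimal sketch of the error bar layer, again on simulated data. The names `data_long`, `Concentration`, `mean`, and `sd` are assumptions; `mean_sd` and `Label` follow the text.

```r
library(ggplot2)
library(dplyr)
library(tibble)

set.seed(1)

# Hypothetical long-format data (simulated values)
data_long <- tibble(
  Label = rep(c("Control", "Case"), each = 10),
  Lipids = "Lipid A",
  Concentration = rlnorm(20, meanlog = 1, sdlog = 0.3)
)

mean_sd <- data_long %>%
  group_by(Label, Lipids) %>%
  summarise(mean = mean(Concentration), sd = sd(Concentration), .groups = "drop")

p <- ggplot(data_long, aes(x = Label, y = Concentration)) +
  geom_point(
    position = position_jitter(width = 0.15, height = 0, seed = 1),
    color = "grey50"
  ) +
  # inherit.aes = FALSE: x, ymin and ymax must all be mapped explicitly here
  geom_errorbar(
    data = mean_sd,
    aes(x = Label, ymin = mean - sd, ymax = mean + sd),
    width = 0.2, inherit.aes = FALSE
  )

p
```

In this sketch the parent mapping `y = Concentration` refers to a column that `mean_sd` does not contain, which is why the layer defines its own complete set of aesthetics.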
Finally, we use the trick shown in the subchapter about box plots to place all dot plots in one x-y plane (through facet_grid() and theme()).
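One possible way to achieve this effect is sketched below. The exact facet_grid() and theme() settings from the box plot subchapter are not reproduced here, so the arguments chosen (facet strips switched to the bottom, zero panel spacing) are assumptions, as are the simulated data and the names `data_long` and `Concentration`.

```r
library(ggplot2)
library(tibble)

set.seed(1)

# Hypothetical data with two lipid species measured in two groups
data_long <- tibble(
  Label = rep(rep(c("Control", "Case"), each = 10), times = 2),
  Lipids = rep(c("Lipid A", "Lipid B"), each = 20),
  Concentration = rlnorm(40, meanlog = 1, sdlog = 0.3)
)

p <- ggplot(data_long, aes(x = Label, y = Concentration, color = Label)) +
  geom_point(position = position_jitter(width = 0.15, height = 0, seed = 1)) +
  # One panel per lipid species, with the strip labels moved below the x axis
  facet_grid(cols = vars(Lipids), switch = "x") +
  theme_bw() +
  theme(
    panel.spacing = unit(0, "lines"),   # remove gaps so panels look like one plane
    strip.placement = "outside",
    strip.background = element_blank()
  )

p
```

Because facet_grid() shares the y axis across panels by default, the lipid species can be compared directly on a common scale.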
The final block of code:
The final plot:
The tidyplots R library enables the generation of publication-ready plots with only a few lines of code, which makes it a handy alternative to ggplot2 here. Look at the code block below:
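A minimal tidyplots sketch of a comparable dot plot is given here. The data are simulated, and `data_long` and `Concentration` are hypothetical names. The choice of a dash for the mean and of standard deviation error bars is an assumption about the intended figure.

```r
library(tidyplots)
library(tibble)

set.seed(1)

# Hypothetical long-format data (simulated values)
data_long <- tibble(
  Label = rep(c("Control", "Case"), each = 10),
  Lipids = "Lipid A",
  Concentration = rlnorm(20, meanlog = 1, sdlog = 0.3)
)

p <- data_long |>
  tidyplot(x = Label, y = Concentration, color = Label) |>
  add_data_points_jitter() |>   # one jittered dot per sample
  add_mean_dash() |>            # horizontal dash at the group mean
  add_sd_errorbar()             # error bars spanning mean - sd to mean + sd

p
```

Here tidyplots computes the mean and standard deviation internally, so no separate summary tibble is needed.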
The obtained (publication-ready!) plot: