Bar charts

Metabolites and lipids descriptive statistical analysis in R

Bar plots typically represent a mean value and a measure of dispersion or uncertainty (through error bars), e.g., standard deviation or standard error of the mean. However, other central tendency measures can also be presented using bar plots, e.g., median with interquartile range.

Bar charts are the most straightforward way to depict descriptive statistics as they provide the most basic insight into the data. Therefore, bar charts are one of the most commonly used methods for visualizing the results of biological experiments or validating analytical methods. One could assume that most metabolomics or lipidomics manuscripts contain at least one bar chart. Some of the examples are presented below:

V. de Laat et al. Intrinsic temperature increase drives lipid metabolism towards ferroptosis evasion and chemotherapy resistance in pancreatic cancer. DOI: https://doi.org/10.1038/s41467-024-52978-z e.g., Fig. 1e and 2f (for comparing lipid classes' or epilipids' levels between cell lines).
B. D. McNally et al. Long-chain ceramides are cell non-autonomous signals linking lipotoxicity to endoplasmic reticulum stress in skeletal muscle. DOI: https://doi.org/10.1038/s41467-022-29363-9 e.g., Fig. 1 b-e, Fig. 2 a-i, Fig. 3, etc. (presenting results of biological experiments, e.g. comparing lipid levels - Fig. 3).
V. Matyash et al. Lipid extraction by methyl-tert-butyl ether for high-throughput lipidomics. DOI: https://doi.org/10.1194/jlr.D700041-JLR200 - Fig. 5 A-F.
D. Wolrab et al. Validation of lipidomic analysis of human plasma and serum by supercritical fluid chromatography-mass spectrometry and hydrophilic interaction liquid chromatography-mass spectrometry. DOI: https://doi.org/10.1007/s00216-020-02473-3 - Fig. 3 a-d.
J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: https://doi.org/10.1007/s00216-022-04490-w - Fig. 4A, Fig. 8D (in all three manuscripts, bar charts are used to present results of analytical method validation (methods' parameters), e.g., extraction recoveries, or for comparing lipid concentrations determined in NIST plasma. The last manuscript (Fig. 8D) also depicts the classification accuracy, specificity, and sensitivity using bar charts).
O. Peterka & A. Maccelli et al. HILIC/MS quantitation of low-abundant phospholipids and sphingolipids in human plasma and serum: Dysregulation in pancreatic cancer. DOI: https://doi.org/10.1016/j.aca.2023.342144 - Fig. 2E - (authors used bar charts to present the numbers of identified lipids in two ion modes).
D. Wolrab et al. Plasma lipidomic profiles of kidney, breast and prostate cancer patients differ from healthy controls. DOI: https://doi.org/10.1038/s41598-021-99586-1 - Fig. 3A - H; Fig. 6 (the authors employ bar charts to present classification outcomes from OPLS-DA at the 0.5 threshold - specifically - accuracy, specificity, sensitivity).
A. Talebi et al. Pharmacological induction of membrane lipid poly-unsaturation sensitizes melanoma to ROS inducers and overcomes acquired resistance to targeted therapy. DOI: https://doi.org/10.1186/s13046-023-02664-7 - Fig. 1g, Fig. 2b - d, Fig. 3, Fig. 4 (the authors employ multiple bar charts to illustrate various biological experiment outcomes, such as gene and protein expression, confluence, lipid peroxidation, and cellular/mitochondrial ROS across different biological groups).

Note!

Although bar charts are simple, widely used, and easy to interpret, they offer limited data insights unless paired with additional visualizations like dot plots. Moreover, they should be used cautiously when significant outliers are present, as central tendency measures, such as mean values (most often represented by bars), may be greatly affected. In such cases, alternatives like box plots, violin plots, dot plots, or a combination of bar charts with dot plots should be considered to represent the skewness appropriately.
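
To see why outliers matter, here is a minimal sketch in base R. The numbers are made up for illustration and are not taken from the PDAC data set. One extreme value shifts the mean and inflates the standard deviation, while the median and interquartile range barely move:

```r
# Hypothetical concentrations (illustrative values only).
clean <- c(10, 11, 12, 13, 14)
with_outlier <- c(clean, 60)   # one extreme observation added

# Mean and SD react strongly to the outlier...
mean(clean); sd(clean)
mean(with_outlier); sd(with_outlier)

# ...whereas the median and IQR change only slightly.
median(clean); IQR(clean)
median(with_outlier); IQR(with_outlier)
```

A bar showing the mean of the second vector would sit well above five of the six observations, which is the situation the note above warns about.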

In the examples below, we will use bar charts to compare mean concentrations of long-chain sphingomyelins in the plasma of healthy controls (N), patients with pancreatitis (PAN), and patients with pancreatic cancer (T). As the measure of spread, we will use standard deviation. Single examples will also depict medians with interquartile ranges.

Preparing bar charts via ggpubr (level: basic)

In ggpubr, bar charts are plotted with the ggbarplot() function:

# Calling libraries
library(ggpubr)
library(tidyverse)

# Plotting a simple bar plot:
ggbarplot(data = data,         
          x = "Label", 
          y = "SM 41:1;O2", 
          fill = "Label", 
          add = 'mean')

# Explanations argument by argument:
# 1. data: here you indicate a data frame/tibble - your source of data.
# 2. x: here, you indicate what should be plotted on the x-axis.
# 3. y: here, what should be plotted on y-axis.
# 4. fill: this argument allows you to define:
# a) what color would you like to fill the bars with (one color) -- we don't use this option here. 
# or
# b) grouping variable - each group has a different color of bars' filling.
# 5. add = 'mean' - for concentrations from `SM 41:1;O2` column - computes means for each group.

This simple code produces this plot:

Multiple lipids can be plotted next to each other if a long data frame (tibble) is used. See the example below:

data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  ggbarplot(x = 'Lipids', 
            y = 'Concentrations', 
            fill = "Label", 
            add = 'mean',
            position = position_dodge(width = 0.8))
            
# Explanations:
# Take 'data' from the global environment,
# Pipe it to select(),
# select: `Label`, `SM 39:1;O2`, `SM 40:1;O2`, `SM 41:1;O2`, `SM 42:1;O2`,
# Pipe it to pivot_longer(),
# Change wide tibble into long one,
# Pipe data to ggbarplot:
# Plot `Concentrations` across lipid species (`Lipids`). 
# Group data (fill bars) by `Label`,
# Compute means for all bars.
# position = position_dodge(width = 0.8) - allows for plotting a grouped bar chart.

In this way, we obtain the following plot:

Important: Without specifying the position argument, a stacked bar chart will be obtained. The 'dodge' position places the bars side by side. Inside position_dodge(), we can set the width parameter. The greater the width, the farther apart the bars. We will explain the customization in more detail in the subchapter 'Customizing ggpubr and ggplot2 charts in R'.

# Without specifying the 'position' - a stacked bar chart will be obtained.
data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  ggbarplot(x = 'Lipids', 
            y = 'Concentrations', 
            fill = "Label", 
            add = 'mean')

The output:

Error bars can be added to the bar plot through the add argument. Instead of computing only the mean, we can compute the mean and standard deviation at once (add = 'mean_sd') and, in this way, also add error bars to the chart. Besides the standard deviation, the mean can be shown together with the standard error of the mean, a confidence interval, or the range of all values. The bar plot can also represent the median with an interquartile range, Q1 and Q3, a median absolute deviation (mad), or the range of values.
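
To make these options more concrete, the sketch below computes the corresponding error-bar limits by hand in base R for one made-up group. The row names mirror the ggpubr option names only for orientation. The exact formulas ggpubr applies internally (for example, how the confidence interval is derived) are an assumption here and should be checked against the package documentation:

```r
# Hypothetical concentrations for one group (illustrative values only).
x <- c(4.1, 5.3, 4.8, 6.0, 5.1, 4.6)
n <- length(x)

m  <- mean(x)
s  <- sd(x)
se <- s / sqrt(n)                   # standard error of the mean
ci <- qt(0.975, df = n - 1) * se    # half-width of a t-based 95% CI (one common convention)

# Lower and upper error-bar limits around the mean:
limits <- rbind(
  mean_sd    = c(ymin = m - s,  ymax = m + s),
  mean_se    = c(ymin = m - se, ymax = m + se),
  mean_ci    = c(ymin = m - ci, ymax = m + ci),
  mean_range = c(ymin = min(x), ymax = max(x))
)
print(round(limits, 3))

# Median with the first and third quartiles as limits:
c(median = median(x),
  q1 = unname(quantile(x, 0.25)),
  q3 = unname(quantile(x, 0.75)))
```

Note that the standard error shrinks with the sample size, so SEM error bars are always narrower than SD error bars for the same data. It is therefore good practice to state in the figure caption which measure is shown.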

# Adding error bars to the plot:
# Mean + standard deviation:
ggbarplot(..., add = "mean_sd") 

# Mean + standard error of the mean:
ggbarplot(..., add = "mean_se") 

# Mean + 95% confidence intervals for the mean:
ggbarplot(..., add = "mean_ci") 

# Mean + range
ggbarplot(..., add = "mean_range") 

# Plotting with error bars
# PLOT A/ Plotting a column from a wide data frame - single lipid:
ggbarplot(data = data,         
          x = "Label", 
          y = "SM 41:1;O2", 
          fill = "Label", 
          add = 'mean_sd')
          
# PLOT B/ Plotting a column from a long data frame - multiple lipids:
data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  ggbarplot(x = 'Lipids', 
            y = 'Concentrations', 
            fill = "Label", 
            add = 'mean_sd',
            position = position_dodge(width = 0.8))

The output:

As mentioned above, we can also depict the median of concentrations with a corresponding measure of spread:

# Using a bar plot to present the median with a measure of spread:
# Median + interquartile range:
ggbarplot(..., add = "median_iqr") 

# Median + Q1 and Q3:
ggbarplot(..., add = "median_q1q3") 

# Median + median absolute deviation:
ggbarplot(..., add = "median_mad") 

# Median + range
ggbarplot(..., add = "median_range") 

# EXAMPLES
# PLOT A/
ggbarplot(data = data,         
          x = "Label", 
          y = "SM 41:1;O2", 
          fill = "Label", 
          add = 'median_iqr')
          
# PLOT B/
data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  ggbarplot(x = 'Lipids', 
            y = 'Concentrations', 
            fill = "Label", 
            add = 'median_iqr',
            position = position_dodge(width = 0.8))

The output:

Preparing bar charts via tidyplots (level: basic)

A new R plotting solution emerged recently (preprint released in November 2024): the R tidyplots library (https://tidyplots.org/). Its design follows tidyverse principles. Using tidy data frames, one can quickly create publication-ready plots via simple and intuitive tidyplots functions. Here you will find more about using tidyplots in lipidomics and metabolomics. Let's begin by creating bar charts based on our example PDAC data set.

# Installation of tidyplots library:
install.packages("tidyplots")

# Activate library:
library(tidyplots)

# Additionally, we will activate tidyverse:
library(tidyverse)

# Creating a simple bar chart for SM 41:1;O2.
## We use the PDAC lipidomics data set read into R as `data` (see global environment).

# Bar chart:
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`) %>%
  add_mean_bar() 
  
# Explanations:
# We pipe the `data` data frame to tidyplot() function to create a new tidyplot.
# We indicate mappings for x- and y-axis, here x = `Label` (group), y = SM 41:1;O2 concentrations.
# NOTE!
# Tidyplots automatically computes mean concentrations across T, N, and PAN.
# Finally, we create a mean bar chart layer within our tidyplot object through add_mean_bar().

We obtain the following simple bar chart:

This basic bar chart can be easily customized to convey more information about the collected data, such as by overlaying a dot plot or adding error bars representing the standard deviation or standard error of the mean. Additionally, the plot's fill can be adjusted to reflect biological or experimental groups. Refer to the code blocks below for implementation:

# Adding SD or SEM error bars to the bar chart:
# Standard deviation error bar (Plot 1):
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar()

# Standard error of mean error bar (Plot 2):
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sem_errorbar()
  
# Note! 
# The error bars have the same color as bar charts.
# For clarity, we change the transparency of bar charts to 0.3.
# The tidyplot concept is straightforward:
## Need another plot layer?
## Find appropriate add_...() function.

We obtain the following plots:

To make the bar charts more informative, we additionally overlay them with dot plots:

# Overlapping bar charts with dot plots:
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2)
  
# Note!
# Dot plots can be easily incorporated through another add_...() function.
# Here we use add_data_points_jitter().
# We set the alpha of the jittered points to 0.2 for clarity.

The following plot is produced:

Finally, we can adjust the plot’s filling and descriptions to improve its visual appeal.

First, we must determine the grouping variable to which the filling will correspond. In our case, this information is stored in the Label column:

# Changing plot filling:
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`, color = Label) %>% 
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2)
  
# Note! 
# We indicate the grouping variable in tidyplot().

After the code modification, the library first applies its default colors. A legend also appears:

Now, through the family of adjust_...() functions, we can select the filling of our preference:

# Adjusting the bar chart filling:
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`, color = Label) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2) %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2"))

# Use adjust_colors() to select new filling.
# Deliver the new colors as a vector through the new_colors argument.
# You can use color names or color codes.

The updated output:

Furthermore, using adjust_...() functions, we change the plot descriptions, including both axes, plot title, caption, etc.:

# Modify plot descriptions:
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`, color = Label) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2) %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2")) %>%    
  adjust_title("SM 41:1;O2 concentrations in serum of N, PAN, and T") %>%
  adjust_x_axis_title("Health status") %>%
  adjust_y_axis_title("Concentration of SM 41:1;O2") %>%
  adjust_legend_title("Experimental group") %>%
  adjust_caption("Serum lipid concentrations measured via SFC/MS")
  
# Note!
# All plot labels can be easily changed through the appropriate adjust_...() function!

The bar chart with title, modified axes and legend descriptions, and a caption added:

You can also easily rename x-axis labels or change the plot size:

# Renaming x-axis labels + adjusting plot size:
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`, color = Label) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2) %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2")) %>%    
  adjust_title("SM 41:1;O2 concentrations in serum of N, PAN, and T") %>%
  adjust_x_axis_title("Health status") %>%
  adjust_y_axis_title("Concentration of SM 41:1;O2") %>%
  adjust_legend_title("Experimental group") %>%
  adjust_caption("Serum lipid concentrations measured via SFC/MS") %>%
  rename_x_axis_labels(new_names = c("N" = "Ctrl",
                                     "PAN" = "Ctrl - PAN",
                                     "T" = "PDAC")) %>%
  adjust_size(width = 40, height = 60)

The output:

The tidyplots library also allows for rearranging the levels of the grouping variable (which R sorts alphabetically by default) or for a specific sorting of labels:

# Re-arranging x-axis labels:
## Defined order:
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`, color = Label) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2) %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2")) %>%    
  adjust_title("SM 41:1;O2 concentrations in serum of N, PAN, and T") %>%
  adjust_x_axis_title("Health status") %>%
  adjust_y_axis_title("Concentration of SM 41:1;O2") %>%
  adjust_legend_title("Experimental group") %>%
  adjust_caption("Serum lipid concentrations measured via SFC/MS") %>%
  rename_x_axis_labels(new_names = c("N" = "Ctrl",
                                     "PAN" = "Ctrl - PAN",
                                     "T" = "PDAC")) %>%
  adjust_size(width = 40, height = 60) %>%
  reorder_x_axis_labels("PDAC", "Ctrl - PAN", "Ctrl")
  
## Reverse order
data %>%
  tidyplot(x = Label, y = `SM 41:1;O2`, color = Label) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2) %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2")) %>%    
  adjust_title("SM 41:1;O2 concentrations in serum of N, PAN, and T") %>%
  adjust_x_axis_title("Health status") %>%
  adjust_y_axis_title("Concentration of SM 41:1;O2") %>%
  adjust_legend_title("Experimental group") %>%
  adjust_caption("Serum lipid concentrations measured via SFC/MS") %>%
  rename_x_axis_labels(new_names = c("N" = "Ctrl",
                                     "PAN" = "Ctrl - PAN",
                                     "T" = "PDAC")) %>%
  adjust_size(width = 40, height = 60) %>%
  reverse_x_axis_labels()
  
# Note! 
# Both code blocks lead to the same plot!

Updated x-axis labels:
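
The alphabetical default mentioned above comes from how R builds factors. As a small base R sketch with made-up labels, the order can also be fixed in the data itself by setting factor levels explicitly. This is a general R idiom rather than a tidyplots feature, and ggplot2-based tools generally follow the level order of a factor:

```r
# Hypothetical group labels (illustrative only).
label <- c("T", "N", "PAN", "N", "T", "PAN")

# Default: factor levels are sorted alphabetically (N, PAN, T).
levels(factor(label))

# Explicit order: tumor first, healthy controls last (T, PAN, N).
label_ordered <- factor(label, levels = c("T", "PAN", "N"))
levels(label_ordered)
```

Setting the levels once in the data can be convenient when the same order should apply to many plots.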

Finally, instead of a wide data frame, we can use a long tibble to plot several lipids next to each other:

# Step 1: Select data for plotting and create a long tibble:
data.long <- 
  data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations')
               
# From wide `data` tibble, we select columns with a factor and lipids of interest.
# Using pivot_longer(), a long tibble is created.
# The output is saved as data.long

# Creating tidyplots bar chart:
data.long %>%
  tidyplot(x = Lipids, y = Concentrations, color = Label) %>%
  add_mean_bar(alpha = 0.3) %>%
  add_sd_errorbar() %>%
  add_data_points_jitter(alpha = 0.2) %>%
  adjust_colors(new_colors = c("royalblue", "orange", "red2")) %>%    
  adjust_title("SM concentrations in serum of N, PAN, and T") %>%
  adjust_x_axis_title("Lipid species") %>%
  adjust_y_axis_title("Concentration") %>%
  adjust_legend_title("Experimental group") %>%
  adjust_caption("Serum lipid concentrations measured via SFC/MS") %>%
  adjust_size(width = 100, height = 60)
  
# Remember to use the correct data frame (now, data.long).
# Remember to provide new, correct column names!
# The title and axis titles are updated, as the x-axis now shows lipid species.
# We also adjust the plot size to avoid overlapping bar charts.
# The rest of the code is the same.

The final bar chart:

More about tidyplots you can also find here:

Preparing bar charts via ggplot2 (level: intermediate)

In ggplot2, before constructing any type of plot, we first need to specify the so-called aesthetic mappings. The aesthetic mappings tell ggplot which columns of your data frame affect which elements of your plot. In other words, they define which variables will be mapped to the x and y aesthetics. The aesthetics are defined via aes():

# Calling the tidyverse collection:
library(tidyverse)

# Defining x, y-aesthetics:
ggplot(data, aes(x = ..., y = ...)) +
  ...
  
# For example:
ggplot(data, aes(x = Label, y = `SM 41:1;O2`)) +
  ...

At this step, there are no layers defined, so there is nothing to plot. You have only decided which variables will be mapped to the x- and which to the y-aesthetic. Besides the aesthetic mappings, the ggplot() function expects a data frame - a source of data. To obtain any plot, we need to create layers. To do so, we need to use one of the geom_... functions. As we want to build a bar plot representing values in the data, let's add geom_col() to our aesthetics:

# Adding layers to aesthetic mappings:
ggplot(data, aes(x = Label, y = `SM 41:1;O2`)) +
  geom_col()

Important: ggplot2 also contains geom_bar(). Use geom_bar() if you want to represent the number of cases in each group (not values in the data)!

As a result, we obtain this chart:

Although we obtained a plot, we realize there are several issues:

1) the bar chart does not look impressive,

2) our bar chart does not present mean values.
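
The second issue arises because geom_col() draws one bar segment per row and stacks the segments within each group, so the bar height equals the group sum rather than the mean. The sketch below illustrates this on a small simulated data frame (hypothetical values, not the PDAC data set) and contrasts it with geom_bar(), which counts rows:

```r
library(ggplot2)

# Simulated toy data (hypothetical values, not the PDAC data set).
toy <- data.frame(
  Label = c("N", "N", "N", "PAN", "PAN", "T"),
  conc  = c(10, 12, 11, 14, 15, 20)
)

# geom_bar(): bar height = number of rows per group (3, 2, 1).
p_count <- ggplot(toy, aes(x = Label)) +
  geom_bar()

# geom_col(): bar height = the y values themselves; several rows per group
# are stacked, so each bar reaches the group sum (33, 29, 20), not the mean.
p_value <- ggplot(toy, aes(x = Label, y = conc)) +
  geom_col()

# Inspect what ggplot2 actually draws:
layer_data(p_count)$y
layer_data(p_value)[, c("x", "ymin", "ymax")]
```

Here layer_data() is used only to inspect the computed bar positions. It is not needed for plotting.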

Unfortunately, using ggplot2, we will need to compute the statistics we would like to plot. We also need to use a couple of more commands to improve the way the plot looks.

The summary statistics can be obtained in several different ways. Here, we will compute summary statistics via the summarise() function from the dplyr package, and using the tibble with summary statistics - we will create a plot. The code is presented below:

# First, we compute summary statistics for SM 41:1;O2 - mean + sd:
ds.SM.41.1 <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  group_by(Label) %>%
  summarize(mean = mean(`SM 41:1;O2`),
            sd = sd(`SM 41:1;O2`))
            
# Explanations:
# 1. Take 'data' from the global environment, 
# 2. Pipe it to select(),
# 3. Select `Label` and `SM 41:1;O2` columns,
# 4. Pipe these columns to group_by(),
# 5. Group all entries by `Label` - biological group,
# 6. Pipe it to summarise(),
# 7. Using summarise() calculate mean and sd for `SM 41:1;O2`,
# 8. Store the results in a tibble called 'ds.SM.41.1'.

We obtain the following tibble:

# Plotting descriptive statistics for `SM 41:1;O2`:
ggplot(ds.SM.41.1, aes(x = Label, y = mean)) +
  geom_col()
  
# or in one pipeline:
data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  group_by(Label) %>%
  summarize(mean = mean(`SM 41:1;O2`),
            sd = sd(`SM 41:1;O2`)) %>%
  ggplot(aes(x = Label, y = mean)) +
  geom_col()

As a result, we obtain:

The plot now presents a part of the descriptive statistics of interest. We need to add error bars in the next step, and we will use geom_errorbar() for this purpose. The geom_errorbar() will require new aesthetics (ymin and ymax), corresponding to the standard deviations we computed. See the code below:

# Adding error bars to our bar chart:
ggplot(ds.SM.41.1, aes(x = Label, y = mean)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - sd, 
                    ymax = mean + sd))

The error bars will reach from mean to mean - sd (lower error bar), and from mean to mean + sd (upper error bar). We obtain the following plot:

Now, our plot finally presents the descriptive statistics of interest. However, it still does not look interesting. Hence, we will perform basic customization to improve it:

1) We want to fill the bars with royal blue for N, orange for PAN, and red2 for T,
2) We will change the width of the bars to 0.5,
3) We will change the color of the bars' contour to black,
4) We will change the width of the error bars to 0.4,
5) And we want to remove the awful gray background with a simple theme_bw().

To achieve the first point, we need to add in our main aesthetic the argument fill, resulting in the data grouping by the biological groups. To fill the bars with selected colors we will need to add: scale_fill_manual():

# Basic customization of the bar chart
ggplot(ds.SM.41.1, aes(x = Label, y = mean, fill = Label)) +
  geom_col() +
  scale_fill_manual(values = c('royalblue', 'orange','red2')) +
  geom_errorbar(aes(ymin = mean - sd, 
                    ymax = mean + sd))

The order of colors in the scale_fill_manual() corresponds to the order of labels in the x-axis. We obtain the following plot:
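
Because relying on the order of colors can be error-prone when groups are reordered, one optional variant is to pass a named vector to scale_fill_manual(), which ties each color to a group by name. The sketch below uses a small made-up summary table (hypothetical values, not the PDAC results). The object names toy_summary and group_colors are our own:

```r
library(ggplot2)

# Toy summary table (hypothetical values, not the PDAC results).
toy_summary <- data.frame(
  Label = c("N", "PAN", "T"),
  mean  = c(10, 12, 18),
  sd    = c(1.0, 1.5, 2.0)
)

# Naming the colors ties each one to a group regardless of level order.
group_colors <- c(N = "royalblue", PAN = "orange", T = "red2")

p <- ggplot(toy_summary, aes(x = Label, y = mean, fill = Label)) +
  geom_col() +
  scale_fill_manual(values = group_colors) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd))

print(p)
```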

The plot begins to look better. Steps 2 to 4 can be performed within geom_... functions which contain arguments that enable changing the width of bars, error bars, and the contour color of bars:

# Changing bars and error bars' width and bars' contour color:
ggplot(ds.SM.41.1, aes(x = Label, y = mean, fill = Label)) +
  geom_col(width = 0.5, color = 'black') +
  scale_fill_manual(values = c('royalblue', 'orange','red2')) +
  geom_errorbar(aes(ymin = mean - sd, 
                    ymax = mean + sd),
                width = 0.4)

The output:

Finally, we want to substitute the gray background with a simple theme_bw(). We just need to add this to our plot:

# Changing plot theme:
ggplot(ds.SM.41.1, aes(x = Label, y = mean, fill = Label)) +
  geom_col(width = 0.5, color = 'black')+
  scale_fill_manual(values = c('royalblue', 'orange','red2')) +
  geom_errorbar(aes(ymin = mean - sd, 
                    ymax = mean + sd),
                width = 0.4) +
  theme_bw()

And we obtain our final plot:

We will show you how to customize ggplot2 charts in the next subchapters. Here, we will only gently modify the layout.

What if we would like to plot data for more than one lipid? First, we need to compute descriptive statistics for all features of interest. We select the columns from the main data frame, change the wide data frame into a long one, and compute summary statistics for all concentrations grouped by lipid species and biological groups:

# Bar chart for more than one variable:
# Step 1: Computing summary statistics:
data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  group_by(Lipids, Label) %>%
  summarize(mean = mean(Concentrations),
            sd = sd(Concentrations))

We obtain the following tibble:

Now that we have the data set with summary statistics, we can pipe it to the ggplot() function directly. In the aes(), we define that on the x-axis we want to see every selected lipid (Lipids), on the y-axis the mean concentration (mean), and that the output should be grouped along the x-axis by Label - the biological group. We can also immediately define that we want a bar plot - geom_col() - with the same error bars as in the previous example (via geom_errorbar()):

data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  group_by(Lipids, Label) %>%
  summarize(mean = mean(Concentrations),
            sd = sd(Concentrations)) %>%
  ggplot(aes(x = Lipids, y = mean, fill = Label)) +
  geom_col()+
  geom_errorbar(aes(ymin = mean - sd, 
                    ymax = mean + sd))

If we run this code, we obtain this chart:

If we do not define the position argument in both geom_col() and geom_errorbar(), we obtain a stacked bar plot, and the error bars of all groups are drawn at the same x position, so they no longer match the bars. It is a similar situation to the ggpubr plot! We need to modify our code to place the bars side-by-side (dodge position):

data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  group_by(Lipids, Label) %>%
  summarize(mean = mean(Concentrations),
            sd = sd(Concentrations)) %>%
  ggplot(aes(x = Lipids, y = mean, fill = Label)) +
  geom_col(position = position_dodge(width = 0.9))+
  geom_errorbar(aes(ymin = mean - sd, 
                    ymax = mean + sd),
                position = position_dodge(width = 0.9))

After you hit run, you will obtain:
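
The bars and error bars line up because both layers use the same dodge width. The sketch below checks this on a small made-up summary table (hypothetical values). Storing the position in one object, here called dodge (our own name), is an optional way to keep the two layers in sync:

```r
library(ggplot2)

# Toy summary table for two lipids and three groups (hypothetical values).
toy_summary <- expand.grid(
  Label  = c("N", "PAN", "T"),
  Lipids = c("SM 40:1;O2", "SM 41:1;O2")
)
toy_summary$mean <- c(10, 12, 18, 20, 22, 30)
toy_summary$sd   <- c(1, 2, 3, 2, 3, 4)

dodge <- position_dodge(width = 0.9)   # one object shared by both layers

p <- ggplot(toy_summary, aes(x = Lipids, y = mean, fill = Label)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = dodge, width = 0.4)

# Horizontal centers of the bars (layer 1) and of the error bars (layer 2):
bar_centers <- with(layer_data(p, 1),
                    sort((as.numeric(xmin) + as.numeric(xmax)) / 2))
err_centers <- with(layer_data(p, 2),
                    sort((as.numeric(xmin) + as.numeric(xmax)) / 2))

# Compare the two sets of centers:
all.equal(bar_centers, err_centers)
```

If the two layers used different dodge widths, the error bars would drift away from the centers of their bars.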

Now, we can apply the previous basic custom look to our plot (only the bars' width is left unchanged):

bar.chart.ggplot <-
data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  group_by(Lipids, Label) %>%
  summarize(mean = mean(Concentrations),
            sd = sd(Concentrations)) %>%
  ggplot(aes(x = Lipids, y = mean, fill = Label)) +
  geom_col(color = 'black', position = position_dodge(width = 0.9))+
  scale_fill_manual(values = c('royalblue', 'orange','red2')) +
  geom_errorbar(aes(ymin = mean - sd, 
                    ymax = mean + sd),
                position = position_dodge(width = 0.9),
                width = 0.4) +
  theme_bw()
  
# Printing the results:
print(bar.chart.ggplot)

We produce this nice bar plot:

It is saved in the global environment as "bar.chart.ggplot".

If instead of mean values with standard deviations, one would like to show medians with interquartile ranges, the summary statistics must be recalculated:

# Computing median with IQR for a bar chart
data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  group_by(Lipids, Label) %>%
  summarize(median = median(Concentrations),
            IQR = IQR(Concentrations))

The plot aesthetics must be corrected accordingly, and a bar chart depicting medians with IQRs can be made.
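
As one possible way to do this, the sketch below draws medians as bars and lets the error bars span from the first to the third quartile, so that their length equals the IQR. It runs on simulated data (hypothetical values, not the PDAC data set), and the names toy, toy_median, q1, and q3 are our own. Plotting median - IQR and median + IQR instead would be a different convention, so the caption should state which limits are shown:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Simulated wide table (hypothetical values, not the PDAC data set).
set.seed(1)
toy <- tibble(
  Label = rep(c("N", "PAN", "T"), each = 10),
  `SM 40:1;O2` = rlnorm(30, meanlog = 3),
  `SM 41:1;O2` = rlnorm(30, meanlog = 2)
)

# Median with first and third quartiles per lipid and group:
toy_median <- toy %>%
  pivot_longer(cols = -Label,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids, Label) %>%
  summarize(median = median(Concentrations),
            q1 = quantile(Concentrations, 0.25),
            q3 = quantile(Concentrations, 0.75),
            .groups = "drop")

# Bars show medians; error bars span Q1 to Q3.
ggplot(toy_median, aes(x = Lipids, y = median, fill = Label)) +
  geom_col(position = position_dodge(width = 0.9), color = "black") +
  geom_errorbar(aes(ymin = q1, ymax = q3),
                position = position_dodge(width = 0.9),
                width = 0.4) +
  theme_bw()
```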

As you can see, the ggplot2 solution requires more lines of code to obtain a basic bar chart when compared to the ggpubr. Hence, it is a slightly more complicated method.
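
For completeness, ggplot2 can also compute summaries on the fly through stat_summary(), which shortens the code when no separate summary table is needed. The sketch below is one possible variant on simulated data (hypothetical values, not the PDAC data set). The helper mean_sd_limits() is our own and not part of ggplot2, and the fun argument assumes ggplot2 version 3.3.0 or newer:

```r
library(ggplot2)

# Simulated raw data (hypothetical values, not the PDAC data set).
set.seed(1)
toy <- data.frame(
  Label = rep(c("N", "PAN", "T"), each = 10),
  conc  = c(rnorm(10, 10, 1), rnorm(10, 12, 1.5), rnorm(10, 18, 2))
)

# Helper (our own, not part of ggplot2): mean with mean - sd and mean + sd.
mean_sd_limits <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

p <- ggplot(toy, aes(x = Label, y = conc, fill = Label)) +
  stat_summary(fun = mean, geom = "bar", color = "black", width = 0.5) +
  stat_summary(fun.data = mean_sd_limits, geom = "errorbar", width = 0.4) +
  theme_bw()

print(p)
```

The trade-off is that the plotted statistics are not stored in a tibble, so they cannot be inspected or reported as easily as with the summarise() approach.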

The ggplot2 and ggpubr graphics can be exported from RStudio directly from the tab "Plots." However, we will show you a different solution, which is particularly handy for Jupyter Notebook users.

First, we need to install the ggimage library. You can learn more about it here:

We install the library:

# A direct export from RStudio may result in low-quality graphics.
install.packages("ggimage")

# Call library:
library(ggimage)

Now, we can prepare the graphics for export and generate a preview of the output in the "Plots" tab of RStudio:

# Generate a preview and optimize the size of fonts and other elements:
ggpreview(plot = bar.chart.ggplot,     # The R object for preview.
          width = 600,               # Width in px - if you select px as a unit.
          height = 400,              # Height in px - if you select px as a unit.
          units = "px",              # Unit of size - px.
          dpi = 300,                 # Sharpness.
          scale = 2.5)            # You may need to use a different scale.

We obtain:

If the graphics are not distorted - we can save them now using the ggsave() function from the ggplot2 package (tidyverse collection). The application of this function is presented below:

# Save the plot using ggsave (ggplot2 package - tidyverse):
ggsave(plot = bar.chart.ggplot, # The ggplot2 or ggpubr object to be saved.
       path = "D:/...",           # Here, introduce the path where the plot should be saved. Defaults to the working directory, which should be set at the beginning.
       device = "jpeg",     # Select format, e.g. "jpeg", "pdf", "tiff", and others.
       filename = "Dendrogram from ggtree - scale2.jpeg", # Name of the new file in the wd.
       width = 800,    # Width in px - if you select px as a unit. Transfer it from the ggpreview().
       height = 600,   # Height in px - if you select px as a unit. Transfer it from the ggpreview().
       units = "px", # Unit of size - px. Transfer it from the ggpreview().
       dpi = 300, # Sharpness. Transfer it from the ggpreview().
       scale = 2.5) # Transfer it from the ggpreview().
       
# Finally, we set:
ggsave(plot = bar.chart.ggplot,
       path = "D:/Data analysis/", # You can also remove this argument - plot is saved in your wd.
       device = "jpeg",
       filename = "First_bar_chart_example.jpeg",
       width = 600,
       height = 400,
       units = "px",
       dpi = 300,
       scale = 2.5)

Our chart is saved as a JPEG file in the indicated directory (or in the wd if the path argument is omitted):

Preparing bar charts via plotly (level: advanced)

As the most advanced solution, we selected plotly, a powerful package for creating graphics, including interactive ones. Plotly visualizations can certainly spice up a talk or lab presentation. You can read more about plotly in its official documentation.

However, if you are just beginning with R, you may find this solution more complicated than the rest. This is why we present it last.

We will begin with the tibble 'ds.SM.41.1' computed in the previous section. To create a basic bar plot, we use the function plot_ly():

# Calling library
library(plotly)

# Plotting a basic bar chart:
plot_ly(ds.SM.41.1, 
        x = ~Label, 
        y = ~mean, 
        error_y = ~list(array = sd, color = 'black'), 
        type = 'bar',
        marker = list(color = c('royalblue', 'orange', 'red'))) %>%
  layout(title = 'Alterations in SM 41:1;O2 profile',
         yaxis = 
           list(title = 'Concentration [nmol/mL]', 
                showline= T, 
                linewidth = 1, 
                linecolor='black', 
                showticklabels = T),
         xaxis = 
           list(title = 'Status', 
                showline= T, 
                linewidth = 0.05, 
                linecolor='black'))

We obtain:

For the most basic bar chart, we tell plot_ly() which data frame holds the data ('ds.SM.41.1') and which columns to map to the x- and y-axes (x = ~Label, y = ~mean). Error bars can be added within the same call through the error_y argument, which takes a list containing the mapping (array = sd) and the color of the error bars (color = 'black'). Next, we select the type of plot (type = 'bar') and define the colors of the three bars through the marker argument, which takes a list with a vector of colors.

To adjust the plot layout, we pipe the output to layout(). There, we first add the main title of the chart (title = 'Alterations in SM 41:1;O2 profile'), and then pass to yaxis and xaxis lists of parameters: the axis titles, whether to show the axis lines (showline = T), the width of these lines (linewidth), their color (linecolor), and whether to show the tick labels (showticklabels).

If you hover your cursor over this plot in RStudio, you will see that it is interactive and shows the mean and standard deviation for each bar:

Using plotly, we can also present bars for more than one lipid in a single plot. The new bars are added through add_bars() or add_trace(), and each call expects a dedicated column with summary statistics. Therefore, we will change the way we compute our summary statistics. First, we will compute the means and standard deviations of selected long-chain SM species for each biological group separately:

# Computing summary statistics for N
ds.N <- data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  filter(Label == 'N') %>%
  group_by(Lipids) %>%
  summarize(mean.N = mean(Concentrations),
            sd.N = sd(Concentrations))
            
# Computing summary statistics for PAN
ds.PAN <- data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  filter(Label == 'PAN') %>%
  group_by(Lipids) %>%
  summarize(mean.PAN = mean(Concentrations),
            sd.PAN = sd(Concentrations))

# Computing summary statistics for T
ds.T <- data %>%
  select(`Label`, 
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  filter(Label == 'T') %>%
  group_by(Lipids) %>%
  summarize(mean.T = mean(Concentrations),
            sd.T = sd(Concentrations))
            
print(ds.N)
print(ds.PAN)
print(ds.T)

The output:

Then, using the column shared by all three tibbles ('Lipids'), we can merge them into one tibble, which can then be used for plotting in plotly. To merge the tibbles, we can apply the left_join() function from the dplyr package (tidyverse collection). We perform the merging in two steps, as left_join() joins only two tibbles at a time:

# For merging 3 tibbles into one:
ds.final <-
  left_join(ds.N, ds.PAN, by = 'Lipids') %>% 
  left_join(., ds.T, by = 'Lipids')

The output:
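As an alternative to computing three separate tibbles and joining them, the same wide layout can be sketched in a single pipeline by grouping on both the lipid and the biological group and then spreading the statistics with pivot_wider(). The block below is a minimal sketch under stated assumptions: 'toy.data' is a hypothetical stand-in for the tutorial's 'data' tibble (made-up values, two lipids only), and 'ds.compact' is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the tutorial's `data` tibble
# (made-up concentrations, three samples per group):
toy.data <- tibble::tibble(
  Label = rep(c("N", "PAN", "T"), each = 3),
  `SM 39:1;O2` = c(1, 2, 3, 2, 3, 4, 3, 4, 5),
  `SM 40:1;O2` = c(10, 11, 12, 12, 13, 14, 14, 15, 16)
)

# Compute mean and sd for every lipid/group pair in one pipeline,
# then spread the groups into columns (mean.N, sd.N, mean.PAN, ...):
ds.compact <- toy.data %>%
  pivot_longer(cols = -Label,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids, Label) %>%
  summarize(mean = mean(Concentrations),
            sd = sd(Concentrations),
            .groups = "drop") %>%
  pivot_wider(names_from = Label,
              values_from = c(mean, sd),
              names_sep = ".")

print(ds.compact)
```

The resulting column names match those produced by the step-by-step approach (mean.N, sd.N, mean.PAN, and so on), although the column order differs. The step-by-step version may still be easier to follow when you are new to tidyr.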

Now, using add_bars(), we add two more bars for every lipid, for the PAN and T groups. The layout was copied from the previous plot, with three changes: we set barmode = 'group' to obtain a grouped bar chart, updated the main and x-axis titles to match the new content, and gave every trace a name so that the legend shows the group labels:

# Plotting bar chart for multiple lipid species via plotly:
plot_ly(ds.final, 
        x = ~Lipids, 
        y = ~mean.N, 
        type = 'bar', 
        name = 'N',                  # Legend entry for the first group.
        color = I("royalblue"), 
        error_y = ~list(array = sd.N,
                        color = '#000000')) %>%
  add_bars(y = ~mean.PAN, 
           name = 'PAN',             # Legend entry for the second group.
           color = I("orange"), 
           error_y = ~list(array = sd.PAN,
                           color = '#000000')) %>%
  add_bars(y = ~mean.T, 
           name = 'T',               # Legend entry for the third group.
           color = I("red2"), 
           error_y = ~list(array = sd.T,
                           color = '#000000')) %>%
  layout(title = 'Alterations in long-chain SM profiles',
         barmode = 'group',
         yaxis = 
           list(title = 'Concentration [nmol/mL]', 
                showline= T, 
                linewidth = 1, 
                linecolor='black', 
                showticklabels = T),
         xaxis = 
           list(title = 'Lipid species', 
                showline= T, 
                linewidth = 0.05, 
                linecolor='black'))

The final output:

IMPORTANT: If you want to keep the plot interactive, save it in RStudio as a .html file.
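One way to do this from code, rather than through the RStudio interface, is the saveWidget() function from the htmlwidgets package. The block below is a minimal sketch, not the only option: 'toy.summary', 'interactive.bar.chart', and 'output.file' are hypothetical names introduced for illustration, the values are made up, and the file is written to a temporary folder so the example runs anywhere.

```r
library(plotly)
library(htmlwidgets)

# Hypothetical summary table with made-up values (stand-in for 'ds.SM.41.1'):
toy.summary <- data.frame(Label = c("N", "PAN", "T"),
                          mean = c(10, 12, 15),
                          sd = c(1, 1.5, 2))

# Store the plotly chart as an object instead of only printing it:
interactive.bar.chart <- plot_ly(toy.summary,
                                 x = ~Label,
                                 y = ~mean,
                                 type = 'bar',
                                 error_y = ~list(array = sd, color = 'black'))

# Assumed output location - replace tempdir() with your own folder:
output.file <- file.path(tempdir(), "interactive_bar_chart.html")

# selfcontained = FALSE writes the .html file plus a folder with dependencies.
# selfcontained = TRUE bundles everything into one file, but requires pandoc.
saveWidget(interactive.bar.chart,
           file = output.file,
           selfcontained = FALSE)
```

Opening the saved .html file in a web browser should show the same interactive chart, including the hover labels.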
