Density plots

Metabolites and lipids descriptive statistical analysis in R

The density plot was already introduced in the previous subchapter. Density plots depict the distribution of a numeric variable. A density plot can be viewed as a smoothed version of a histogram, so its applications are similar. Density plots can be an effective way of visualizing concentration distributions of lipids or metabolites in biological materials. See the following example published in Nature:

  • T. Takeuchi et al. Gut microbial carbohydrate metabolism contributes to insulin resistance. DOI: https://doi.org/10.1038/s41586-023-06466-x - Fig. 1d & 2b, f. The authors used density plots, e.g., to present and compare fecal levels of monosaccharides (as decimal logarithms) across experimental groups, and HOMA-IR, BMI, triglyceride (TG), and HDL-C levels among the participant clusters.
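To make the link between a histogram and a density plot concrete, here is a minimal base R sketch on simulated values (not the lipidomics data set used in this chapter). The variable name `conc` and the log-normal parameters are illustrative assumptions. `density()` computes a kernel density estimate, which is the kind of smooth curve a density plot draws.

```r
# Simulated, right-skewed "concentrations" (illustrative values only):
set.seed(123)
conc <- rlnorm(200, meanlog = 3, sdlog = 0.3)

# Histogram on the density scale (bar areas sum to 1):
h <- hist(conc, breaks = 20, freq = FALSE,
          main = "Histogram with kernel density estimate",
          xlab = "Simulated concentration")

# Kernel density estimate (Gaussian kernel by default):
d <- density(conc)

# Overlaying the smooth curve on the histogram:
lines(d, col = "royalblue", lwd = 2)

# Both summaries describe a distribution with a total area of about 1:
area_hist <- sum(h$density * diff(h$breaks))
area_kde  <- sum(d$y) * diff(d$x[1:2])
```

The histogram depends on the chosen bin breaks, while the density curve depends on the kernel bandwidth; both are tuning choices that affect how the distribution looks.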

Preparing density plots via DataExplorer (level: basic)

DataExplorer enables producing a simple density plot through plot_density():

# Calling libraries:
library(tidyverse)     # provides %>%, filter(), select(), and the ggplot2 theme elements
library(DataExplorer)

# Reading about plot_density:
?plot_density()

# Preparing plot - PLOT A
data %>%
  filter(Label == 'N') %>%
  select(`SM 41:1;O2`) %>%
  plot_density(theme_config = 
                 list(panel.background = element_blank(),
                      axis.line.x = element_line(color = 'black'),
                      axis.line.y = element_line(color = 'black')),
               geom_density_args = list(color = 'royalblue', linewidth = 1)
               ) 
               
# Preparing plot - PLOT B
data %>%
  filter(Label == 'T') %>%
  select(`SM 41:1;O2`) %>%
  plot_density(theme_config = 
                 list(panel.background = element_blank(),
                      axis.line.x = element_line(color = 'black'),
                      axis.line.y = element_line(color = 'black')),
               geom_density_args = list(color = 'red2', linewidth = 1)
               )

Basic plot customization was performed using the theme_config argument. We supplied a list of theme parameters to change, e.g., we removed the default gray ggplot2 background by setting panel.background to element_blank():

plot_density(theme_config = list(panel.background = element_blank()))

Then, we changed the x-y axes color to black through axis.line.x and axis.line.y:

# Customizing the theme:
plot_density(theme_config = 
                 list(panel.background = element_blank(),
                      axis.line.x = element_line(color = 'black'),
                      axis.line.y = element_line(color = 'black')))

Finally, we changed the color of the density curves to 'royalblue' or 'red2' and the linewidth to 1 by passing a list with color and linewidth to geom_density_args:

# Changing color of density plot curves:
plot_density(geom_density_args = list(color = 'red2', linewidth = 1))

We obtain the following plots:

Such simple plots can be used for a quick data inspection.
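Because PLOT A and PLOT B share the same theme settings, the list can be stored once and reused. The sketch below is one possible way to do this; the simulated tibble `toy` stands in for `data`, and the names `toy` and `clean_theme` are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(DataExplorer)

# Simulated stand-in for the lipidomics table (illustrative values only):
set.seed(1)
toy <- tibble(
  Label = rep(c("N", "T"), each = 50),
  `SM 41:1;O2` = c(rnorm(50, mean = 20, sd = 3),
                   rnorm(50, mean = 16, sd = 3))
)

# Theme settings stored once and reused for both plots:
clean_theme <- list(panel.background = element_blank(),
                    axis.line.x = element_line(color = "black"),
                    axis.line.y = element_line(color = "black"))

# PLOT A (group N):
toy %>%
  filter(Label == "N") %>%
  select(`SM 41:1;O2`) %>%
  plot_density(theme_config = clean_theme,
               geom_density_args = list(color = "royalblue", linewidth = 1))

# PLOT B (group T):
toy %>%
  filter(Label == "T") %>%
  select(`SM 41:1;O2`) %>%
  plot_density(theme_config = clean_theme,
               geom_density_args = list(color = "red2", linewidth = 1))
```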

Preparing density plots via ggpubr (level: intermediate)

The ggpubr package provides the ggdensity() function. We will show how to prepare a density plot for a single lipid and for multiple lipids:

# Calling library
library(ggpubr)

# Reading the documentation about the function:
?ggdensity()

# PLOT A: Density plots for a single lipid (through wide tibble):
data %>%
  filter(Label != 'PAN') %>%
  select(`Label`,`SM 41:1;O2`) %>%
  ggdensity(x = "SM 41:1;O2",  # x is the name of the selected column
              color = "Label",
              fill = "Label",
              rug = TRUE,
              palette = c('royalblue', 'red2'))
              
# PLOT B: Density plots for multiple lipids (through long tibble):
data %>%
  filter(Label != 'PAN') %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  ggdensity(x = "Concentrations",  # x is the column containing the concentrations
              color = "Label",
              fill = "Label",
              rug = TRUE,
              palette = c('royalblue', 'red2')) +
  facet_grid(. ~ Lipids, scales = "free_x")

The plots:
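The difference between the wide tibble used for PLOT A and the long tibble used for PLOT B can be seen on a tiny example. The sketch below uses a made-up two-row tibble (the values are not real measurements) to show what pivot_longer() returns.

```r
library(dplyr)
library(tidyr)

# Wide format: one column per lipid (illustrative values only):
wide <- tibble(
  Label = c("N", "T"),
  `SM 39:1;O2` = c(9.1, 7.4),
  `SM 40:1;O2` = c(42.0, 35.5)
)

# Long format: one row per sample-lipid combination:
long <- wide %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 40:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")

print(long)
```

In the long tibble, the lipid names are stored in a single column (`Lipids`), which is what allows facet_grid() to create one panel per lipid.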

Preparing density plots via ggplot2 (level: advanced)

As you know from the previous example with histograms, we can add a density layer using geom_density(). Take a look at the examples below for a single lipid and for multiple lipids. To customize the plots a bit more, we added the group mean values as dashed vertical lines:

# PLOT A: Plot for a single lipid:
# We need to compute means:
means <-
  data %>%
  filter(Label != 'PAN') %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  group_by(Label) %>%
  summarise(mean = mean(`SM 41:1;O2`))

# PLOT A:
data %>%
  filter(Label != 'PAN') %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  ggplot(aes(x = `SM 41:1;O2`, 
           color = `Label`,
           fill = `Label`)) + 
  geom_density(alpha = 0.5) +
  geom_vline(xintercept = means$mean, 
             colour = c('darkblue', 'red2'), 
             linetype = 'dashed',
             linewidth = 1) +
  scale_color_manual(values = c('royalblue', 'red2')) +
  scale_fill_manual(values = c('royalblue', 'red2')) +
  theme_bw() 
  
# PLOT B: Multiple lipids:
# Computing mean values for vlines:
means <- 
  data %>%
  filter(`Label` != "PAN") %>%
  droplevels() %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Label, Lipids) %>%
  summarise(mean = mean(Concentrations))

# PLOT B:
data %>%
  filter(`Label` != "PAN") %>%
  droplevels() %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  ggplot(aes(x = `Concentrations`, 
           color = `Label`,
           fill = `Label`)) + 
  geom_density(alpha = 0.5) +
  geom_vline(data = means,
             aes(xintercept = mean),
             linetype = 'dashed',
             linewidth = 1,
             colour = rep(c('darkblue', 'red2'),
                          each = 4)) +
  scale_color_manual(values = c('royalblue', 'red2')) +
  scale_fill_manual(values = c('royalblue', 'red2')) +
  theme_bw() +
  facet_grid(. ~ Lipids, scales = "free_x")

The output:
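The hard-coded `colour = rep(c('darkblue', 'red2'), each = 4)` in PLOT B relies on the row order of `means`. A possible alternative, sketched below on simulated data, maps the line color to `Label` inside aes() so that the color scale assigns it. The tibbles `toy` and `toy_means` are hypothetical, and the lines then share the group colors instead of using 'darkblue'.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Simulated long-format data (illustrative values only):
set.seed(1)
toy <- tibble(
  Label = rep(c("N", "T"), each = 50),
  `SM 40:1;O2` = c(rnorm(50, 40, 5), rnorm(50, 35, 5)),
  `SM 41:1;O2` = c(rnorm(50, 20, 3), rnorm(50, 16, 3))
) %>%
  pivot_longer(cols = -Label,
               names_to = "Lipids",
               values_to = "Concentrations")

# One mean per group and lipid:
toy_means <- toy %>%
  group_by(Label, Lipids) %>%
  summarise(mean = mean(Concentrations), .groups = "drop")

# The vline color is mapped to Label, so no manual ordering is needed:
p <- ggplot(toy, aes(x = Concentrations, color = Label, fill = Label)) +
  geom_density(alpha = 0.5) +
  geom_vline(data = toy_means,
             aes(xintercept = mean, color = Label),
             linetype = "dashed",
             linewidth = 1) +
  scale_color_manual(values = c(N = "royalblue", T = "red2")) +
  scale_fill_manual(values = c(N = "royalblue", T = "red2")) +
  theme_bw() +
  facet_grid(. ~ Lipids, scales = "free_x")

p
```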

It is also possible to draw one density curve above the x-axis and the other below it. Such a chart is known as a mirror density chart. This plot was inspired by an excellent source of ideas for beautiful R charts, the R Graph Gallery:

# Preparing data and computing mean values:
data.N <- 
  data %>%
  filter(Label == 'N') %>%
  select(`Label`,
         `SM 41:1;O2`)


mean.N <- 
  data %>%
  filter(Label == 'N') %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  summarise(mean = mean(`SM 41:1;O2`))


data.T <- 
  data %>%
  filter(Label == 'T') %>%
  select(`Label`,
         `SM 41:1;O2`)

mean.T <- 
  data %>%
  filter(Label == 'T') %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  summarise(mean = mean(`SM 41:1;O2`))

# The final plot:
ggplot(data.N, aes(x = `SM 41:1;O2`)) + 
    geom_density(aes(y = after_stat(density)),  # after_stat() replaces the deprecated ..density.. notation
                 fill = 'royalblue', 
                 alpha = 0.4) + 
    annotate("label",          # a single label (constants inside aes() draw one label per data row)
             x = 20, 
             y = 0.10, 
             label = "Healthy volunteers (N)", 
             color = "royalblue") +
    geom_segment(data = mean.N,
             aes(x = mean,
                 xend = mean,
                 y = +Inf,
                 yend = 0),
             linetype = 'dashed',
             linewidth = 1,
             colour = 'darkblue') +
    geom_density(data = data.T, 
                 aes(x = `SM 41:1;O2`, y = -after_stat(density)),  # negated density mirrors the curve
                 fill = 'red2', 
                 alpha = 0.4) +
    annotate("label",          # a single label (constants inside aes() draw one label per data row)
             x = 20, 
             y = -0.10, 
             label = "Patients with PDAC (T)", 
             color = "red2") +
    geom_segment(data = mean.T,
               aes(x = mean,
                   xend = mean,
                   y = -Inf,
                   yend = 0),
               linetype = 'dashed',
               linewidth = 1,
               colour = 'red2')+
    theme_bw() 

The final plot:
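The mirror effect comes from negating the computed density for the second group. The sketch below isolates that idea on simulated data, using `after_stat(density)`, which is the ggplot2 (3.4.0 and later) way of referring to the computed density. The data frames `toy.N` and `toy.T` are hypothetical stand-ins for `data.N` and `data.T`.

```r
library(ggplot2)

# Simulated values for two groups (illustrative only):
set.seed(42)
toy.N <- data.frame(value = rnorm(100, mean = 20, sd = 3))
toy.T <- data.frame(value = rnorm(100, mean = 16, sd = 3))

p <- ggplot(toy.N, aes(x = value)) +
  # Upper curve: computed density as is
  geom_density(aes(y = after_stat(density)),
               fill = "royalblue",
               alpha = 0.4) +
  # Lower curve: computed density multiplied by -1
  geom_density(data = toy.T,
               aes(y = -after_stat(density)),
               fill = "red2",
               alpha = 0.4) +
  geom_hline(yintercept = 0) +
  theme_bw()

p
```

Note that the y-axis then shows negative values for the lower curve; they are a plotting device only and should be read as positive densities.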

We can modify the code to obtain mirror density plots for multiple lipids at once:

# Selecting data and computing means:
data.N <- 
  data %>%
  filter(Label == 'N') %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations')

mean.N <- 
  data %>%
  filter(Label == 'N') %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(`Concentrations`))

data.T <- 
  data %>%
  filter(Label == 'T') %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations')

mean.T <- 
  data %>%
  filter(Label == 'T') %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(`Concentrations`))
  
# Creating plot (without annotations):
ggplot(data.N, aes(x = `Concentrations`)) + 
  geom_density(aes(y = after_stat(density)),  # after_stat() replaces the deprecated ..density.. notation
               fill = 'royalblue', 
               alpha = 0.4) + 
  geom_segment(data = mean.N,
               aes(x = mean,
                   xend = mean,
                   y = +Inf,
                   yend = 0),
               linetype = 'dashed',
               linewidth = 1,
               colour = 'darkblue') +
  geom_density(data = data.T, 
               aes(x = `Concentrations`, y = -after_stat(density)),  # negated density mirrors the curve
               fill = 'red2', 
               alpha = 0.4) +
  geom_segment(data = mean.T,
               aes(x = mean,
                   xend = mean,
                   y = -Inf,
                   yend = 0),
               linetype = 'dashed',
               linewidth = 1,
               colour = 'red2')+
  theme_bw() +
  facet_grid(. ~ Lipids, scales = 'free_x') 

We obtain the following plot:
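The four preparation steps above repeat the same filter, pivot, and summarise pipeline. One possible way to reduce the repetition is to wrap it in small helper functions. The function names `to_long()` and `group_means()` and the toy tibble below are hypothetical and not part of any package.

```r
library(dplyr)
library(tidyr)

# Toy wide table (illustrative values only):
toy <- tibble(
  Label = c("N", "N", "T", "T"),
  `SM 39:1;O2` = c(8, 10, 6, 8),
  `SM 40:1;O2` = c(40, 44, 30, 34)
)

# Hypothetical helper: long-format data for one group.
# Assumes every column except Label holds a lipid concentration.
to_long <- function(df, group_label) {
  df %>%
    filter(Label == group_label) %>%
    pivot_longer(cols = -Label,
                 names_to = "Lipids",
                 values_to = "Concentrations")
}

# Hypothetical helper: mean concentration per lipid and group.
group_means <- function(long_df) {
  long_df %>%
    group_by(Lipids, Label) %>%
    summarise(mean = mean(Concentrations), .groups = "drop")
}

toy.N  <- to_long(toy, "N")
toy.T  <- to_long(toy, "T")
mean.toy.N <- group_means(toy.N)
mean.toy.T <- group_means(toy.T)
```

With the real data, a select() of the Label and SM columns would be applied first, exactly as in the code above.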

We can also add annotations to each panel. If multiple panels are created through facet_grid(), annotations should be supplied to geom_text() as a data frame containing the labels, their coordinates, and the faceting variable. Here, we will use example annotations: "Controls - N" and "Patients - T":

# Creating a data frame with example annotations:
annotations <- 
  data.frame(
    label = rep(c("Controls - N", "Patients - T"), each = 4),
    # The four lipid names are repeated once per label (8 rows in total):
    Lipids = rep(c("SM 39:1;O2", "SM 40:1;O2", "SM 41:1;O2", "SM 42:1;O2"), times = 2),
    x = c(9, 45, 20, 25, 9, 40, 20, 20),
    y = c(0.2, 0.2, 0.2, 0.2, -0.2, -0.2, -0.2, -0.2))
    
# Adding annotations via geom_text():
ggplot(data.N, aes(x = `Concentrations`)) + 
  geom_density(aes(y = after_stat(density)),  # after_stat() replaces the deprecated ..density.. notation
               fill = 'royalblue', 
               alpha = 0.4) + 
  geom_segment(data = mean.N,
               aes(x = mean,
                   xend = mean,
                   y = +Inf,
                   yend = 0),
               linetype = 'dashed',
               linewidth = 1,
               colour = 'darkblue') +
  geom_density(data = data.T, 
               aes(x = `Concentrations`, y = -after_stat(density)),  # negated density mirrors the curve
               fill = 'red2', 
               alpha = 0.4) +
  geom_segment(data = mean.T,
               aes(x = mean,
                   xend = mean,
                   y = -Inf,
                   yend = 0),
               linetype = 'dashed',
               linewidth = 1,
               colour = 'red2')+
  theme_bw() +
  facet_grid(. ~ Lipids, scales = 'free_x') +
  geom_text(data = annotations,
            aes(x = x, 
                y = y, 
                label = label, 
                color = label), 
            show.legend = FALSE) + 
  scale_color_manual(values = c('royalblue', 'red2'))

Finally, we obtain:

As annotations, one could add, for example, some additional clinical data, the exact value of mean and median concentration, etc.
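As an illustration of this idea, the annotation data frame can be built from a table of group means instead of being typed by hand. The sketch below is a minimal example under stated assumptions: `toy_means` is a hypothetical table with made-up numbers, and the label positions (`x` at the mean, `y` at 0.2 or -0.2) are arbitrary choices.

```r
library(dplyr)

# Hypothetical table of group means (illustrative numbers only):
toy_means <- tibble(
  Label  = rep(c("N", "T"), each = 2),
  Lipids = rep(c("SM 39:1;O2", "SM 40:1;O2"), times = 2),
  mean   = c(9.04, 42.31, 7.42, 35.58)
)

# One annotation row per panel and group, with the mean in the label text:
annotations <- toy_means %>%
  mutate(label = sprintf("%s mean: %.1f", Label, mean),
         x = mean,
         y = if_else(Label == "N", 0.2, -0.2))

print(annotations)
```

Because the data frame contains the `Lipids` column, it can be passed to geom_text(data = annotations, aes(x = x, y = y, label = label)) and each label lands in the matching facet.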
