Metabolites and lipids descriptive statistical analysis in R
The density plot was already introduced in the previous subchapter. Density plots depict the distribution of a numeric variable. A density plot can be viewed as a smoothed variant of a histogram, so its applications are similar to those of a histogram. Density plots can be an effective way of visualizing the concentration distributions of lipids or metabolites in biological materials. See the following example published in Nature:
T. Takeuchi et al. Gut microbial carbohydrate metabolism contributes to insulin resistance. DOI: https://doi.org/10.1038/s41586-023-06466-x - Fig. 1d & 2b, f (the authors used density plots, e.g., to present and compare fecal levels of monosaccharides (as decimal logarithms) across experimental groups, or HOMA-IR, BMI, triglyceride (TG), and HDL-C levels among the participant clusters).
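The relationship between a histogram and a density plot can be illustrated with a minimal base R sketch. The concentrations here are simulated (the vector `conc` is hypothetical and not part of the data set used in this chapter); the kernel density estimate is drawn on top of a histogram plotted on the density scale:

```r
# Simulated, hypothetical concentrations (not real measurements):
set.seed(42)
conc <- rlnorm(200, meanlog = 3, sdlog = 0.3)

# Histogram on the density scale, so that the bar areas sum to 1:
hist(conc, freq = FALSE, breaks = 20,
     main = "Histogram with kernel density estimate",
     xlab = "Concentration (simulated)")

# Kernel density estimate: a smoothed counterpart of the histogram:
dens <- density(conc)
lines(dens, col = "red2", lwd = 2)
```

Both displays describe the same distribution; the density curve simply replaces the bins with a smooth estimate.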
Preparing density plots via DataExplorer (level: basic)
DataExplorer enables producing a simple density plot through plot_density():
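A minimal sketch of such a call is shown below. It assumes a small simulated data frame (`toy` is a hypothetical stand-in for the lipidomics table used in this chapter), so the values are for illustration only:

```r
library(DataExplorer)

# Hypothetical stand-in for the lipidomics table: simulated
# log-normal "concentrations" for two sphingomyelins.
set.seed(1)
toy <- data.frame(
  `SM 39:1;O2` = rlnorm(100, meanlog = 1.5, sdlog = 0.3),
  `SM 41:1;O2` = rlnorm(100, meanlog = 3.0, sdlog = 0.3),
  check.names = FALSE  # keep the lipid names unchanged
)

# One density panel is drawn for every continuous column:
plot_density(toy)
```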
Basic plot customization was performed using the theme_config argument. We supplied a list of parameters to be changed, e.g., we removed the classic ggplot2 gray background by setting panel.background to element_blank():
Finally, we changed the color of the density plot curves to 'royalblue' or 'red2' and the linewidth to 1, using geom_density_args with a list containing the color and linewidth:
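The theme customization described above could look roughly like this. It is a sketch under the assumption of a simulated data frame (`toy` and its columns are hypothetical); the list given to theme_config is passed on to the ggplot2 theme:

```r
library(DataExplorer)

# Hypothetical simulated data, for illustration only:
set.seed(1)
toy <- data.frame(
  lipid_a = rnorm(100, mean = 5, sd = 1),
  lipid_b = rnorm(100, mean = 20, sd = 4)
)

# Remove the default gray panel background via theme_config:
plot_density(
  toy,
  theme_config = list(panel.background = ggplot2::element_blank())
)
```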
# Changing color of density plot curves
# (data is the tibble used throughout this chapter):
plot_density(data, geom_density_args = list(color = 'red2', linewidth = 1))
We obtain the following plots:
Such simple plots can be used for a quick data inspection.
Preparing density plots via ggpubr (level: intermediate)
The ggpubr package provides the ggdensity() function. We will show you how to prepare a density plot for a single lipid and for multiple lipids:
# Calling libraries
library(tidyverse) # %>%, filter(), select(), pivot_longer(), facet_grid()
library(ggpubr)

# Reading the documentation about the function:
?ggdensity

# PLOT A: Density plots for a single lipid (through wide tibble):
data %>%
  filter(Label != 'PAN') %>%
  select(`Label`, `SM 41:1;O2`) %>%
  ggdensity(x = "SM 41:1;O2", # As x - the name of the selected column
            color = "Label",
            fill = "Label",
            rug = TRUE,
            palette = c('royalblue', 'red2'))

# PLOT B: Density plots for multiple lipids (through long tibble):
data %>%
  filter(Label != 'PAN') %>%
  select(`Label`,
         `SM 39:1;O2`,
         `SM 40:1;O2`,
         `SM 41:1;O2`,
         `SM 42:1;O2`) %>%
  pivot_longer(cols = `SM 39:1;O2`:`SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentrations') %>%
  ggdensity(x = "Concentrations", # As x - the column containing concentrations
            color = "Label",
            fill = "Label",
            rug = TRUE,
            palette = c('royalblue', 'red2')) +
  facet_grid(. ~ Lipids, scales = "free_x")
The plots:
Preparing density plots via ggplot2 (level: advanced)
As you know from the previous example with histograms, we can add a layer with a density plot using geom_density(). Take a look at the examples below - for a single lipid and multiple lipids. To customize the plots a bit more, we added mean values to them:
It is also possible to plot one density curve pointing upward and the other pointing downward. Such a chart is known as a mirror density chart. This plot was inspired by an excellent source of ideas for beautiful R charts, the R Graph Gallery:
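A possible sketch of such plots is given below. It assumes simulated data (`toy` is a hypothetical stand-in for the chapter's tibble with the groups N and T), and marking the group means with dashed vertical lines via geom_vline() is one option among several:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical simulated data standing in for the lipidomics tibble:
set.seed(1)
toy <- data.frame(
  Label = rep(c("N", "T"), each = 50),
  `SM 39:1;O2` = c(rnorm(50, 6, 1), rnorm(50, 5, 1)),
  `SM 41:1;O2` = c(rnorm(50, 25, 4), rnorm(50, 18, 4)),
  check.names = FALSE
)
group_colors <- c(N = "royalblue", T = "red2")

# Single lipid: group means, then densities with dashed mean lines.
means_single <- toy %>%
  group_by(Label) %>%
  summarise(mean = mean(`SM 41:1;O2`))

plot_single <- ggplot(toy, aes(x = `SM 41:1;O2`, color = Label, fill = Label)) +
  geom_density(alpha = 0.4) +
  geom_vline(data = means_single, aes(xintercept = mean, color = Label),
             linetype = "dashed", linewidth = 1) +
  scale_color_manual(values = group_colors) +
  scale_fill_manual(values = group_colors) +
  theme_bw()

# Multiple lipids: long format, means per lipid and group, one panel per lipid.
long <- toy %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

means_multi <- long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations), .groups = "drop")

plot_multi <- ggplot(long, aes(x = Concentrations, color = Label, fill = Label)) +
  geom_density(alpha = 0.4) +
  geom_vline(data = means_multi, aes(xintercept = mean, color = Label),
             linetype = "dashed", linewidth = 1) +
  scale_color_manual(values = group_colors) +
  scale_fill_manual(values = group_colors) +
  facet_wrap(~ Lipids, scales = "free") +
  theme_bw()

plot_single
plot_multi
```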
# Preparing data and computing mean values:
data.N <-
  data %>%
  filter(Label == 'N') %>%
  select(`Label`,
         `SM 41:1;O2`)

mean.N <-
  data.N %>%
  summarise(mean = mean(`SM 41:1;O2`))

data.T <-
  data %>%
  filter(Label == 'T') %>%
  select(`Label`,
         `SM 41:1;O2`)

mean.T <-
  data.T %>%
  summarise(mean = mean(`SM 41:1;O2`))
# The final plot:
ggplot(data.N, aes(x = `SM 41:1;O2`)) +
  # Upper half: density for healthy volunteers (N).
  # after_stat(density) replaces the deprecated ..density.. notation.
  geom_density(aes(y = after_stat(density)),
               fill = 'royalblue',
               alpha = 0.4) +
  # annotate() draws the label once instead of once per data row:
  annotate("label",
           x = 20,
           y = 0.10,
           label = "Healthy volunteers (N)",
           color = "royalblue") +
  geom_segment(data = mean.N,
               aes(x = mean,
                   xend = mean,
                   y = +Inf,
                   yend = 0),
               linetype = 'dashed',
               linewidth = 1,
               colour = 'darkblue') +
  # Lower half: mirrored density for patients with PDAC (T).
  geom_density(data = data.T,
               aes(x = `SM 41:1;O2`, y = -after_stat(density)),
               fill = 'red2',
               alpha = 0.4) +
  annotate("label",
           x = 20,
           y = -0.10,
           label = "Patients with PDAC (T)",
           color = "red2") +
  geom_segment(data = mean.T,
               aes(x = mean,
                   xend = mean,
                   y = -Inf,
                   yend = 0),
               linetype = 'dashed',
               linewidth = 1,
               colour = 'red2') +
  theme_bw()
The final plot:
We can modify the code to obtain mirror density plots for multiple lipids at once:
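One way to do this is sketched below, assuming simulated data (`toy` and `long` are hypothetical names, not the chapter's objects). The table is pivoted to the long format, the two groups are drawn as separate layers with opposite signs of the density, and facet_grid() creates one panel per lipid:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical simulated data standing in for the lipidomics tibble:
set.seed(1)
toy <- data.frame(
  Label = rep(c("N", "T"), each = 50),
  `SM 39:1;O2` = c(rnorm(50, 6, 1), rnorm(50, 5, 1)),
  `SM 41:1;O2` = c(rnorm(50, 25, 4), rnorm(50, 18, 4)),
  check.names = FALSE
)

# Long format: one row per sample and lipid.
long <- toy %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

mirror <- ggplot(long, aes(x = Concentrations)) +
  # Upper half: group N
  geom_density(data = filter(long, Label == "N"),
               aes(y = after_stat(density)),
               fill = "royalblue", alpha = 0.4) +
  # Lower half: group T, mirrored by negating the density
  geom_density(data = filter(long, Label == "T"),
               aes(y = -after_stat(density)),
               fill = "red2", alpha = 0.4) +
  facet_grid(. ~ Lipids, scales = "free_x") +
  theme_bw()

mirror
```

The densities are computed separately within each panel, so every lipid gets its own pair of mirrored curves.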
We can also add annotations to each plot. If multiple panels are created through facet_grid(), the annotations must be supplied to geom_text() as a data frame with labels and coordinates. Here, we will use exemplary annotations: "Control - N" and "Patient - T":
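A minimal sketch of this approach follows. The data are simulated, and the label coordinates in `labels` are hypothetical values chosen by hand for this toy example; with real data they need to be adjusted to the ranges of each panel. The data frame must contain the faceting column (here `Lipids`) so that each label is placed in the right panel:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical simulated data standing in for the lipidomics tibble:
set.seed(1)
toy <- data.frame(
  Label = rep(c("N", "T"), each = 50),
  `SM 39:1;O2` = c(rnorm(50, 6, 1), rnorm(50, 5, 1)),
  `SM 41:1;O2` = c(rnorm(50, 25, 4), rnorm(50, 18, 4)),
  check.names = FALSE
)
long <- toy %>%
  pivot_longer(cols = -Label, names_to = "Lipids", values_to = "Concentrations")

# One row per label and panel; coordinates are illustrative assumptions.
labels <- data.frame(
  Lipids = rep(c("SM 39:1;O2", "SM 41:1;O2"), each = 2),
  x      = c(5.5, 5.5, 21.5, 21.5),
  y      = c(0.45, -0.45, 0.45, -0.45),
  label  = rep(c("Control - N", "Patient - T"), times = 2)
)

annotated <- ggplot(long, aes(x = Concentrations)) +
  geom_density(data = filter(long, Label == "N"),
               aes(y = after_stat(density)),
               fill = "royalblue", alpha = 0.4) +
  geom_density(data = filter(long, Label == "T"),
               aes(y = -after_stat(density)),
               fill = "red2", alpha = 0.4) +
  # Labels are matched to panels through the shared Lipids column:
  geom_text(data = labels, aes(x = x, y = y, label = label)) +
  facet_grid(. ~ Lipids, scales = "free_x") +
  theme_bw()

annotated
```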