Omics data visualization in R and Python

Correlation heat maps

Metabolites and lipids descriptive statistical analysis in R

Last updated 3 months ago

Correlation heat maps are plots that display the pairwise correlations between the variables of a data set as a color-coded matrix. They are frequently used in manuscripts to depict relationships between lipid or metabolite concentrations, as well as their associations with clinical variables. These relationships can be further explored in the context of lipid (metabolite) metabolism, especially when they change in response to disease progression, recovery, etc.
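
Before turning to the plotting packages, it may help to see the object that every correlation heat map is built on. The sketch below uses only base R and a small simulated table; the object name `toy` and the lipid column names are hypothetical and are not part of the data set used in this chapter. It computes both a Pearson and a Spearman correlation matrix with cor():

```r
# Hypothetical toy data: 30 samples and 4 simulated lipid concentrations.
# (The names and values are made up for illustration only.)
set.seed(1)
n <- 30
base <- rnorm(n)
toy <- data.frame(
  `CE 16:0`  = base + rnorm(n, sd = 0.3),   # follows `base`
  `CE 18:1`  = base + rnorm(n, sd = 0.3),   # follows `base` as well
  `SM 34:1`  = -base + rnorm(n, sd = 0.5),  # moves opposite to `base`
  `LPC 18:0` = rnorm(n),                    # independent noise
  check.names = FALSE
)

# Pearson (the default of cor()) and Spearman correlation matrices:
cor.pearson  <- cor(toy, method = "pearson")
cor.spearman <- cor(toy, method = "spearman")

# A square, symmetric matrix with ones on the diagonal:
round(cor.pearson, 2)
```

A correlation heat map is simply this matrix with every cell mapped to a color, which is why the packages below either compute the matrix internally or expect it as their input.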

For practical examples of correlation heat maps, refer to the following papers:

A. Jeucken et al. A Comprehensive Functional Characterization of Escherichia coli Lipid Genes. DOI: - Fig. 5 (a study in Cell Reports utilizing lipid-lipid correlation networks, i.e., exploring statistical relationships between the 100 most abundant lipid species and analyzing these relationships in the context of lipid metabolism in the bacterium).
K. Huynh et al. Concordant peripheral lipidome signatures in two large clinical studies of Alzheimer’s disease. DOI: - Fig. 1C (a study on peripheral lipidome signatures of Alzheimer's disease published in Nature Communications, presenting statistical relationships (Spearman correlation) between total lipid classes, subclasses, and commonly reported clinical measures).
B. Peng et al. Identification of key lipids critical for platelet activation by comprehensive analysis of the platelet lipidome. DOI: - Fig. 4D (absolute quantification of the platelet lipidome published in Blood; the authors used a correlation heat map with hierarchical clustering for 384 quantified lipid species; 12 distinct clusters of correlated and anticorrelated lipids were identified during platelet activation (Pearson correlation ≥0.85)).
Y. Ding et al. Comprehensive metabolomics profiling reveals common metabolic alterations underlying the four major non-communicable diseases in treated HIV infection. DOI: - Fig. 5 (a study published in eBioMedicine (a part of the Lancet Discovery Science journals); the authors used a correlation heat map to present statistical relationships (Spearman correlation) between eigenmetabolite, altered metabolites, classical lipids, and clinical parameters in all participants).
L. Ottensmann et al. Genome-wide association analysis of plasma lipidome identifies 495 genetic associations. DOI: - Fig. 1b (a manuscript published in Nature Communications; the authors used a correlation heat map to present the absolute pairwise Pearson correlations between the lipid species included in the 11 clusters of the multivariate genome-wide association studies).
M. Lange et al. AdipoAtlas: A reference lipidome for human white adipose tissue. DOI: - Fig. 4C (the authors used correlation heat maps to present Pearson's correlations of significantly regulated lipids between lean and obese white adipose tissue (WAT)).

Preparing correlation heat maps via DataExplorer

A correlation heat map can be obtained with the plot_correlation() function from the DataExplorer package:

# Calling library:
library(DataExplorer)
library(tidyverse) # We will certainly need tidyverse tools.

# Filtering all N and T:
data.N <-
  data %>%
  filter(Label == "N")

data.T <-
  data %>%
  filter(Label == "T")

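# Consulting the documentation:
?plot_correlation

# Preparing correlation heat maps.
# For all controls (N), after dropping the non-numeric columns:
data.N %>%
  select(-`Sample Name`,
         -`Label`) %>%
  plot_correlation()

# The data frame can also be passed directly; in that case the discrete
# columns (`Sample Name`, `Label`) are handled by plot_correlation() itself
# (see the `type` and `maxcat` arguments in the documentation):
plot_correlation(data.N)

# For all patients with PDAC (T):
data.T %>%
  select(-`Sample Name`,
         -`Label`) %>%
  plot_correlation()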

The heat maps:
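
Several of the papers listed above report Spearman rather than Pearson correlations. As a minimal sketch, assuming you want the same in DataExplorer, extra arguments can be forwarded to cor() through the cor_args argument of plot_correlation(). The data frame `toy.N` below is a hypothetical stand-in for `data.N`, with simulated values and made-up lipid names:

```r
library(DataExplorer)
library(dplyr)

# Hypothetical stand-in for `data.N`: 20 control samples, 3 simulated lipids.
set.seed(2)
toy.N <- data.frame(
  `Sample Name` = paste0("N", 1:20),
  Label         = "N",
  `CE 16:0`     = rlnorm(20),
  `SM 34:1`     = rlnorm(20),
  `LPC 18:0`    = rlnorm(20),
  check.names = FALSE
)

# cor_args is passed on to cor(); here we request Spearman correlations
# computed on pairwise complete observations.
p <- toy.N %>%
  select(-`Sample Name`,
         -`Label`) %>%
  plot_correlation(type = "continuous",
                   cor_args = list(method = "spearman",
                                   use = "pairwise.complete.obs"))
```

The returned object is a ggplot, so, if needed, it can be adjusted further with the usual ggplot2 layers and themes.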

Preparing correlation heat maps via ggcorrplot

The ggcorrplot package produces a ggplot2 visualization of the correlation matrix. Read more about the package here:

First, we compute the correlation matrix; then, we pass it to ggcorrplot() to obtain the visualization:

# Installing the ggcorrplot library:
install.packages("ggcorrplot")

# Calling library:
library(ggcorrplot)

# Reading the function's documentation:
?ggcorrplot

# Computing a correlation matrix (cor() function, Pearson by default):
cor.mat <-
  data %>%
  filter(Label == "N") %>%
  select(-`Sample Name`,
         -`Label`) %>%
  cor()

# Computing a matrix of correlation p-values
# (it can be passed to ggcorrplot() through the p.mat argument):
cor.mat.pval <-
  data %>%
  filter(Label == "N") %>%
  select(-`Sample Name`,
         -`Label`) %>%
  cor_pmat()

# Creating correlation heat maps:
# PLOT A (all variables, default settings):
ggcorrplot(cor.mat)

# PLOT B (selected lipid classes only, lower triangle, circles):
data %>%
  filter(Label == "N") %>%
  select(starts_with("CE") | starts_with("SM") | starts_with("LPC")) %>%
  cor() %>%
  ggcorrplot(method = "circle",
             type = "lower",
             lab_size = 4, # Only takes effect together with lab = TRUE.
             outline.color = "black")

Plot A:

Plot B:
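
The matrix of p-values computed with cor_pmat() is not used in plots A and B. One possible use, sketched below on simulated data, is to pass it to ggcorrplot() through the p.mat argument so that non-significant correlations are left blank. The object names `toy`, `toy.cor`, and `toy.pval` are hypothetical, and the 0.05 threshold is only an illustrative choice (no multiple-testing correction is applied here):

```r
library(ggcorrplot)

# Hypothetical toy data: two correlated lipids and two unrelated ones.
set.seed(3)
base <- rnorm(25)
toy <- data.frame(
  `CE 16:0`  = base + rnorm(25, sd = 0.3),
  `CE 18:1`  = base + rnorm(25, sd = 0.3),
  `SM 34:1`  = rnorm(25),
  `LPC 18:0` = rnorm(25),
  check.names = FALSE
)

toy.cor  <- cor(toy)       # correlation coefficients
toy.pval <- cor_pmat(toy)  # matching matrix of p-values

# Cells with p >= sig.level are blanked out; lab = TRUE prints the
# coefficients, and lab_size controls their size.
p <- ggcorrplot(toy.cor,
                p.mat = toy.pval,
                sig.level = 0.05,
                insig = "blank",
                type = "lower",
                lab = TRUE,
                lab_size = 4)
p
```

With insig = "pch" (the default) the non-significant cells are crossed out instead of blanked, which keeps the underlying color visible.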

Preparing correlation heat maps via corrplot

The principle is the same as for ggcorrplot: first compute the correlation matrix, then visualize it. Read more about the possibilities offered by the corrplot library here:

# Installing library:
install.packages("corrplot")

# Calling library:
library(corrplot)

# Reading about the function of interest:
?corrplot

# Exemplary plot:
data %>%
  filter(Label == "N") %>%
  select(-`Sample Name`,
         -`Label`) %>%
  cor() %>%
  corrplot(method = "color",
           tl.col = "black",  # Color of the text labels.
           tl.cex = 0.5,      # Size of the text labels.
           col = colorRampPalette(c("#00e8f0",
                                    "#0d78ca",
                                    "#002060",
                                    "white",
                                    "#600000",
                                    "red",
                                    "#FF4D4D"))(200))

The code above produces a correlation heat map with a customized color scale (from -1 to 1). By changing the color codes passed to colorRampPalette(), you can create your own color palette for the heat map.
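
To make the role of colorRampPalette() more tangible, here is a small base R sketch. The three-color palette and the object names `pal.fun` and `pal` are only illustrative choices. colorRampPalette() does not return colors directly; it returns a function that interpolates the requested number of colors between the anchor colors you supply:

```r
# colorRampPalette() returns a function...
pal.fun <- colorRampPalette(c("#0d78ca", "white", "#600000"))

# ...and calling that function with n returns n interpolated hex colors,
# running from the first anchor color to the last one.
pal <- pal.fun(5)
pal

# In the corrplot() call, 200 such colors are spread over the -1 to 1 scale:
length(pal.fun(200))
```

Because the colors are spread evenly across the scale, placing "white" in the middle of the anchor vector keeps correlations close to 0 pale, while the outer anchors color the strong negative and positive correlations.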

IMPORTANT: The correlation heat map produced by corrplot is surrounded by a white background. The simplest way to remove it is to crop the image in any freely available graphics software.
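
As a possible alternative to cropping, the plot can be written straight to a file with a base R graphics device. The sketch below is an assumption-laden example rather than a guaranteed fix: it uses a simulated correlation matrix (`toy.cor`), a temporary output path (`out.file`), a square canvas, and a transparent device background. How much empty margin remains depends on the device size and the length of the labels, so the width and height may need some trial and error:

```r
library(corrplot)

# Hypothetical correlation matrix from simulated data (5 variables).
set.seed(4)
toy.cor <- cor(matrix(rnorm(200), ncol = 5,
                      dimnames = list(NULL, paste("Lipid", 1:5))))

# Hypothetical output path; replace it with your own file name.
out.file <- file.path(tempdir(), "corrplot_toy.png")

# Open a square PNG device with a transparent background,
# draw the heat map, and close the device to write the file.
png(out.file, width = 2000, height = 2000, res = 300, bg = "transparent")
corrplot(toy.cor,
         method = "color",
         tl.col = "black",
         mar = c(0, 0, 0, 0))
dev.off()
```

Cropping in graphics software remains the simplest route; this approach is mainly useful when many heat maps have to be exported in the same way.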
