Omics data visualization in R and Python

Correlation heat maps

Metabolites and lipids descriptive statistical analysis in R

Last updated 3 months ago

Correlation heat maps display the pairwise correlations between the variables of a data set as a color-coded matrix. These plots are frequently used in manuscripts to depict relationships between lipid or metabolite concentrations, as well as their associations with clinical variables. These relationships can be further explored in the context of lipid (metabolite) metabolism, especially when they change in response to disease progression, recovery, etc.

For practical examples of correlation heat maps, refer to the following papers:

  • A. Jeucken et al. A Comprehensive Functional Characterization of Escherichia coli Lipid Genes. DOI: https://doi.org/10.1016/j.celrep.2019.04.018 - Fig. 5 (a study in Cell Reports using lipid-lipid correlation networks, i.e., exploring statistical relationships between the 100 most abundant lipid species and analyzing these relationships in the context of lipid metabolism in the bacterium).

  • K. Huynh et al. Concordant peripheral lipidome signatures in two large clinical studies of Alzheimer’s disease. DOI: https://doi.org/10.1038/s41467-020-19473-7 - Fig. 1C (a study on peripheral lipidome signatures of Alzheimer's disease published in Nature Communications, presenting statistical relationships (Spearman correlation) between total lipid classes, subclasses, and commonly reported clinical measures).

  • B. Peng et al. Identification of key lipids critical for platelet activation by comprehensive analysis of the platelet lipidome. DOI: https://doi.org/10.1182/blood-2017-12-822890 - Fig. 4D (absolute quantification of the platelet lipidome published in Blood; the authors used a correlation heat map with hierarchical clustering for 384 quantified lipid species; 12 distinct clusters of correlated and anticorrelated lipids were identified during platelet activation (Pearson correlation ≥0.85)).

  • Y. Ding et al. Comprehensive metabolomics profiling reveals common metabolic alterations underlying the four major non-communicable diseases in treated HIV infection. DOI: https://doi.org/10.1016/j.ebiom.2021.103548 - Fig. 5 (a study published in eBioMedicine, a part of the Lancet Discovery Science journals; the authors used a correlation heat map to present statistical relationships (Spearman correlation) between eigenmetabolite, altered metabolites, classical lipids, and clinical parameters in all participants).

  • L. Ottensmann et al. Genome-wide association analysis of plasma lipidome identifies 495 genetic associations. DOI: https://doi.org/10.1038/s41467-023-42532-8 - Fig. 1b (a manuscript published in Nature Communications; the authors used a correlation heat map to present the absolute pairwise Pearson correlations between the lipid species included in the 11 clusters of the multivariate genome-wide association studies).

  • M. Lange et al. AdipoAtlas: A reference lipidome for human white adipose tissue. DOI: https://doi.org/10.1016/j.xcrm.2021.100407 - Fig. 4C (the authors used correlation heat maps to present Pearson correlations of significantly regulated lipids between lean and obese WAT).
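Several of the studies listed above report Spearman rather than Pearson correlations. As a minimal sketch of the difference, the base R cor() function accepts a method argument. The toy data frame and its lipid_a, lipid_b, and lipid_c columns are simulated and purely hypothetical; they are not part of the tutorial's data set.

```r
# Simulated, hypothetical concentrations (NOT the tutorial's data set):
set.seed(1)
lipid_a <- rlnorm(50, meanlog = 0, sdlog = 1)
lipid_b <- lipid_a^2 * exp(rnorm(50, sd = 0.1))  # monotonic but non-linear link to lipid_a
toy <- data.frame(lipid_a, lipid_b, lipid_c = rnorm(50))

# Pearson (the default) measures linear association:
cor.pearson <- cor(toy, method = "pearson")

# Spearman is computed on ranks, so it captures any monotonic association:
cor.spearman <- cor(toy, method = "spearman")

# Comparing both matrices:
round(cor.pearson, 2)
round(cor.spearman, 2)
```

Either matrix can be passed to the plotting functions used in this chapter; which coefficient is appropriate depends on the distribution of your data.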

Preparing correlation heatmaps via DataExplorer

A correlation heat map can be obtained with the plot_correlation() function from the DataExplorer package:

# Calling library:
library(DataExplorer)
library(tidyverse) # We will certainly need tidyverse tools.

# Filtering all N and T:
data.N <-
  data %>%
  filter(Label == "N")

data.T <-
  data %>%
  filter(Label == "T")

# Consulting the documentation:
?plot_correlation()

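# Preparing correlation heat maps.
# For all controls (N):
data.N %>%
  select(-`Sample Name`,
         -`Label`) %>% 
  plot_correlation()

# Shortcut (assuming `Sample Name` and `Label` are non-numeric columns):
# type = "continuous" keeps only the continuous features.
plot_correlation(data.N, type = "continuous")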

# For all patients with PDAC (T):
data.T %>%
  select(-`Sample Name`,
         - `Label`) %>% 
  plot_correlation()

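The heat maps:

[Figure: The DataExplorer's correlation heat map for all healthy controls (N).]

[Figure: The DataExplorer's correlation heat map for all pancreatic cancer patients (T).]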
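plot_correlation() forwards additional settings to cor() through its cor_args argument, which is useful when a rank-based coefficient is preferred or when the data contain missing values. The sketch below assumes a small simulated data frame (toy, with hypothetical lipid columns) in place of the tutorial's data object.

```r
library(DataExplorer)

# Simulated, hypothetical stand-in for the tutorial's `data` object:
set.seed(1)
toy <- data.frame(
  `Sample Name` = paste0("S", 1:40),
  Label         = rep(c("N", "T"), each = 20),
  `CE 16:1`     = rlnorm(40),
  `SM 41:1`     = rlnorm(40),
  `LPC 18:2`    = rlnorm(40),
  check.names   = FALSE
)
toy[3, "SM 41:1"] <- NA  # one missing value among the controls

# Spearman correlations for controls (N), using all pairwise complete observations:
p <- plot_correlation(
  toy[toy$Label == "N", ],
  type = "continuous",  # keep only the numeric columns
  cor_args = list(method = "spearman",
                  use = "pairwise.complete.obs")
)
```

The elements of cor_args are ordinary cor() arguments, so ?cor lists what can be set here.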

Preparing correlation heatmaps via ggcorrplot

The ggcorrplot package produces a ggplot2 visualization of the correlation matrix. Read more about the package in "Introduction to the ggcorrplot package" (RPubs).

First, we compute the correlation matrix; then, we visualize it:

# Installing the ggcorrplot library:
install.packages("ggcorrplot")

# Calling library:
library(ggcorrplot)

# Reading the function's documentation:
?ggcorrplot()

# Computing a correlation matrix (cor() function):
cor.mat <-
  data %>%
  filter(Label == "N") %>% 
  select(-`Sample Name`,
         - `Label`) %>%
  cor()

# Computing matrix of p-values:
cor.mat.pval <-
  data %>%
  filter(Label == "N") %>% 
  select(-`Sample Name`,
         - `Label`) %>%
  cor_pmat()
  
# Creating correlation heat maps:
# PLOT A:
ggcorrplot(cor.mat)

# PLOT B:
data %>%
  filter(Label == "N") %>% 
  select(starts_with("CE") | starts_with("SM") | starts_with("LPC")) %>%
  cor() %>%
  ggcorrplot(method = "circle", 
             type = "lower",
             lab_size = 4,              # only takes effect together with lab = TRUE
             outline.color = "black")

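Plot A:

[Figure: Correlation heat map obtained for all healthy controls using ggcorrplot() from the ggcorrplot package.]

Plot B:

[Figure: Correlation heat map in the circular form for selected lipid species (only using controls - N) - ggcorrplot package.]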
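The matrix of p-values computed with cor_pmat() can also be passed to ggcorrplot() through its p.mat argument, for example to blank out correlations that are not significant. The following is a minimal sketch on simulated data; the toy data frame and its lipid column names are hypothetical, and the 0.05 threshold is only an illustrative choice.

```r
library(ggcorrplot)

# Simulated, hypothetical lipid concentrations (NOT the tutorial's data set):
set.seed(1)
toy <- data.frame(
  `CE 16:1`   = rnorm(30),
  `SM 41:1`   = rnorm(30),
  `LPC 18:2`  = rnorm(30),
  check.names = FALSE
)
toy$`CE 18:1` <- toy$`CE 16:1` + rnorm(30, sd = 0.3)  # deliberately correlated pair

# Correlation matrix and the matching matrix of p-values:
cor.mat <- cor(toy)
cor.mat.pval <- cor_pmat(toy)

# Lower triangle, ordered by hierarchical clustering,
# with non-significant correlations left blank:
p <- ggcorrplot(cor.mat,
                type = "lower",
                hc.order = TRUE,
                p.mat = cor.mat.pval,
                sig.level = 0.05,
                insig = "blank",
                lab = TRUE,
                lab_size = 4)
p
```

Because the result is a ggplot2 object, it can be customized further with the usual ggplot2 layers and themes.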

Preparing correlation heatmaps via corrplot library

The principle is the same as for the ggcorrplot library: first compute the correlation matrix, then visualize it. Read more about the possibilities offered by the corrplot library in "An Introduction to corrplot Package".

# Installing library:
install.packages("corrplot")

# Calling library
library(corrplot)

# Reading about the function of interest:
?corrplot()

# Exemplary plot:
data %>%
  filter(Label == "N") %>% 
  select(-`Sample Name`,
         -`Label`) %>%
  cor() %>%
  corrplot(method = 'color',
         tl.col = "black",
         tl.cex = 0.5,
         col=colorRampPalette(c("#00e8f0",
                                "#0d78ca", 
                                "#002060",
                                "white",
                                "#600000", 
                                "red", 
                                "#FF4D4D"))(200))

As you can see, we customized the color scale (-1 to 1) right away. By changing the color codes in colorRampPalette(), you can create your own color palette for the heat map. The correlation heat map obtained from the code above looks like this:

[Figure: The correlation heat map obtained using the corrplot() function from the corrplot package.]

IMPORTANT: The correlation heat map produced by corrplot has a white background around it. The simplest fix is to crop it in any freely available graphics software.
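To see how the custom palette used in the corrplot() call is built, note that colorRampPalette() does not return colors directly. It returns a function that interpolates between the anchor colors you supply. A minimal sketch with a shortened, three-color version of the palette (the names make.palette and pal are only illustrative):

```r
# colorRampPalette() returns a *function* that interpolates between the anchors:
make.palette <- colorRampPalette(c("#002060",   # low end of the scale (towards -1)
                                   "white",     # middle of the scale (around 0)
                                   "#600000"))  # high end of the scale (towards 1)

# Calling that function with n generates n interpolated colors:
pal <- make.palette(200)

# Inspecting the result:
length(pal)
head(pal, 3)
tail(pal, 3)
```

The resulting vector can be supplied to the col argument of corrplot(); a larger n gives a smoother gradient.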

https://doi.org/10.1016/j.celrep.2019.04.018
https://doi.org/10.1038/s41467-020-19473-7
https://doi.org/10.1182/blood-2017-12-822890
https://doi.org/10.1016/j.ebiom.2021.103548
https://doi.org/10.1038/s41467-023-42532-8
https://doi.org/10.1016/j.xcrm.2021.100407
RPubs - Document
Introduction to the ggcorrplot package.
An Introduction to corrplot Package
Introduction to the corrplot package
The DataExplorer's correlation heat map for all healthy controls.
The DataExplorer's correlation heat map for all pancreatic cancer patients (T).
Correlation heat map obtained for all healthy controls using ggcorrplot() from the ggcorrplot package.
Correlation heat map in the circular form for selected lipid species (only using controls - N) - ggcorrplot package.
The correlation heat map was obtained using the corrplot() function from the corrplot package.