Omics data visualization in R and Python

Heat maps with clustering

Metabolites and lipids multivariate statistical analysis in R


Last updated 3 months ago

Heat maps with clustering are one of the most effective tools for visualizing alterations in OMICs data sets. Several lipids or metabolites can be depicted in one visualization across all measured samples. In heat maps, as you have just seen in the previous subchapter, normalized concentrations are assigned to continuously scaled fill colors. Upregulations in a target biological group are frequently presented with light, warm colors like red, orange, and yellow, while downregulations are shown with dark and cold colors like blue, green, and violet. Clustering features (lipids, metabolites) or samples can also give clues to interesting data structures and relationships.
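
As a minimal sketch of the color-mapping idea (not the ComplexHeatmap workflow used later on this page), the base-R snippet below assigns a few hypothetical scaled concentrations to a blue-white-red gradient. The values, the symmetric limit of 3, and the 101 color steps are assumptions chosen only for illustration.

```r
# Hypothetical scaled concentrations (not taken from the PDAC data set):
values <- c(-2.4, -1.0, 0, 0.8, 2.2)

# A diverging palette with 101 steps: blue (low), white (middle), red (high).
palette <- colorRampPalette(c("royalblue", "white", "red2"))(101)

# Assumed symmetric limit of the color scale:
limit <- 3

# Clamp values to [-limit, limit] and convert them to palette indices 1..101:
clamped <- pmax(pmin(values, limit), -limit)
index <- round((clamped + limit) / (2 * limit) * 100) + 1

# Fill color assigned to each value:
fill <- palette[index]
data.frame(values, index, fill)
```

A value of 0 lands in the middle of the palette (white), negative values move toward blue, and positive values toward red.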

Practical applications of heat maps (examples)

Heat maps are among the most commonly used visualization techniques for -omics data, as they can convey a substantial amount of information in a single, effective chart, whether it shows normalized lipid/metabolite concentrations or clinical parameters across experimental groups (or clusters of samples). Check out the selected examples:

  • J. Wu et al. Lipidomic signatures align with inflammatory patterns and outcomes in critical illness. DOI: https://doi.org/10.1038/s41467-022-34420-4 - e.g., Fig. 2c - e, Fig. 4 (the ComplexHeatmap library presented below was used to generate heat maps).

  • W. Wang et al. Metabolomics facilitates differential diagnosis in common inherited retinal degenerations by exploring their profiles of serum metabolites. DOI: https://doi.org/10.1038/s41467-024-47911-3 - Fig. 2.

  • J. Fang et al. Integrated multi-omics analysis unravels the floral scent characteristics and regulation in “Hutou” multi-petal jasmine. DOI: https://doi.org/10.1038/s42003-025-07685-w - e.g., Fig. 2b (the ComplexHeatmap again in use!).

  • D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI: https://doi.org/10.1038/s41467-021-27765-9 - Fig. 5e, Fig. 6e.

  • S. Lam et al. A multi-omics investigation of the composition and function of extracellular vesicles along the temporal trajectory of COVID-19. DOI: https://doi.org/10.1038/s42255-021-00425-4 - e.g., Fig. 2 (left part of the panel), Fig. 3a & b, etc.

  • R. Lerner et al. Four-dimensional trapped ion mobility spectrometry lipidomics for high throughput clinical profiling of human blood samples. DOI: https://doi.org/10.1038/s41467-023-36520-1 - Fig. 5 (the authors of the study published in Nature Communications use heat maps for the comparison and cross-validation of quantified lipid concentrations).

  • J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: https://doi.org/10.1007/s00216-022-04490-w - e.g., Fig. 3, Fig. 4 B - D, Fig. 7 (the authors use heat maps to present method validation results in Fig. 3 & 4, and to compare and present trends in Fig. 7).

  • M. Lange et al. AdipoAtlas: A reference lipidome for human white adipose tissue. DOI: https://doi.org/10.1016/j.xcrm.2021.100407 - Fig. 4A & E.

  • V. de Laat et al. Intrinsic temperature increase drives lipid metabolism towards ferroptosis evasion and chemotherapy resistance in pancreatic cancer. DOI: https://doi.org/10.1038/s41467-024-52978-z - Fig. 2d & Fig. 4g.

  • E. Rysman et al. De novo Lipogenesis Protects Cancer Cells from Free Radicals and Chemotherapeutics by Promoting Membrane Lipid Saturation. DOI: https://doi.org/10.1158/0008-5472.CAN-09-3871 - Fig. 1d - e, Fig. 2 (right part of the panel), Fig. 3b.

Heat maps in R with ComplexHeatmap

To prepare an example heat map for the PDAC data set, we will use the ComplexHeatmap library from Bioconductor. The heat map will show the 20 most significant lipids according to the Kruskal-Wallis test.

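# Preparing heat map in R with Bioconductor package ComplexHeatmap.
# Calling libraries:
library(tidymodels)      # Attaches dplyr and tidyr (select(), pivot_longer(), %>%).
library(readxl)          # Provides read_xlsx().
library(ComplexHeatmap)
library(rstatix)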

# Read PDAC data set in R:
data <- read_xlsx(file.choose())

# Adjust column types:
data$Label <- as.factor(data$Label)

# Create a long tibble for the KW test:
data.long <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")
               
# Computing KW test:
KW.test <- 
  data.long %>%
  group_by(Lipids) %>%
  kruskal_test(Concentrations ~ Label)
  
# Slicing head with the 20 most significant lipids:
KW.test.head <-
  KW.test %>%
  arrange(p) %>%
  slice_head(n = 20)
  
# Creating a vector with lipid names:
Lipids <- KW.test.head$Lipids 

# Extracting the 20 most significant lipids from the KW test from our data set:
data.selected <-
  data %>%
  select(`Sample Name`,
         Label, 
         all_of(Lipids))
         
# Data log10-transformation and scaling:
data.log10 <-
  data.selected %>%
  mutate(across(where(is.numeric), log10))
  
# Pareto scaling: mean-center, then divide by the square root of the standard deviation.
Pareto.scaling <- function(x) {(x - mean(x)) / sqrt(sd(x))}

data.Pareto.scaled <-
  data.log10 %>%
  mutate(across(where(is.numeric), Pareto.scaling))
  
# Creating a long tibble of normalized concentrations of selected lipids.
# all_of(Lipids) selects the 20 lipid columns by name, whatever their order:
data.Pareto.scaled.long <-
  data.Pareto.scaled %>%
  pivot_longer(cols = all_of(Lipids),
               names_to = "Lipids",
               values_to = "Concentrations")
               
# Checking the highest/lowest normalized concentration:
data.Pareto.scaled.long %>% filter(Concentrations == max(Concentrations))
data.Pareto.scaled.long %>% filter(Concentrations == min(Concentrations))

We obtain in the R console:

> data.Pareto.scaled.long %>% filter(Concentrations == max(Concentrations))
# A tibble: 1 × 4
  `Sample Name` Label Lipids      Concentrations
  <chr>         <fct> <chr>                <dbl>
1 1a227         T     Cer 36:1;O2           2.25
> data.Pareto.scaled.long %>% filter(Concentrations == min(Concentrations))
# A tibble: 1 × 4
  `Sample Name` Label Lipids     Concentrations
  <chr>         <fct> <chr>               <dbl>
1 1a121         T     SM 42:1;O2          -2.38
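
The limits of the fill color scale can also be derived from these extremes programmatically. The snippet below is a minimal sketch that takes the two values printed above as inputs; rounding up to the next integer and using five evenly spaced breakpoints are assumptions made for illustration, not a rule.

```r
# Extremes copied from the console output above:
extremes <- c(min = -2.38, max = 2.25)

# Round the largest absolute value up to the next integer:
limit <- ceiling(max(abs(extremes)))

# Five evenly spaced breakpoints between -limit and limit:
breaks <- seq(-limit, limit, length.out = 5)
breaks
```

A symmetric scale keeps zero (the mean after centering) in the middle of the color gradient.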

Hence, we can set the scale of fill colors for our heat map between -3 and 3:

# Preparing fill color scale for the heat map.
# Install package "circlize":
install.packages("circlize")

# Call the library - we will need the function colorRamp2():
library(circlize)

# Create a fill color scale for the heat map:
colors <- colorRamp2(c(-3,-1.5,0,1.5, 3), 
                     c('#00005c', 'royalblue', "white", 'red2', '#820101'))

# We basically select five breakpoints and five corresponding fill colors.
# Now, we want to add bars corresponding to biological groups (placed below the heat map via bottom_annotation).
# We create a data frame 'ann' for annotations:
ann <- data.frame(data.Pareto.scaled$Label)

# We change the name of the column in 'ann' data frame to "Label"
colnames(ann) <- c('Label')

# Next, we select color of each bar corresponding to N, T, and PAN:
colours <- list('Label' = c('T' = 'red2', 'N' = 'royalblue', 'PAN' = 'orange'))

# Finally, we can collect all annotations using ComplexHeatmap function HeatmapAnnotation():
colAnn <- HeatmapAnnotation(df = ann,     # The data frame with heat map column annotations.
                            which = 'col', # We indicate to use the annotations for the column.
                            col = colours,  # Colors for group bars.
                            show_legend = c(FALSE))  # We remove the legend for biological groups.
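
The names in the color vector must match the levels of the grouping variable, and a typo here is easy to miss. The base-R sketch below shows one possible check; the `labels` vector is a hypothetical stand-in for data.Pareto.scaled$Label, and the check itself is an optional addition rather than part of the workflow above.

```r
# Hypothetical group labels (stand-in for data.Pareto.scaled$Label):
labels <- factor(c("N", "T", "PAN", "T", "N"))

# Color list in the same shape as the one passed to HeatmapAnnotation():
colours <- list(Label = c(T = "red2", N = "royalblue", PAN = "orange"))

# Levels present in the data but missing from the color vector:
missing.levels <- setdiff(levels(labels), names(colours$Label))

# Stop early with a readable message if any group has no color:
if (length(missing.levels) > 0) {
  stop("No color defined for: ", paste(missing.levels, collapse = ", "))
}
```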

We need to transform the data into the form expected by the function that produces our heat map: a numeric matrix with column names. Here is the code:

# Creating a matrix with numeric data for our heat map.
# We must remove all <chr> and <fct> columns:
matrix <- as.matrix(data.Pareto.scaled %>% select(-`Sample Name`, -Label)) 

# We must transpose the matrix - samples are now in columns and lipids are in rows.
# That's the most frequently used arrangement of a heat map.
t.matrix <- t(matrix)
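
To see what the transposition does, here is a small base-R sketch on a made-up table (three hypothetical samples and two lipids; all values are invented). Setting the sample names as row names is an optional extra step that is not part of the code above; it makes the columns of the transposed matrix identifiable.

```r
# Made-up scaled data: 3 samples x 2 lipids.
toy <- data.frame(`Sample Name` = c("s1", "s2", "s3"),
                  Label = factor(c("N", "T", "PAN")),
                  `SM 41:1;O2` = c(0.5, -1.2, 0.1),
                  `PC 32:2` = c(1.1, -0.4, -0.7),
                  check.names = FALSE)

# Keep only the numeric columns and use sample names as row names:
toy.matrix <- as.matrix(toy[, sapply(toy, is.numeric)])
rownames(toy.matrix) <- toy$`Sample Name`

# Transpose: lipids are now in rows and samples in columns.
toy.t <- t(toy.matrix)
dim(toy.t)
rownames(toy.t)
colnames(toy.t)
```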

Finally, we create our heat map:

# Heat map code:
heatmap <- Heatmap(t.matrix,                         # Matrix with data for building heat map.
                   col = colors,                     # Fill color scale.
                   bottom_annotation = colAnn,       # Annotations at the bottom of the heat map.
                   cluster_columns = FALSE,          # At first, we skip the column clustering and keep only row (feature) clustering.
                   column_split = ann$Label,         # We split the heat map according to biological groups.
                   column_gap = unit(2, "mm"),       # The gap between slices is 2 mm.
                   show_column_names = FALSE,        # No column names for now.
                   show_heatmap_legend = TRUE,       # Show fill color legend.
                   heatmap_legend_param = list(title = 'Lipid abundance', direction = 'horizontal'))  # Parameters of fill color legend.
                   
# Read the documentation of this function carefully.
# It allows for a broad range of modifications through easy-to-use arguments!
?Heatmap

To draw our heat map, we run:

# Draw heat map:
draw(heatmap, heatmap_legend_side="right")

And we obtain this beautiful chart:

We see interesting clustering outcomes for features - lipids are mostly grouped into classes, e.g., SM species, LPC species, PC with PC O-/PC P- species. We can finally apply clustering to columns too (samples clustering):

# Heat map with samples and features clustering:
# Now, we will need the legend representing biological groups:
colAnn <- HeatmapAnnotation(df = ann,
                            which = 'col',
                            col = colours,
                            show_legend = c(TRUE))   # Set this argument to true.

heatmap <- Heatmap(t.matrix, 
                   col = colors, 
                   bottom_annotation = colAnn, 
                   cluster_columns = TRUE,
                   show_column_names = FALSE,
                   show_heatmap_legend = TRUE,
                   heatmap_legend_param = list(title = 'Lipid abundance', 
                                               direction = 'horizontal'))
                                                                        
# Draw the updated heat map:
draw(heatmap, heatmap_legend_side="right")

The updated heat map:

After applying the clustering to samples, we observe a separation of healthy volunteers (N) from patients with PDAC (T). For almost all lipids, except for Cer 36:1;O2, concentrations are downregulated in the serum of patients with pancreatic cancer. Patients with pancreatitis (PAN) do not form a separate cluster and are instead spread between healthy individuals and patients with cancer. Based on the heat map above, we would conclude that their serum lipid profiles do not show the downregulations characteristic of pancreatic cancer patients, e.g., in the long-chain SM and Cer profiles, or in the LPC, PC, and PC O-/PC P- profiles. As you can see, many observations can be made from a single elegant image.
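
The row and column clustering can also be reproduced outside ComplexHeatmap. As far as we are aware, Heatmap() defaults to Euclidean distances and complete linkage; the base-R sketch below applies the same settings with dist() and hclust() to a small made-up matrix. The feature names F1-F4, sample names S1-S6, and all values are hypothetical.

```r
# Made-up scaled matrix: 4 features (rows) x 6 samples (columns).
toy <- rbind(F1 = c( 2.1,  1.8,  2.3, -1.9, -2.2, -2.0),
             F2 = c( 1.7,  2.2,  1.9, -2.1, -1.8, -2.4),
             F3 = c(-2.0, -1.6, -2.3,  2.2,  1.9,  2.1),
             F4 = c(-1.8, -2.1, -1.9,  1.8,  2.3,  2.0))
colnames(toy) <- paste0("S", 1:6)

# Cluster rows (features): Euclidean distance and complete linkage.
row.tree <- hclust(dist(toy, method = "euclidean"), method = "complete")

# Cluster columns (samples) the same way, after transposing the matrix:
col.tree <- hclust(dist(t(toy), method = "euclidean"), method = "complete")

# Cut both trees into two clusters:
row.groups <- cutree(row.tree, k = 2)
col.groups <- cutree(col.tree, k = 2)
row.groups
col.groups
```

In this toy example, F1 and F2 fall into one feature cluster and F3 and F4 into the other, while samples S1-S3 separate from S4-S6. This is the same kind of structure the dendrograms of the heat map summarize.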

The ComplexHeatmap package offers many more amazing designs, which are presented in the detailed vignette of the package. We encourage you to read it carefully, as well as the articles published by the authors of the ComplexHeatmap package.

About | ComplexHeatmap Complete Reference
The ComplexHeatmap vignette.

DOIs of the example studies listed under "Practical applications of heat maps (examples)", in the same order:

https://doi.org/10.1038/s41467-022-34420-4
https://doi.org/10.1038/s41467-024-47911-3
https://doi.org/10.1038/s42003-025-07685-w
https://doi.org/10.1038/s41467-021-27765-9
https://doi.org/10.1038/s42255-021-00425-4
https://doi.org/10.1038/s41467-023-36520-1
https://doi.org/10.1007/s00216-022-04490-w
https://doi.org/10.1016/j.xcrm.2021.100407
https://doi.org/10.1038/s41467-024-52978-z
https://doi.org/10.1158/0008-5472.CAN-09-3871

Figure captions:

First heat map (clustering of lipids only): The heat map was obtained using the Bioconductor package 'ComplexHeatmap'. The heat map depicts the 20 most significant lipids according to the Kruskal-Wallis test. Lipid concentrations were log10-transformed and Pareto-scaled. The clustering of lipids was performed.

Updated heat map (clustering of lipids and samples): The heat map was obtained using the Bioconductor package 'ComplexHeatmap'. The heat map depicts the 20 most significant lipids according to the Kruskal-Wallis test. Lipid concentrations were log10-transformed and Pareto-scaled. The clustering of lipids and samples was performed.