Omics data visualization in R and Python

Heat maps with clustering

Metabolites and lipids multivariate statistical analysis in R

Last updated 3 months ago

Heat maps with clustering are one of the most effective tools for visualizing alterations in OMICs data sets. Several lipids or metabolites can be depicted in one visualization across all measured samples. In heat maps, as you have just seen in the previous subchapter, normalized concentrations are assigned to continuously scaled fill colors. Upregulations in a target biological group are frequently presented with light, warm colors like red, orange, and yellow, while downregulations are shown with dark and cold colors like blue, green, and violet. Clustering features (lipids, metabolites) or samples can also give clues to interesting data structures and relationships.
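To make the clustering idea concrete, here is a minimal base-R sketch on a toy matrix. The feature names, sample names, and values are invented for illustration and are not taken from any data set in this book. We assume Euclidean distance with complete linkage, which, to our knowledge, is also what Heatmap() uses by default.

```r
# Toy matrix: 4 hypothetical features (rows) x 6 hypothetical samples (columns).
# All values are invented for illustration only.
toy <- rbind(
  feat_A = c(-1.2, -1.0, -0.9,  1.1,  0.9,  1.0),
  feat_B = c(-1.1, -0.8, -1.0,  1.0,  1.2,  0.8),
  feat_C = c( 1.0,  0.9,  1.1, -1.0, -0.9, -1.1),
  feat_D = c( 0.9,  1.2,  0.8, -1.1, -1.0, -0.8)
)
colnames(toy) <- paste0("S", 1:6)

# Hierarchical clustering of features (rows):
# Euclidean distance between rows, complete linkage.
row.clust <- hclust(dist(toy), method = "complete")

# Cut the dendrogram into two groups of features:
row.groups <- cutree(row.clust, k = 2)
print(row.groups)

# Row order implied by the dendrogram (similar features end up adjacent):
print(rownames(toy)[row.clust$order])
```

Features with similar profiles across samples fall into the same branch, which is why related features tend to appear next to each other in a clustered heat map.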

Practical applications of heat maps (examples)

Heat maps are one of the most commonly used visualization techniques for -omics data, as they can convey a substantial amount of information in a single, effective chart, whether it is normalized lipid/metabolite concentrations or clinical parameters across experimental groups (or clusters of samples). Check out the selected examples:

J. Wu et al. Lipidomic signatures align with inflammatory patterns and outcomes in critical illness. DOI: - e.g., Fig. 2c - e, Fig. 4 (the ComplexHeatmap library presented below was used to generate heat maps).
W. Wang et al. Metabolomics facilitates differential diagnosis in common inherited retinal degenerations by exploring their profiles of serum metabolites. DOI: - Fig. 2.
J. Fang et al. Integrated multi-omics analysis unravels the floral scent characteristics and regulation in “Hutou” multi-petal jasmine. DOI: - e.g., Fig. 2b (the ComplexHeatmap again in use!).
D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI: - Fig. 5e, Fig. 6e.
S. Lam et al. A multi-omics investigation of the composition and function of extracellular vesicles along the temporal trajectory of COVID-19. DOI: - e.g., Fig. 2 (left part of the panel), Fig. 3a & b, etc.
R. Lerner et al. Four-dimensional trapped ion mobility spectrometry lipidomics for high throughput clinical profiling of human blood samples. DOI: - Fig. 5 (the authors of the study published in Nature Communications use heat maps for the comparison and cross-validation of quantified lipid concentrations).
J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: - e.g., Fig. 3, Fig. 4 B - D, Fig. 7 (the authors use heat maps for the presentation of method validation results (Fig. 3 & 4), and for comparing and presenting trends in the case of Fig. 7).
M. Lange et al. AdipoAtlas: A reference lipidome for human white adipose tissue. DOI: - Fig. 4A & E.
V. de Laat et al. Intrinsic temperature increase drives lipid metabolism towards ferroptosis evasion and chemotherapy resistance in pancreatic cancer. DOI: - Fig. 2d & Fig. 4g
E. Rysman et al. De novo Lipogenesis Protects Cancer Cells from Free Radicals and Chemotherapeutics by Promoting Membrane Lipid Saturation. DOI: - Fig. 1d - e, Fig. 2 (right part of the panel), Fig. 3b.

Heat maps in R with ComplexHeatmap

To prepare an exemplary heat map for the PDAC data set, we will use the ComplexHeatmap library from Bioconductor. We will use the 20 most significant lipids from the Kruskal-Wallis test to create our heat map.
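Before the full pipeline, the feature-selection idea can be sketched on simulated data with base R only. This is an illustration under stated assumptions, not the PDAC analysis: the three lipid columns, the group sizes, and the simulated shifts are all hypothetical, and kruskal.test() from the stats package stands in for the rstatix wrapper used below.

```r
# Simulated toy data: two groups (N, T) and three hypothetical lipids.
# lipid_1 has a strong group shift, lipid_2 a moderate one, lipid_3 none.
set.seed(42)
n.per.group <- 10
toy <- data.frame(
  Label   = factor(rep(c("N", "T"), each = n.per.group)),
  lipid_1 = c(rnorm(n.per.group, mean = 10), rnorm(n.per.group, mean = 4)),
  lipid_2 = c(rnorm(n.per.group, mean = 10), rnorm(n.per.group, mean = 8.5)),
  lipid_3 = rnorm(2 * n.per.group, mean = 10)
)

# Run one Kruskal-Wallis test per lipid column and collect the p-values:
lipid.cols <- setdiff(names(toy), "Label")
p.values <- sapply(lipid.cols, function(col) {
  kruskal.test(x = toy[[col]], g = toy$Label)$p.value
})
print(p.values)

# Keep the names of the two lipids with the smallest p-values:
top.lipids <- names(sort(p.values))[1:2]
print(top.lipids)
```

The same pattern, sorting by p-value and keeping the first n names, is what arrange(p) followed by slice_head(n = 20) does in the tidyverse code.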

# Preparing heat map in R with Bioconductor package ComplexHeatmap.
# Calling libraries:
library(tidymodels)
library(readxl)          # Provides read_xlsx(); it is not attached by tidymodels.
library(ComplexHeatmap)
library(rstatix)

# Read PDAC data set in R:
data <- read_xlsx(file.choose())

# Adjust column types:
data$Label <- as.factor(data$Label)

# Create a long tibble for the KW test:
data.long <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")
               
# Computing KW test:
KW.test <- 
  data.long %>%
  group_by(Lipids) %>%
  kruskal_test(Concentrations ~ Label)
  
# Slicing head with the 20 most significant lipids:
KW.test.head <-
  KW.test %>%
  arrange(p) %>%
  slice_head(n = 20)
  
# Creating a vector with lipid names:
Lipids <- KW.test.head$Lipids 

# Extracting the 20 most significant lipids from the KW test from our data set:
data.selected <-
  data %>%
  select(`Sample Name`,
         Label, 
         all_of(Lipids))
         
# Data log10-transformation and scaling:
data.log10 <-
  data.selected %>%
  mutate_if(is.numeric, log10)
  
Pareto.scaling <- function(x) {(x-mean(x))/sqrt(sd(x))}

data.Pareto.scaled <-
  data.log10 %>%
  mutate_if(is.numeric, ~Pareto.scaling(.))
  
# Creating a long tibble of normalized concentrations of selected lipids:
data.Pareto.scaled.long <-
  data.Pareto.scaled %>%
  pivot_longer(cols = all_of(Lipids),   # Select the 20 lipids by name instead of a position-dependent range.
               names_to = "Lipids",
               values_to = "Concentrations")
               
# Checking the highest/lowest normalized concentration:
data.Pareto.scaled.long %>% filter(Concentrations == max(Concentrations))
data.Pareto.scaled.long %>% filter(Concentrations == min(Concentrations))

We obtain in the R console:

> data.Pareto.scaled.long %>% filter(Concentrations == max(Concentrations))
# A tibble: 1 × 4
  `Sample Name` Label Lipids      Concentrations
  <chr>         <fct> <chr>                <dbl>
1 1a227         T     Cer 36:1;O2           2.25
> data.Pareto.scaled.long %>% filter(Concentrations == min(Concentrations))
# A tibble: 1 × 4
  `Sample Name` Label Lipids     Concentrations
  <chr>         <fct> <chr>               <dbl>
1 1a121         T     SM 42:1;O2          -2.38

Hence, we can set the scale of fill colors for our heat map between -3 and 3:

# Preparing fill color scale for the heat map.
# Install package "circlize":
install.packages("circlize")

# Call the library - we will need the function colorRamp2():
library(circlize)

# Create a fill color scale for the heat map:
colors <- colorRamp2(c(-3,-1.5,0,1.5, 3), 
                     c('#00005c', 'royalblue', "white", 'red2', '#820101'))

# We select five breakpoints and five corresponding fill colors.
# Now, we want to add bars corresponding to biological groups to the heat map.
# We create a data frame 'ann' for annotations:
ann <- data.frame(data.Pareto.scaled$Label)

# We change the name of the column in 'ann' data frame to "Label"
colnames(ann) <- c('Label')

# Next, we select color of each bar corresponding to N, T, and PAN:
colours <- list('Label' = c('T' = 'red2', 'N' = 'royalblue', 'PAN' = 'orange'))

# Finally, we can collect all annotations using ComplexHeatmap function HeatmapAnnotation():
colAnn <- HeatmapAnnotation(df = ann,     # The data frame with heat map column annotations.
                            which = 'col', # We indicate to use the annotations for the column.
                            col = colours,  # Colors for group bars.
                            show_legend = c(FALSE))  # We remove the legend for biological groups.
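The five breakpoints passed to colorRamp2() above can also be derived from the observed extremes instead of being typed by hand. The sketch below is one possible way to do it, using only base R. The two extremes are copied from the console output shown earlier; colorRamp() from grDevices is used here only as a stand-in to check the midpoint color, and it may interpolate differently from colorRamp2().

```r
# Extremes of the normalized concentrations, copied from the console output:
observed.max <- 2.25
observed.min <- -2.38

# Round the largest absolute value up, so the scale is symmetric around zero:
limit <- ceiling(max(abs(c(observed.min, observed.max))))

# Five evenly spaced breakpoints between -limit and +limit:
breaks <- seq(-limit, limit, length.out = 5)
print(breaks)

# Base-R stand-in for the fill scale: colorRamp() maps values in [0, 1] to RGB.
palette.fun <- colorRamp(c("#00005c", "royalblue", "white", "red2", "#820101"))

# Rescale a concentration of 0 to the [0, 1] range and look up its color:
rescaled.zero <- (0 - (-limit)) / (2 * limit)
rgb.zero <- palette.fun(rescaled.zero)
print(rgb.zero)
```

A symmetric scale keeps white at a normalized concentration of zero, so red and blue shades of the same depth correspond to deviations of the same size.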

We need to transform the data into the form expected by the function that will produce our heat map: a numeric matrix with column names. Here is the code:

# Creating a matrix with numeric data for our heat map.
# We must remove all <chr> and <fct> columns:
matrix <- as.matrix(data.Pareto.scaled %>% select(-`Sample Name`, -Label)) 

# We must transpose the matrix - samples are now in columns and lipids are in rows.
# That's the most frequently used arrangement of a heat map.
t.matrix <- t(matrix)
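The reshaping step can be checked on a small example. The sketch below uses a hypothetical three-sample data frame (the sample names and values are invented) and, as an optional extra that the code above does not do, stores the sample names as row names so that they become column names after transposition.

```r
# Hypothetical toy data frame with the same layout as our scaled data:
# one <chr> column, one <fct> column, and numeric lipid columns.
toy <- data.frame(
  `Sample Name` = c("s1", "s2", "s3"),
  Label         = factor(c("N", "T", "PAN")),
  `PC 32:2`     = c(0.4, -1.1, 0.2),
  `SM 41:1;O2`  = c(1.0, -0.7, -0.1),
  check.names   = FALSE   # Keep lipid names with spaces and colons unchanged.
)

# Keep only numeric columns and convert them to a matrix:
numeric.cols <- sapply(toy, is.numeric)
toy.matrix <- as.matrix(toy[, numeric.cols, drop = FALSE])

# Optional: carry the sample names along as row names.
rownames(toy.matrix) <- toy[["Sample Name"]]

# Transpose: lipids in rows, samples in columns.
toy.t <- t(toy.matrix)
print(dim(toy.t))
print(colnames(toy.t))
```

With sample names stored this way, setting show_column_names = TRUE in Heatmap() would have labels to display.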

Finally, we create our heat map:

# Heat map code:
heatmap <- Heatmap(t.matrix,                         # Matrix with data for building heat map.
                   col = colors,                     # Fill color scale.
                   bottom_annotation = colAnn,       # Annotations of the heat map bottom.
                   cluster_columns = FALSE,          # At first - we skip the column clustering. We keep only rows clustering (features).
                   column_split = ann$Label,         # We will want to split the heat map according to biological groups.
                   column_gap = unit(2, "mm"),       # Gap between slices will have 2 mm.
                   show_column_names = FALSE,        # No column names for now.
                   show_heatmap_legend = TRUE,       # Show fill color legend.
                   heatmap_legend_param = list(title = 'Lipid abundance', direction = 'horizontal'))  # Parameters of fill color legend.
                   
# Read carefully the documentation of this function. 
# It allows for a broad range of modifications through simple-in-use arguments!
?Heatmap

To draw our heat map, we run:

# Draw heat map:
draw(heatmap, heatmap_legend_side="right")

And we obtain this beautiful chart:

We see interesting clustering outcomes for features - lipids are mostly grouped into classes, e.g., SM species, LPC species, PC with PC O-/PC P- species. We can finally apply clustering to columns too (samples clustering):

# Heat map with samples and features clustering:
# Now, we will need the legend representing biological groups:
colAnn <- HeatmapAnnotation(df = ann,
                            which = 'col',
                            col = colours,
                            show_legend = c(TRUE))   # Set this argument to true.

heatmap <- Heatmap(t.matrix, 
                   col = colors, 
                   bottom_annotation = colAnn, 
                   cluster_columns = TRUE,
                   show_column_names = FALSE,
                   show_heatmap_legend = TRUE,
                   heatmap_legend_param = list(title = 'Lipid abundance',
                                               direction = 'horizontal'))
                                                                        
# Draw the updated heat map:
draw(heatmap, heatmap_legend_side="right")

The updated heat map:

After applying clustering to the samples, we observe a separation of healthy volunteers (N) from patients with PDAC (T). For almost all lipids, except Cer 36:1;O2, concentrations are downregulated in the serum of patients with pancreatic cancer. Patients with pancreatitis (PAN) do not form a separate cluster; they are spread between healthy individuals and patients with cancer. Based on the heat map above, their serum lipid profiles do not appear to show the downregulations characteristic of pancreatic cancer patients, e.g., in the long-chain SM and Cer species and in the LPC, PC, and PC O-/PC P- species. As you can see, a single elegant image supports many observations.
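A visual impression of group separation can be backed up by cutting the sample dendrogram and cross-tabulating the resulting clusters against the known labels. The sketch below does this on simulated data with base R; the two groups, three features, and simulated values are hypothetical, and Euclidean distance with complete linkage is an assumption.

```r
# Simulated toy matrix: 3 hypothetical features (rows) x 12 samples (columns).
# The first six samples are labeled N, the last six T.
set.seed(7)
labels <- rep(c("N", "T"), each = 6)
toy <- rbind(
  f1 = c(rnorm(6, mean =  1, sd = 0.1), rnorm(6, mean = -1, sd = 0.1)),
  f2 = c(rnorm(6, mean =  1, sd = 0.1), rnorm(6, mean = -1, sd = 0.1)),
  f3 = c(rnorm(6, mean = -1, sd = 0.1), rnorm(6, mean =  1, sd = 0.1))
)
colnames(toy) <- paste0("S", 1:12)

# Cluster samples: distances are computed between columns, hence t():
col.clust <- hclust(dist(t(toy)), method = "complete")

# Cut the dendrogram into two clusters and compare with the known labels:
clusters <- cutree(col.clust, k = 2)
tab <- table(Cluster = clusters, Label = labels)
print(tab)
```

If each row of the table contains samples from only one group, the clusters reproduce the labels; mixed rows would indicate the kind of spread described for the PAN samples.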

The ComplexHeatmap package offers many more designs, which are presented in the detailed vignette of the package. We encourage you to read it carefully, as well as the articles published by the authors of the ComplexHeatmap package.
