t-Distributed Stochastic Neighbor Embedding (t-SNE)

Metabolites and lipids multivariate statistical analysis in R

Last updated 3 months ago

Practical applications of t-SNE (examples)

t-SNE is a non-linear dimensionality reduction technique designed to represent high-dimensional data (e.g., lipid or metabolite concentrations) in a lower-dimensional space (typically two or three dimensions) while preserving the similarities between neighboring data points, i.e., local structures. Its application in the field of -omics has gained increasing interest in recent years, especially in genomics and transcriptomics. t-SNE is also increasingly featured in lipidomics and metabolomics studies, particularly those focused on single-cell -omics.

Examples of practical applications of this technique:

  • D. Hornburg et al. Dynamic lipidome alterations associated with human health, disease and ageing. DOI: https://doi.org/10.1038/s42255-023-00880-1 - Fig. 2b (the authors used t-SNE to examine clustering based on the 100 most personalized lipids from 11 participants who provided at least 12 healthy samples).

  • S. E. Hancock et al. FACS-assisted single-cell lipidome analysis of phosphatidylcholines and sphingomyelins in cells of different lineages. DOI: https://doi.org/10.1016/j.jlr.2023.100341 - Fig. 4A (the authors used t-SNE to examine clustering of a single-cell lipidomics data set consisting of C2C12 and HepG2 cells grown in both control (CON) and docosahexaenoic acid (DHA)-supplemented media).

  • Z. Wang et al. Data-Driven Deciphering of Latent Lesions in Heterogeneous Tissue Using Function-Directed t-SNE of Mass Spectrometry Imaging Data. DOI: https://doi.org/10.1021/acs.analchem.2c02990 (t-SNE applied to the clustering of mass spectrometry imaging data).

  • H. Tian et al. Multimodal mass spectrometry imaging identifies cell-type-specific metabolic and lipidomic variation in the mammalian liver. DOI: https://doi.org/10.1016/j.devcel.2024.01.025 (t-SNE in a mass spectrometry imaging application: cell clustering based on lipid and metabolite profiles).

Required packages

The packages installed in this section are Rtsne, scales, and ggrepel; the code below also uses tidyverse, readxl, and ggimage, which are assumed to be installed already. The packages can be installed by running the following commands in the R console:

# Installation of all required packages:
install.packages("Rtsne")
install.packages("scales")
install.packages("ggrepel")

# Activate libraries:
library(Rtsne)
library(scales)
library(ggrepel)

# Additionally, activate tidyverse:
library(tidyverse)
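If some of these packages are already present, reinstalling them is unnecessary. The following is a minimal sketch, using only base R, of how one might check which packages are missing before installing anything. The package list and the variable names (`required_packages`, `is_installed`, `missing_packages`) are assumptions made for illustration, based on the code shown on this page.

```r
# Packages used in this section (this list is an assumption based on the code on this page)
required_packages <- c("Rtsne", "scales", "ggrepel", "tidyverse", "readxl", "ggimage")

# requireNamespace() returns TRUE when a package can be loaded and FALSE otherwise
is_installed <- vapply(required_packages, requireNamespace,
                       FUN.VALUE = logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # Uncomment to install only what is missing:
  # install.packages(missing_packages)
} else {
  message("All packages are already installed.")
}
```

The installation line is left commented out so that the check itself has no side effects.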

Loading data into R

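Here, we will use the lipidomics data set published by Kvasnička et al. in their manuscript "Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment" (Arthritis Research & Therapy, 2023), provided as the file GOUT_CTRL_QC_Ales_data_31012025.xlsx.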

Always ensure that you have set the appropriate working directory (wd). If you haven't done that yet, do so in the first line of the code, and then load the data into R. Read the data with the `read_excel()` function from the readxl package, which we saw earlier in the GitBook, and convert the result to a data.frame to make it easier to handle:

# Setting a working directory (wd)
setwd('D:/Data analysis')

# Loading data into R:
data <- readxl::read_excel("GOUT_CTRL_QC_Ales_data_31012025.xlsx")
data <- as.data.frame(data)
head(data)

Next, set the `Sample Name` column as row names:

# Set the row names the same as `Sample Name`
rownames(data) <- data$`Sample Name`
data$`Sample Name` <- NULL
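Before moving on, it can be worth confirming that the table has the layout the later steps rely on: unique sample names, labels in the first column, and numeric features without missing values. The sketch below illustrates such checks on a small toy table, because the real file is not available here; the sample names, lipid names, and values are all hypothetical.

```r
# Toy stand-in for the real table (hypothetical sample names, lipids, and values)
toy <- data.frame(
  `Sample Name` = c("S1", "S2", "S3", "QC1"),
  Label = c("Gout", "Control", "Gout", "QC"),
  `PC 34:1` = c(10.2, 11.5, 9.8, 10.4),
  `SM 36:1` = c(3.1, 2.9, 3.4, 3.0),
  check.names = FALSE
)

# Row names must be unique, so check for duplicated sample names first
stopifnot(!anyDuplicated(toy$`Sample Name`))
rownames(toy) <- toy$`Sample Name`
toy$`Sample Name` <- NULL

# After this step the first column should hold the labels and the rest numeric features
first_is_label <- identical(names(toy)[1], "Label")
features_numeric <- all(vapply(toy[, -1], is.numeric, logical(1)))
n_missing <- sum(is.na(toy[, -1]))
```

The same three checks can be run on `data` after loading the real file; missing values would need to be handled before PCA and t-SNE.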

Principal Component Analysis

Usually, before running t-SNE, PCA is performed on the high-dimensional data to reduce the number of dimensions (e.g., to 30 or 50), and t-SNE is then applied to the derived principal components.

The first step is to normalize the data so that the features (lipids) have zero mean and unit variance; this can easily be done with the `scale()` function. Indexing the data frame with `data[, -1]` selects all columns except the first, which contains the labels. The result is stored in `data_normalized` to make it clear that we are working with normalized data:

# Data normalization (Auto-scaling)
data_normalized <- data
data_normalized[, -1] <- scale(data[, -1])
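To see what auto-scaling does, one can verify that every feature ends up with a mean of zero and a standard deviation of one. The following sketch does this on simulated data; the column names `lipid_A` and `lipid_B` and the simulated values are assumptions made for illustration only.

```r
set.seed(1)

# Toy data: one label column followed by two simulated features (hypothetical)
toy <- data.frame(
  Label = rep(c("Gout", "Control"), each = 5),
  lipid_A = rnorm(10, mean = 50, sd = 10),
  lipid_B = rnorm(10, mean = 5, sd = 1)
)

# Same pattern as above: keep the label column, auto-scale the rest
toy_normalized <- toy
toy_normalized[, -1] <- scale(toy[, -1])

# Each feature should now have mean 0 and standard deviation 1
feature_means <- colMeans(toy_normalized[, -1])
feature_sds <- apply(toy_normalized[, -1], 2, sd)
print(round(feature_means, 10))
print(feature_sds)
```

Note that a feature with zero variance would turn into `NaN` after scaling, so constant columns are best removed beforehand.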

Next, we run PCA with the `prcomp()` function on the normalized data and keep the first 50 principal components:

# Performing PCA:
n_components <- 50
pca_result <- prcomp(data_normalized[, -1], center = FALSE) # Note! OUR FEATURES WERE AUTO-SCALED EARLIER!
pca_features <- pca_result$x[, 1:n_components]
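Selecting `1:50` columns only works when at least 50 principal components exist, which requires enough samples and features. The sketch below, on a simulated matrix, shows one way to cap the number of components at what `prcomp()` actually returns and to check how much variance the retained components capture. The toy matrix and the variable names are assumptions made for illustration.

```r
set.seed(1)

# Toy matrix: 20 samples x 8 auto-scaled features (hypothetical data)
toy_scaled <- scale(matrix(rnorm(20 * 8), nrow = 20, ncol = 8))

toy_pca <- prcomp(toy_scaled, center = FALSE)

# prcomp() returns at most min(n_samples, n_features) components
n_available <- ncol(toy_pca$x)
n_components <- min(50, n_available)
toy_pca_features <- toy_pca$x[, 1:n_components, drop = FALSE]

# Cumulative proportion of variance captured by the retained components
variance_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
cumulative_variance <- cumsum(variance_explained)[n_components]
```

In this toy case all eight components are kept, so the cumulative variance is 1; on a real data set the value indicates how much information is passed on to t-SNE.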

Performing t-SNE

Using the `Rtsne()` function from the Rtsne package, we'll apply the t-SNE algorithm to the 50 principal components:

# t-SNE in R
n_tsne_components <- 2
tsne_result <- Rtsne(pca_features, dims = n_tsne_components, perplexity = 30, verbose = TRUE)
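Two practical points are worth illustrating here. First, t-SNE is stochastic, so setting a seed makes the embedding repeatable. Second, `Rtsne()` stops with an error when the perplexity is too large for the number of samples (it requires `3 * perplexity` not to exceed `nrow(X) - 1`). The sketch below uses simulated PCA scores; the toy matrix, the seed value, and the helper variable `max_perplexity` are assumptions. It also sets `pca = FALSE`, since `Rtsne()` runs its own PCA by default and the input here already consists of PCA scores; this is an optional choice, not the setting used above.

```r
set.seed(123)  # t-SNE is stochastic; a fixed seed makes the embedding repeatable

# Toy stand-in for pca_features: 60 samples x 10 components (hypothetical data)
toy_pca_features <- matrix(rnorm(60 * 10), nrow = 60, ncol = 10)

# Largest perplexity that satisfies 3 * perplexity <= nrow(X) - 1
max_perplexity <- floor((nrow(toy_pca_features) - 1) / 3)
perplexity <- min(30, max_perplexity)

# Run t-SNE only if the Rtsne package is available
if (requireNamespace("Rtsne", quietly = TRUE)) {
  toy_tsne <- Rtsne::Rtsne(toy_pca_features,
                           dims = 2,
                           perplexity = perplexity,
                           pca = FALSE,      # the input is already PCA scores
                           verbose = FALSE)
  print(dim(toy_tsne$Y))  # one row per sample, one column per t-SNE dimension
}
```

With 60 samples, a perplexity of 30 would be rejected, so the sketch falls back to the largest admissible value.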

The `Rtsne()` function returns a list whose element `Y` is a matrix with the embedded coordinates, so let's convert these coordinates into a data frame for easier handling:

# Changing results output (matrix) into a data frame:
tsne_df <- data.frame(X = tsne_result$Y[,1], Y = tsne_result$Y[,2])
tsne_df$Label <- data_normalized$Label

We can now visualize the projection of the samples to the new feature space with a scatter plot:

# Obtaining t-SNE score plot:
tSNE <- ggplot(tsne_df, aes(x = X, y = Y, color = Label)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "t-SNE Visualization",
    x = "t-SNE1", 
    y = "t-SNE2"  
  ) +
  scale_color_manual(values = c("Gout" = "red", "Control" = "blue")) +  
  theme(legend.title = element_blank()) 
 
# Exporting a high-quality publication-plot: 
# Call library:
library(ggimage)

# Score plot
## Generate a preview and optimize the plot presentation (tSNE scores plot):
ggpreview(plot = tSNE,               # The object that you want to preview.
          width = 300,               # Width in px.
          height = 300,              # Height in px.
          units = "px",              # Unit - of size - px.
          dpi = 300,                 # Sharpness.
          scale = 6)            # You may need to use a different scale.


## Save the plot in the working directory using ggsave (ggplot2 package - tidyverse):
ggsave(plot = tSNE,    # The R object to be saved.        
       device = "svg",           # Format (requires the svglite package).
       filename = "tSNE_gout_FINAL.svg",
       width = 300,
       height = 300,
       units = "px",
       dpi = 300,
       scale = 6)

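We obtain the t-SNE score plot for the selected lipidomics data set. The QC samples appear in gray because `scale_color_manual()` assigns no color to the `QC` label.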
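If the gray QC points should be an explicit choice rather than a fallback for unmatched labels, the palette can name every group. The following is a small sketch on toy coordinates; the data frame `toy_tsne_df`, its values, and the palette `group_colors` are hypothetical and only illustrate the idea.

```r
library(ggplot2)

# Toy t-SNE coordinates (hypothetical values) with three sample groups
toy_tsne_df <- data.frame(
  X = c(-2.1, -1.8, 2.0, 2.4, 0.1, 0.0),
  Y = c(1.0, 1.3, -0.9, -1.2, 0.2, 0.1),
  Label = c("Gout", "Gout", "Control", "Control", "QC", "QC")
)

# Named palette that covers every label, including QC
group_colors <- c("Gout" = "red", "Control" = "blue", "QC" = "gray50")

toy_plot <- ggplot(toy_tsne_df, aes(x = X, y = Y, color = Label)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  labs(x = "t-SNE1", y = "t-SNE2") +
  scale_color_manual(values = group_colors) +
  theme(legend.title = element_blank())
```

With a complete palette, the QC group also appears in the legend under its own name.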

To obtain a score plot without QC samples, remove them from the data set and repeat the PCA and t-SNE steps:

# Remove QC samples from the data set:
data_normalized_NO_QC <- 
  data_normalized %>%
  filter(Label != 'QC')

# Performing PCA:
n_components <- 50
pca_result_NO_QC <- prcomp(data_normalized_NO_QC[, -1], center = FALSE) # Note! OUR FEATURES WERE AUTO-SCALED EARLIER!
pca_features_NO_QC <- pca_result_NO_QC$x[, 1:n_components]

# t-SNE in R
n_tsne_components_NO_QC <- 2
tsne_result_NO_QC <- Rtsne(pca_features_NO_QC, dims = n_tsne_components_NO_QC, perplexity = 30, verbose = TRUE)

# Changing results output (matrix) into a data frame:
tsne_df_NO_QC <- data.frame(X = tsne_result_NO_QC$Y[,1], Y = tsne_result_NO_QC$Y[,2])
tsne_df_NO_QC$Label <- data_normalized_NO_QC$Label

# Obtaining t-SNE score plot:
tSNE_2 <- ggplot(tsne_df_NO_QC, aes(x = X, y = Y, color = Label)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "t-SNE Visualization",
    x = "t-SNE1", 
    y = "t-SNE2"  
  ) +
  scale_color_manual(values = c("Gout" = "red", "Control" = "blue")) +  
  theme(legend.title = element_blank()) 

# Exporting high-quality score plot.
## Generate a preview and optimize the plot presentation (t-SNE score plot):
ggpreview(plot = tSNE_2,               # The object that you want to preview.
          width = 300,               # Width in px.
          height = 300,              # Height in px.
          units = "px",              # Unit - of size - px.
          dpi = 300,                 # Sharpness.
          scale = 6)            # You may need to use a different scale.


## Save the plot in the working directory using ggsave (ggplot2 package - tidyverse):
ggsave(plot = tSNE_2,    # The R object to be saved.        
       device = "svg",           # Format (requires the svglite package).
       filename = "tSNE_2_gout_FINAL.svg",
       width = 300,
       height = 300,
       units = "px",
       dpi = 300,
       scale = 6)
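One detail of the block above: the features were auto-scaled while the QC samples were still in the table, so after filtering them out the remaining columns are generally no longer exactly zero-mean with unit variance, even though `prcomp()` is called with `center = FALSE`. Whether this matters depends on the data. As a hedged alternative, the sketch below, on simulated data with hypothetical column names, compares the two orders of operations and shows how to re-scale after removing the QC samples.

```r
set.seed(1)

# Toy data with Gout, Control, and QC samples (hypothetical values)
toy <- data.frame(
  Label = c(rep("Gout", 4), rep("Control", 4), rep("QC", 2)),
  lipid_A = rnorm(10, mean = 50, sd = 10),
  lipid_B = rnorm(10, mean = 5, sd = 1)
)

# Order used above: scale with QC samples included, then drop them
scaled_then_filtered <- toy
scaled_then_filtered[, -1] <- scale(toy[, -1])
scaled_then_filtered <- scaled_then_filtered[scaled_then_filtered$Label != "QC", ]
print(colMeans(scaled_then_filtered[, -1]))   # generally no longer exactly zero

# Alternative: drop QC samples first, then auto-scale the remaining samples
filtered_then_scaled <- toy[toy$Label != "QC", ]
filtered_then_scaled[, -1] <- scale(filtered_then_scaled[, -1])
print(colMeans(filtered_then_scaled[, -1]))   # zero up to floating-point error
```

Applied to the real data, the alternative would mean filtering `data` by `Label` first and calling `scale()` on the filtered table before `prcomp()`.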

This yields the updated t-SNE score plot, without QC samples, for the selected lipidomics data set.

DOIs of the studies listed under "Practical applications of t-SNE (examples)", in order of appearance:

  • D. Hornburg et al.: https://doi.org/10.1038/s42255-023-00880-1
  • S. E. Hancock et al.: https://doi.org/10.1016/j.jlr.2023.100341
  • Z. Wang et al.: https://doi.org/10.1021/acs.analchem.2c02990
  • H. Tian et al.: https://doi.org/10.1016/j.devcel.2024.01.025

Data set file: GOUT_CTRL_QC_Ales_data_31012025.xlsx (3 MB). The lipidomics data set published by Kvasnička et al. in their manuscript "Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment", Arthritis Research & Therapy (2023).

Figure captions:

  • The t-SNE score plot for the selected lipidomics data set.
  • The updated version of the t-SNE score plot (without QC samples) for the selected lipidomics data set.