Lipid maps and acyl-chain plots

Metabolites and lipids univariate statistical analysis in R

Lipid maps and acyl-chain plots are effective alternatives to volcano plots. Both deliver deeper insight into lipidome alterations as more molecules are labeled and grouped by classes in these plots. The easiest way to produce lipid maps is through the Cytoscape software. The acyl-chain plots are generated in R. Here, we will show you how to prepare both of them.

Lipid maps from the Cytoscape software (R computations)

Examples of such lipid maps can be found in the following manuscripts:

  • E. Cífková et al. Lipidomic and metabolomic analysis reveals changes in biochemical pathways for non-small cell lung cancer tissues. DOI: doi.org/10.1016/j.bbalip.2021.159082 - Fig. 2A & 2B.

  • D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI: doi.org/10.1038/s41467-021-27765-9 - Fig. 10A & B.

  • J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: doi.org/10.1007/s00216-022-04490-w - Fig. 5

  • B. Piskláková et al. Rapid and efficient LC-MS/MS diagnosis of inherited metabolic disorders: a semi-automated workflow for analysis of organic acids, acylglycines, and acylcarnitines in urine. DOI: doi.org/10.1515/cclm-2023-0084 - Fig. 4

  • R. Prior, A. Silva, T. Vangansewinkel et al. Defective Schwann cell lipid metabolism alters plasma membrane dynamics in Charcot-Marie-Tooth disease 1A. DOI: doi.org/10.1101/2023.04.02.535224 - Fig. 2D.

  • D. Olešová et al. Changes in lipid metabolism track with the progression of neurofibrillary pathology in tauopathies. DOI: https://doi.org/10.1186/s12974-024-03060-4 - Fig. 3B & F, Fig. 4B & E, Fig. 5I

  • A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI: https://doi.org/10.1186/s13075-023-03204-6 - Fig. 2A and C, Fig. 4A & B, Fig.6A & B.

To prepare lipid maps - we will use our basic PDAC data set. In the first step, we compute fold change and p-value, e.g., from a Welch t-test, and then log2 of fold change and -log10 of the p-value. Log2 of fold change will be used for the continuous mapping of fill color, and -log10 of the p-value for continuous mapping of the size of points (bubbles).

Here is the code:

# Computing Welch t-test & fold change.
# Step 1 - Create 'data.long' tibble:
data.long <- 
  data %>%
  select(-`Sample Name`) %>%
  filter(Label != "PAN") %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")

# Step 2 - Compute Welch's t-test and adjust p-values (rstatix):
Welch.test <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH")

# Step 3 - Compute mean concentrations for all lipids within healthy controls (N):
Lipids.N <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "N")

# Step 4 - Compute mean concentrations for all lipids within patients with PDAC (T):
Lipids.T <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "T")

# Step 5 - Compute fold-change and store it in the "Welch.test" as `Fold change`:
Welch.test$`Fold change` <- Lipids.T$mean/Lipids.N$mean

# Step 6 - Using mutate(), compute log2 of fold change:
Welch.test <-
  Welch.test %>%
  mutate(log.2.FC = log2(`Fold change`),
         minlog10.pval = -log10(p))
         
# Step 7 - export the data as a .csv file:
write.csv(Welch.test, "Lipid_map_matrix.csv")

This file can be found in your working directory. Usually, if you have not changed it, it is in the Document file. To check it, run:

# My working directory:
getwd()

Open the file in a spreadsheet program of your choice (e.g., Calc from LibreOffice of Microsoft Excel) and remove all columns except for 'Lipids', 'log.2.FC', 'minlog10.pval'.

Now, we must prepare a skeleton of the lipid map in a separate spreadsheet file. We will create a network in the Cytoscape software using this second file. The file will essentially contain information for the software about how the nodes are supposed to be connected and what values are to be mapped to colors and sizes. Take a look at the figure below:

The network is constructed of nodes (also called vertices) and edges (also called links). In the figure above (on the right side), the nodes are blue rectangles marked with black arrows, while grey lines connecting nodes marked with red arrows are the network's edges. The skeleton of each lipid map is constructed of source nodes. You can adjust their look as you prefer. The fold change and p-value for source nodes are set to NA. In the next step, we will add the lipid species as target nodes.

In the spreadsheet program, we create a table with six columns (shown above on the left side of the figure):

1) The first node name - node_ID - light blue arrow,

2) The first node type - node_type - light green arrow,

3) The second node name - node_ID2 - dark blue arrow,

4) The second node type - node_type2 - dark green arrow,

5) Fold change - FC - red arrow,

6) p-value from the statistical test - p-val - red arrow.

The spreadsheet contains the plan - how the nodes building up the skeleton are supposed to be connected. All lipid classes, i.e., GL (glycerolipids), SP (sphingolipids), ST (sterols), and GPL (glycerophospholipids), are connected to a node called "C" - from the central node of the map. Next, the nodes containing lipid class are connected with nodes containing lipid subclasses, e.g., SP will be connected to Cer (ceramides), HexCer (hexosyl-ceramides), Hex2Cer (di-hexosyl-ceramides), and SM (sphingomyelins), while GL to TG (triglycerides), DG (diglycerides), and MG (monoglycerides). The nodes with lipid subclass will be connected with lipid species (target nodes) in the next step. The target nodes with lipid species measured will additionally have two target node attributes - fold change (FC) and p-value from the statistical test (p-val). These values can be taken from the R .csv output containing results of the Welch test and fold change calculations (columns: 'log.2.FC', 'minlog10.pval'. in the first .csv file). See the example below on how to connect source nodes (lipid subclasses) with target nodes (lipid species) and add target node attributes (FC, p-val):

Connecting source nodes (lipid subclass - red frame) with target nodes (lipid species - blue frame). The violet frame contains the target node attributes - fold change (FC) and p-value from the statistical test (p-val). The values from attribute columns will be mapped to size and color in the next step.

Step by step, we add all lipid species measured to the appropriate subclass of the skeleton. Save the file as .csv. The final .csv file for our exemplary PDAC data set can be downloaded below:

A .csv file containing complete information for the Cytoscape software about the lipid map we want to create.

Now, you need to download and install the Cytoscape software. For producing our lipid map, we used version 3.8.2. It can be downloaded here:

A link to download the Cytoscape software.

Once you open the Cytoscape software, navigate to the "Import network from File System" button:

Cytoscape software interface - "Import network from File System" button.

Select the file containing your lipid map (network) and hit "Open". Then, in the next window, specify the source and target nodes and their attributes:

Setting the node type and node attributes.

The final setting:

The final setting of node types and attributes.

If Cytoscape added empty columns from your .csv file - make sure to remove them (or to avoid issues, copy the network scheme only to a fresh Excel and save it again as .csv). To remove a column, you can select "Not imported":

If Cytoscape finds empty columns, you can remove them by setting "Not imported". However, to avoid potential issues, it is good to copy the network plan to a fresh spreadsheet, save it again as a .csv, and repeat the import to Cytoscape.

After setting node types and attributes, hit "OK". Now, Cytoscape translates the plan into a network scheme. We obtain this raw image:

The raw network obtained in Cytoscape from the .csv file plan.

Let's customize our lipid map. First, you can change the arrangement of the lipid map. Hold the shift button and click the left mouse button, then drag. A rectangle will be developed to label in yellow nodes you would like to move together:

Changing the arrangement of lipid map nodes.

Until the nodes are highlighted in yellow - you can keep dragging them. Click on the white background to remove the yellow selection. The final lipid map arrangement:

The final arrangement of lipid map nodes.

Now, let's change the shape of nodes to circles. To change the plot style, on the left toolbar, select Styles, then navigate to Properties, develop the list, and select Show All:

Modifying lipid map style.

Find Shape, and click on the rectangle symbol. From the window, select Ellipse and hit "Apply":

Changing shape of the lipid map nodes.

Next, find "Height" and click on the gray arrow to develop a list. Two positions will appear:

1) Column - develop the list again and select p-val.

2) Mapping type - develop the list and select Continuous Mapping:

Changing node size. Mapping the height to the p-value: part 1.
Changing node size. Mapping the height to the p-value: part 2.

Now, two fast clicks on "Current Mapping". In the window, unlock the "Node Height" by clicking on the black arrow. Set the value to 100 and hit OK:

Changing node size. Mapping the height to the p-value: part 3.
Changing node size. Mapping the height to the p-value: part 3.

Now, find in Style the position "Width" and repeat the same operations as for height:

Changing node size. Mapping the width to the p-value: 1 - navigate to "Width" in Style, 2 - map width to p-value, 3 - select "Continous Mapping" and set a maximum of 100.

Now, we will map the fold change value to colors. In Style, find the position "Fill color", and develop the list using the gray arrow. Next, in for Column select "FC":

Mapping fold change values to colors - part 1.

As Mapping Type select Continuous Mapping:

Mapping fold change values to colors - part 2.

Make sure that the minimum (negative fold change value) and maximum (positive fold change value) result in the same absolute value to reflect alterations in a balanced way - like in the example below. Minimum and maximum are mapped to -1.59 and 1.59, respectively. If that's not the case, and the minimum would be, e.g. -2.5 and maximum - 4.8 - correct it!

Mapping fold change values to colors - part 3. Balancing the color scale.

If you do not like the default palette - select Current Palette, highlight a new one, and hit OK. Confirm the change of palette by clicking on "Yes".

Mapping fold change values to colors - part 4. Changing the color palette.

Next, we will change the color of the border of nodes to black. Find "Border Paint" in Style. Click on the rectangle, select black color from the palette, and click OK:

Changing the node border color.

Additionally, navigate to "Border Width", click on "0.0" and change the value to "1.0":

Changing the node border width.

We will additionally map the label font size to the p-value. Find "Label Font Size" in Style, develop the list, select as Column: p-val, and as Mapping Type: Continuous Mapping:

Mapping the Label Font size to p-value.

Now, we will modify the lipid map skeleton. Hold Ctrl and select all the remaining blue nodes except for "C". They should be highlighted in yellow. Find again "Shape" in Style and press the third rectangle (by-pass value) from the left. From the list, select "Set Bypass...". Select "Round rectangle":

Changing the shape of skeleton nodes to round rectangle (setting Bypass) - part 1.
Changing the shape of skeleton nodes to round rectangle (setting Bypass) - part 2.

Using "Set Bypass..." we can also change the fill color of skeleton nodes to white:

Changing the fill color of skeleton nodes to white (setting Bypass).

And the Label Font Size to 18:

Changing the Label Font Size of skeleton nodes to 18 (setting Bypass).

We can adjust the central point of the lipid map. Right-click on the node with "C", and a list will develop. Go to "Edit" and next to "Rename Node":

Renaming nodes of lipid map - part 1.

A window appears. Keep it empty and click OK:

Renaming nodes of lipid map - part 1.

We will use the same trick to remove the annotation of all target nodes with lipids of low statistical significance - but later. Now, in Style navigate to "Width" and "Height" and in Bypass set 20 for both. Then find Fill Color in Style and in Bypass select black. The final look:

Partially customized lipid map.

Now, we will clean annotations of all lipids with low statistical significance. We will also rename nodes to keep only the sum of carbons and double bonds and adjust the plot. The final result:

Lipid map prepared using the exemplary PDAC data set in Cytoscape software.

To prepare the legend:

1) Export the ready lipid map from Cytoscape software: go to File, then Export, and Network to Image. Save the image as .svg or .pdf (vector graphics - so you can process it) and a directory to save the network.

2) Open software for processing graphics and import the Cytoscape .svg or .pdf output. If you do not have experience working with such software, you can download and gain skills in using Inkscape for starters:

Inkscape - a freely available software for processing graphics.

3) To create the p-value legend - you can add at the end of your lipid map plan three target nodes to a random lipid subclass, called, e.g., legend 1, legend 2, legend 3, with log2(FC) = 1 and -log10(p-value) of:

  • 1.3, which corresponds to -log10(0.05) - most selected threshold,

  • a middle range of your p-values (in our case, a value around 10),

  • and maximum p-value (in our a value of 18 or 20 if you prefer rounded values in legend).

Creating a legend.

4) After importing the lipid map plan into Cytoscape software, find the legend nodes. Press on the edge connecting them with the lipid subclass source node - it will be highlighted in red once selected. Press "Backspace" on your keyboard to remove the edge connecting the legend target nodes with the source node and move the legend nodes away. Prepare your plot and then export the plot together with the "bubbles" corresponding to the p-value. Arrange them appropriately in the software for processing graphics and color according to your preferences. DO NOT CHANGE THE SIZE OF BUBBLES - it is aligned with the size of nodes in the lipid map!

5) To create the legend for fill colors (fold changes) - in Cytoscape, navigate to Options and then - "Create Legend":

Creating legend for fill colors.

6) Select the directory and save the image with the Cytoscape legend.

7) Next, open the Cytoscape legend in the software for processing graphics, crop unnecessary parts, and keep the fill color scale bar. Add it to your lipid map.

7) Describe the size (p-value range) and colors (fold change range) appropriately. Remember that the fold change is as log2 and the p-value as -log10!

8) Your plot is now ready. Remember to describe all abbreviations used in the lipid maps in the figure caption in your manuscript!

The ggplot2 acyl-chain plots

Download the new data set we selected for you:

The quantitation of lipids in serum samples from patients with PDAC and healthy controls using reversed-phase chromatography (RP) with mass spectrometry detection. The data set was published with the manuscript Lipidomic profiling of human serum enables detection of pancreatic cancer by D. Wolrab et al. DOI: 10.1038/s41467-021-27765-9. The data set was trimmed - only lipid classes annotated at lipid species without repetitions in the composition were kept.

The acyl-chain plots are excellent for presenting lipids annotated at the lipid species level. Below, please find examples of manuscripts using this particular form of data presentation:

Here, except for the p-value from a statistical test, we must compute fold change, log2 of fold change, and -log10 of the p-values. Then, we will need a lipid class, total carbon number (CN), and total double bond number (DB) in separate columns. First, we will again compute statistical test, -log10(p-value), fold change, and log2(fold change):

# Call libraries
library(readxl)
library(tidyverse)
library(rstatix)

# Read new data set in R
data <- read_xlsx(file.choose())  # Select the .xlsx file from directory.

# Check column types:
str(data)

# Change Label to factor column:
data$Label <- as.factor(data$Label)

# Creating 'data.long' tibble:
data.long <-
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `AcylCarnitine 10:0`:`TG 60:3`,
               names_to = "Lipids",
               values_to = "Concentrations")
               
# Computing Welch test:
Welch.test <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH")
  
# Compute mean concentrations for all lipids within healthy controls (N):
Lipids.N <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "N")

# Compute mean concentrations for all lipids within patients with PDAC (T):
Lipids.T <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "T")
  
# Compute fold-change and store it in the "Welch.test" as `Fold change`:
Welch.test$`Fold change` <- Lipids.T$mean/Lipids.N$mean

# Using mutate(), compute log2 of fold change:
Welch.test <-
  Welch.test %>%
  mutate(log.2.FC = log2(`Fold change`),
         minlog10.pval = -log10(p))

The statistics are ready. We can move to operations on strings - our lipid shorthand annotations. We need to separate from every lipid annotation:

  • Lipid class, e.g., PC, PC P-, SM, Hex2Cer, etc.,

  • Total carbon number (CN),

  • Total double bond number (DB).

We will use regular expressions for this purpose. To ease building regular expressions (regex), you can use online tools, e.g. regex101.com:

The regex101 - supports bulding, testing, and debugging regular expressions.

Let's try to build a regular expression to extract lipid class names from the shorthand notations. We have three types of shorthand notations for the lipid class:

  • Two letters, e.g., PC, SM, PE, etc.

  • Two letters, white space, single letter, e.g., PC P, PE P, etc.,

  • Letters with digits, e.g., Hex2Cer, GM3, etc.

We put the examples of strings in the regex101:

Examples of strings from which we want to extract specific elements.

You have a quick reference to support building regex in the lower right corner.

1) We will start with the most complicated case: LPC O-/P-, LPE O-/P-, PC O-/P-, PE O-/P-.

2) We certainly want to start extracting letters from the beginning of each string. In regex: ^

2) Then, we can easily extract the letters from the beginning of shorthand notation; in regex - one letter from the range A to Z would be [A-Z], and one or more letters is [A-Z]+. In total, we have: ^[A-Z]+, meaning extract from the beginning of the shorthand notation one or more letters. It covers PC in our example:

Extracting lipid class names from shorthand notations using regular expressions - ether/plasmenyl lipids - PC O- or PC P-, etc.

But it would also cover PE, LPC, LPE, etc.

3) We already have a regex for extracting the first letters. Now, we need to extract <white space> O- or <white space> P-. We can add \s to extract any white space character. Overall we cover in our example PC<white space> now and the regex ^[A-Z]+\s.

Important: To communicate to R that we need to use backslash "\" in the string, we use "\\", so overall: ^[A-Z]+\\s in R, but as a regular string simply: ^[A-Z]+\s:

Extracting lipid class names from shorthand notations using regular expressions - ether/plasmenyl lipids - PC O- or PC P-, etc.

4) Finally, we can extract one more letter (O or P) and the dash (-). As above, one letter from the range between A and Z is [A-Z], and the dash in regex is just -. In total, our regex looks like this: ^[A-Z]+\s[A-Z]-

It means - from the beginning of the string: ^, extract one or more letters from range A to Z: [A-Z]+, white space: \s, one letter from range A to Z: [A-Z], and the dash: -

To execute it in R, we need to change \ into \\, so: ^[A-Z]+\\s[A-Z]-.

Extracting lipid class names from shorthand notations using regular expressions - ether/plasmenyl lipids - PC O- or PC P-, etc.

5) Now, we can extract all lipid class names for ether/plasmenyl lipids from shorthand notations. What about regular lipid class names, e.g., double-letter-based (PC, PE, ...) or combinations of lipids and digits (GM3, Hex2Cer, ...)? We want our regex to work for all types of shorthand lipid notations. We can achieve it by adding the condition OR (or - |) to our regex. In total: ^[A-Z]+\s[A-Z]-|

6) We need to extract characters from the beginning of the string, so we add ^ first. In total: ^[A-Z]+\s[A-Z]-|^

7) We must find a regex that would preferably cover all remaining annotations: PC, PE, GM3, Hex2Cer. The trick here is \S (which means any non-whitespace character). We add plus to have \S+ one or more non-whitespace characters from the beginning of the string.

In total: ^[A-Z]+\s[A-Z]-|^\S+

Or in R: ^[A-Z]+\\s[A-Z]-|^\\S+

Extracting lipid class names from shorthand notations using regular expressions - regex covering all lipid class names.

Using str_extract() from stringr (tidyverse collection), we can now extract our lipid class names from the shorthand notations:

# Extracting lipid class names from the shorthand notations:
Welch.test$Lipid.class <- str_extract(pattern = "^[A-Z]+\\s[A-Z]-|^\\S+", Welch.test$Lipids)

In this line of code, we add a new column to the 'Welch.test' tibble containing lipid class names. The strings with shorthand notations are available in the column Lipids.

We need to extract the total carbon number (CN) and double bond number (DB), too. In the shorthand notations, these values are separated by a colon, e.g., PC 36:2. We can use it to build regex to extract one or more digits (\d+) before and after colon (:). It would be:

Extracting total carbon number (CN) and total double bond number (DB) from shorthand notations using regular expressions.

The R function to extract CN and DB separated by colon is:

# Extract CN and DB in the column named "composition":
Welch.test$composition <- str_extract(pattern = '\\d+[:]\\d+', Welch.test$Lipids)

Now, using cSplit() functions from the splitstackshape package, we can split the composition column into two columns, indicating the colon as a separator. We will name them CN and DB, respectively. Take a look at the code block below:

# Splitting the `composition` column into `CN` and `DB` columns:
# Installing and calling library:
install.packages("splitstackshape")
library(splitstackshape)

# Reading the function documentation:
?cSplit()

# Splitting `composition` column:
Welch.test <- cSplit(Welch.test, splitCols = "composition", sep = ":")

# Changing names of the new columns to `CN` and `DB`:
names(Welch.test)[15] <- "CN"
names(Welch.test)[16] <- "DB"

Our 'Welch.test' tibble is now ready to create the first acyl-chain plot. A quick glimpse:

Glimpse at final 'Welch.test' tibble.

Note: Make sure that CN and DB columns are <int>!

If you find regular expressions too complicated at this step - use spreadsheet software to add the missing columns. Export from R a .csv file with results of hypothesis testing and fold change computations. Open it in spreadsheet software and add columns that you need to create acyl-chain plots, i.e., lipid class, CN, and DB. Read the new data in R and continue to obtain the plot.

Time for the plot code. In fact, it is a simple geom_point()-based ggplot2 chart. The -log10(p-values) are mapped to size and log2(FC) to the fill color, respectively. The x-y axes have a continuous scaling covering the range of double bonds from 0 to 20 (y-axis) and the total number of carbons from 0 to 100 (x-axis). A specific scaling of point areas is applied to avoid overlapping points, and scale_fill_gradientn() is used to fill the points with selected colors stored in the vector 'colors'. To create a separate panel for every lipid class, facet_wrap() across Lipid.class is used with free scales. Take a look at the code:

# Acyl-chain plot code.
# Step - 1: Create a vector with fill colors:
colors <- c("#002060", "#0d78ca", "#00e8f0", "white", "#FF4D4D", "red", "#600000")

# Step - 2: Create plot:
Welch.test %>% 
  ggplot() + 
  geom_point(aes(CN,DB), color = 'black', size = 1, alpha = 0.4) + # Black points indicating measured, but not stat. significant lipids.
  geom_point(aes(CN,DB, size = -log10(p.adj), fill = log.2.FC), shape = 21, color = 'black') + # Stat. significant lipids.
  theme_bw() +
  labs(title = "Acyl-chain plots",
       subtitle = "Alterations in serum lipidome in patients with PDAC",
       x='Number of carbon atoms in side chain',
       y='Saturation')+ 
  scale_y_continuous(breaks=seq(0,20,1)) + 
  scale_x_continuous(breaks=seq(0,100,2)) +
  scale_size_area(breaks = c(1.3,2,3,4,5,6,7,8,9),
                  max_size = 6, 
                  limits = c(-log10(0.05),-log10(min(Welch.test[,'p.adj'])))) +
  scale_fill_gradientn(limits = c(-1.5,1.5), colors = colors) +
  facet_wrap(~ Lipid.class, scales = "free")

The final plot:

Acyl-chain plots presenting alterations in serum lipidome in patients with PDAC compared with healthy controls.

We encourage you to customize the plot according to your preferences!

Last updated