Lipid maps and acyl-chain plots
Metabolites and lipids univariate statistical analysis in R
Last updated
Metabolites and lipids univariate statistical analysis in R
Last updated
Lipid maps and acyl-chain plots are effective alternatives to volcano plots. Both deliver deeper insight into lipidome alterations as more molecules are labeled and grouped by classes in these plots. The easiest way to produce lipid maps is through the Cytoscape software. The acyl-chain plots are generated in R. Here, we will show you how to prepare both of them.
Examples of such lipid maps can be found in the following manuscripts:
E. Cífková et al. Lipidomic and metabolomic analysis reveals changes in biochemical pathways for non-small cell lung cancer tissues. DOI: - Fig. 2A & 2B.
D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI: - Fig. 10A & B.
J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: - Fig. 5
B. Piskláková et al. Rapid and efficient LC-MS/MS diagnosis of inherited metabolic disorders: a semi-automated workflow for analysis of organic acids, acylglycines, and acylcarnitines in urine. DOI: - Fig. 4
R. Prior, A. Silva, T. Vangansewinkel et al. Defective Schwann cell lipid metabolism alters plasma membrane dynamics in Charcot-Marie-Tooth disease 1A. DOI: - Fig. 2D.
D. Olešová et al. Changes in lipid metabolism track with the progression of neurofibrillary pathology in tauopathies. DOI: - Fig. 3B & F, Fig. 4B & E, Fig. 5I
A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI: - Fig. 2A and C, Fig. 4A & B, Fig.6A & B.
To prepare lipid maps - we will use our basic PDAC data set. In the first step, we compute fold change and p-value, e.g., from a Welch t-test, and then log2 of fold change and -log10 of the p-value. Log2 of fold change will be used for the continuous mapping of fill color, and -log10 of the p-value for continuous mapping of the size of points (bubbles).
Here is the code:
This file can be found in your working directory. Usually, if you have not changed it, it is in the Document file. To check it, run:
Open the file in a spreadsheet program of your choice (e.g., Calc from LibreOffice of Microsoft Excel) and remove all columns except for 'Lipids', 'log.2.FC', 'minlog10.pval'.
Now, we must prepare a skeleton of the lipid map in a separate spreadsheet file. We will create a network in the Cytoscape software using this second file. The file will essentially contain information for the software about how the nodes are supposed to be connected and what values are to be mapped to colors and sizes. Take a look at the figure below:
The network is constructed of nodes (also called vertices) and edges (also called links). In the figure above (on the right side), the nodes are blue rectangles marked with black arrows, while grey lines connecting nodes marked with red arrows are the network's edges. The skeleton of each lipid map is constructed of source nodes. You can adjust their look as you prefer. The fold change and p-value for source nodes are set to NA. In the next step, we will add the lipid species as target nodes.
In the spreadsheet program, we create a table with six columns (shown above on the left side of the figure):
1) The first node name - node_ID - light blue arrow,
2) The first node type - node_type - light green arrow,
3) The second node name - node_ID2 - dark blue arrow,
4) The second node type - node_type2 - dark green arrow,
5) Fold change - FC - red arrow,
6) p-value from the statistical test - p-val - red arrow.
The spreadsheet contains the plan - how the nodes building up the skeleton are supposed to be connected. All lipid classes, i.e., GL (glycerolipids), SP (sphingolipids), ST (sterols), and GPL (glycerophospholipids), are connected to a node called "C" - from the central node of the map. Next, the nodes containing lipid class are connected with nodes containing lipid subclasses, e.g., SP will be connected to Cer (ceramides), HexCer (hexosyl-ceramides), Hex2Cer (di-hexosyl-ceramides), and SM (sphingomyelins), while GL to TG (triglycerides), DG (diglycerides), and MG (monoglycerides). The nodes with lipid subclass will be connected with lipid species (target nodes) in the next step. The target nodes with lipid species measured will additionally have two target node attributes - fold change (FC) and p-value from the statistical test (p-val). These values can be taken from the R .csv output containing results of the Welch test and fold change calculations (columns: 'log.2.FC', 'minlog10.pval'. in the first .csv file). See the example below on how to connect source nodes (lipid subclasses) with target nodes (lipid species) and add target node attributes (FC, p-val):
Step by step, we add all lipid species measured to the appropriate subclass of the skeleton. Save the file as .csv. The final .csv file for our exemplary PDAC data set can be downloaded below:
Now, you need to download and install the Cytoscape software. For producing our lipid map, we used version 3.8.2. It can be downloaded here:
Once you open the Cytoscape software, navigate to the "Import network from File System" button:
Select the file containing your lipid map (network) and hit "Open". Then, in the next window, specify the source and target nodes and their attributes:
The final setting:
If Cytoscape added empty columns from your .csv file - make sure to remove them (or to avoid issues, copy the network scheme only to a fresh Excel and save it again as .csv). To remove a column, you can select "Not imported":
After setting node types and attributes, hit "OK". Now, Cytoscape translates the plan into a network scheme. We obtain this raw image:
Let's customize our lipid map. First, you can change the arrangement of the lipid map. Hold the shift button and click the left mouse button, then drag. A rectangle will be developed to label in yellow nodes you would like to move together:
Until the nodes are highlighted in yellow - you can keep dragging them. Click on the white background to remove the yellow selection. The final lipid map arrangement:
Now, let's change the shape of nodes to circles. To change the plot style, on the left toolbar, select Styles, then navigate to Properties, develop the list, and select Show All:
Find Shape, and click on the rectangle symbol. From the window, select Ellipse and hit "Apply":
Next, find "Height" and click on the gray arrow to develop a list. Two positions will appear:
1) Column - develop the list again and select p-val.
2) Mapping type - develop the list and select Continuous Mapping:
Now, two fast clicks on "Current Mapping". In the window, unlock the "Node Height" by clicking on the black arrow. Set the value to 100 and hit OK:
Now, find in Style the position "Width" and repeat the same operations as for height:
Now, we will map the fold change value to colors. In Style, find the position "Fill color", and develop the list using the gray arrow. Next, in for Column select "FC":
As Mapping Type select Continuous Mapping:
Make sure that the minimum (negative fold change value) and maximum (positive fold change value) result in the same absolute value to reflect alterations in a balanced way - like in the example below. Minimum and maximum are mapped to -1.59 and 1.59, respectively. If that's not the case, and the minimum would be, e.g. -2.5 and maximum - 4.8 - correct it!
If you do not like the default palette - select Current Palette, highlight a new one, and hit OK. Confirm the change of palette by clicking on "Yes".
Next, we will change the color of the border of nodes to black. Find "Border Paint" in Style. Click on the rectangle, select black color from the palette, and click OK:
Additionally, navigate to "Border Width", click on "0.0" and change the value to "1.0":
We will additionally map the label font size to the p-value. Find "Label Font Size" in Style, develop the list, select as Column: p-val, and as Mapping Type: Continuous Mapping:
Now, we will modify the lipid map skeleton. Hold Ctrl and select all the remaining blue nodes except for "C". They should be highlighted in yellow. Find again "Shape" in Style and press the third rectangle (by-pass value) from the left. From the list, select "Set Bypass...". Select "Round rectangle":
Using "Set Bypass..." we can also change the fill color of skeleton nodes to white:
And the Label Font Size to 18:
We can adjust the central point of the lipid map. Right-click on the node with "C", and a list will develop. Go to "Edit" and next to "Rename Node":
A window appears. Keep it empty and click OK:
We will use the same trick to remove the annotation of all target nodes with lipids of low statistical significance - but later. Now, in Style navigate to "Width" and "Height" and in Bypass set 20 for both. Then find Fill Color in Style and in Bypass select black. The final look:
Now, we will clean annotations of all lipids with low statistical significance. We will also rename nodes to keep only the sum of carbons and double bonds and adjust the plot. The final result:
To prepare the legend:
1) Export the ready lipid map from Cytoscape software: go to File, then Export, and Network to Image. Save the image as .svg or .pdf (vector graphics - so you can process it) and a directory to save the network.
2) Open software for processing graphics and import the Cytoscape .svg or .pdf output. If you do not have experience working with such software, you can download and gain skills in using Inkscape for starters:
3) To create the p-value legend - you can add at the end of your lipid map plan three target nodes to a random lipid subclass, called, e.g., legend 1, legend 2, legend 3, with log2(FC) = 1 and -log10(p-value) of:
1.3, which corresponds to -log10(0.05) - most selected threshold,
a middle range of your p-values (in our case, a value around 10),
and maximum p-value (in our a value of 18 or 20 if you prefer rounded values in legend).
4) After importing the lipid map plan into Cytoscape software, find the legend nodes. Press on the edge connecting them with the lipid subclass source node - it will be highlighted in red once selected. Press "Backspace" on your keyboard to remove the edge connecting the legend target nodes with the source node and move the legend nodes away. Prepare your plot and then export the plot together with the "bubbles" corresponding to the p-value. Arrange them appropriately in the software for processing graphics and color according to your preferences. DO NOT CHANGE THE SIZE OF BUBBLES - it is aligned with the size of nodes in the lipid map!
5) To create the legend for fill colors (fold changes) - in Cytoscape, navigate to Options and then - "Create Legend":
6) Select the directory and save the image with the Cytoscape legend.
7) Next, open the Cytoscape legend in the software for processing graphics, crop unnecessary parts, and keep the fill color scale bar. Add it to your lipid map.
7) Describe the size (p-value range) and colors (fold change range) appropriately. Remember that the fold change is as log2 and the p-value as -log10!
8) Your plot is now ready. Remember to describe all abbreviations used in the lipid maps in the figure caption in your manuscript!
Download the new data set we selected for you:
The acyl-chain plots are excellent for presenting lipids annotated at the lipid species level. Below, please find examples of manuscripts using this particular form of data presentation:
Here, except for the p-value from a statistical test, we must compute fold change, log2 of fold change, and -log10 of the p-values. Then, we will need a lipid class, total carbon number (CN), and total double bond number (DB) in separate columns. First, we will again compute statistical test, -log10(p-value), fold change, and log2(fold change):
The statistics are ready. We can move to operations on strings - our lipid shorthand annotations. We need to separate from every lipid annotation:
Lipid class, e.g., PC, PC P-, SM, Hex2Cer, etc.,
Total carbon number (CN),
Total double bond number (DB).
We will use regular expressions for this purpose. To ease building regular expressions (regex), you can use online tools, e.g. regex101.com:
Let's try to build a regular expression to extract lipid class names from the shorthand notations. We have three types of shorthand notations for the lipid class:
Two letters, e.g., PC, SM, PE, etc.
Two letters, white space, single letter, e.g., PC P, PE P, etc.,
Letters with digits, e.g., Hex2Cer, GM3, etc.
We put the examples of strings in the regex101:
You have a quick reference to support building regex in the lower right corner.
1) We will start with the most complicated case: LPC O-/P-, LPE O-/P-, PC O-/P-, PE O-/P-.
2) We certainly want to start extracting letters from the beginning of each string. In regex: ^
2) Then, we can easily extract the letters from the beginning of shorthand notation; in regex - one letter from the range A to Z would be [A-Z], and one or more letters is [A-Z]+. In total, we have: ^[A-Z]+, meaning extract from the beginning of the shorthand notation one or more letters. It covers PC in our example:
But it would also cover PE, LPC, LPE, etc.
3) We already have a regex for extracting the first letters. Now, we need to extract <white space> O- or <white space> P-. We can add \s to extract any white space character. Overall we cover in our example PC<white space> now and the regex ^[A-Z]+\s.
Important: To communicate to R that we need to use backslash "\" in the string, we use "\\", so overall: ^[A-Z]+\\s in R, but as a regular string simply: ^[A-Z]+\s:
4) Finally, we can extract one more letter (O or P) and the dash (-). As above, one letter from the range between A and Z is [A-Z], and the dash in regex is just -. In total, our regex looks like this: ^[A-Z]+\s[A-Z]-
It means - from the beginning of the string: ^, extract one or more letters from range A to Z: [A-Z]+, white space: \s, one letter from range A to Z: [A-Z], and the dash: -
To execute it in R, we need to change \ into \\, so: ^[A-Z]+\\s[A-Z]-.
5) Now, we can extract all lipid class names for ether/plasmenyl lipids from shorthand notations. What about regular lipid class names, e.g., double-letter-based (PC, PE, ...) or combinations of lipids and digits (GM3, Hex2Cer, ...)? We want our regex to work for all types of shorthand lipid notations. We can achieve it by adding the condition OR (or - |) to our regex. In total: ^[A-Z]+\s[A-Z]-|
6) We need to extract characters from the beginning of the string, so we add ^ first. In total: ^[A-Z]+\s[A-Z]-|^
7) We must find a regex that would preferably cover all remaining annotations: PC, PE, GM3, Hex2Cer. The trick here is \S (which means any non-whitespace character). We add plus to have \S+ one or more non-whitespace characters from the beginning of the string.
In total: ^[A-Z]+\s[A-Z]-|^\S+
Or in R: ^[A-Z]+\\s[A-Z]-|^\\S+
Using str_extract() from stringr (tidyverse collection), we can now extract our lipid class names from the shorthand notations:
In this line of code, we add a new column to the 'Welch.test' tibble containing lipid class names. The strings with shorthand notations are available in the column Lipids
.
We need to extract the total carbon number (CN) and double bond number (DB), too. In the shorthand notations, these values are separated by a colon, e.g., PC 36:2. We can use it to build regex to extract one or more digits (\d+) before and after colon (:). It would be:
The R function to extract CN and DB separated by colon is:
Now, using cSplit() functions from the splitstackshape package, we can split the composition
column into two columns, indicating the colon as a separator. We will name them CN and DB, respectively. Take a look at the code block below:
Our 'Welch.test' tibble is now ready to create the first acyl-chain plot. A quick glimpse:
Note: Make sure that CN
and DB
columns are <int>!
If you find regular expressions too complicated at this step - use spreadsheet software to add the missing columns. Export from R a .csv file with results of hypothesis testing and fold change computations. Open it in spreadsheet software and add columns that you need to create acyl-chain plots, i.e., lipid class, CN, and DB. Read the new data in R and continue to obtain the plot.
Time for the plot code. In fact, it is a simple geom_point()-based ggplot2 chart. The -log10(p-values) are mapped to size and log2(FC) to the fill color, respectively. The x-y axes have a continuous scaling covering the range of double bonds from 0 to 20 (y-axis) and the total number of carbons from 0 to 100 (x-axis). A specific scaling of point areas is applied to avoid overlapping points, and scale_fill_gradientn() is used to fill the points with selected colors stored in the vector 'colors'. To create a separate panel for every lipid class, facet_wrap() across Lipid.class
is used with free scales. Take a look at the code:
The final plot:
We encourage you to customize the plot according to your preferences!
J. Dusek et al. The hypolipidemic effect of MI-883, the combined CAR agonist/ PXR antagonist, in diet-induced hypercholesterolemia model. DOI: - Fig. 5j, k and l.
A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI: - Fig. 3.