# Lipid maps and acyl-chain plots

Lipid maps and acyl-chain plots are effective alternatives to volcano plots. Both deliver deeper insight into lipidome alterations as more molecules are labeled and grouped by classes in these plots. The easiest way to produce lipid maps is through the Cytoscape software. The acyl-chain plots are generated in R. Here, we will show you how to prepare both of them.

## Lipid maps from the Cytoscape software (R computations)

Examples of such lipid maps can be found in the following manuscripts:

* E. Cífková et al. Lipidomic and metabolomic analysis reveals changes in biochemical pathways for non-small cell lung cancer tissues. DOI: [doi.org/10.1016/j.bbalip.2021.159082](https://doi.org/10.1016/j.bbalip.2021.159082) - Fig. 2A & 2B.
* D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI: [doi.org/10.1038/s41467-021-27765-9](https://www.nature.com/articles/s41467-021-27765-9) - Fig. 10A & B.
* J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: [doi.org/10.1007/s00216-022-04490-w](https://link.springer.com/article/10.1007/s00216-022-04490-w) - Fig. 5&#x20;
* B. Piskláková et al. Rapid and efficient LC-MS/MS diagnosis of inherited metabolic disorders: a semi-automated workflow for analysis of organic acids, acylglycines, and acylcarnitines in urine. DOI: [doi.org/10.1515/cclm-2023-0084](https://doi.org/10.1515/cclm-2023-0084) - Fig. 4
* [ ](http://orcid.org/0000-0002-4184-0693)R. Prior, A. Silva, T. Vangansewinkel et al. Defective Schwann cell lipid metabolism alters plasma membrane dynamics in Charcot-Marie-Tooth disease 1A. DOI: [doi.org/10.1101/2023.04.02.535224](https://www.biorxiv.org/content/10.1101/2023.04.02.535224v1) - Fig. 2D.&#x20;
* D. Olešová et al. Changes in lipid metabolism track with the progression of neurofibrillary pathology in tauopathies. DOI: <https://doi.org/10.1186/s12974-024-03060-4> - Fig. 3B & F, Fig. 4B & E, Fig. 5I
* A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI: [https://doi.org/10.1186/s13075-023-03204-6 ](<https://doi.org/10.1186/s13075-023-03204-6 >)- Fig. 2A and C, Fig. 4A & B, Fig.6A & B.

To prepare lipid maps - we will use our basic PDAC data set. In the first step, we compute fold change and p-value, e.g., from a Welch *t*-test, and then log2 of fold change and -log10 of the p-value. Log2 of fold change will be used for the continuous mapping of fill color, and -log10 of the p-value for continuous mapping of the size of points (bubbles).&#x20;

Here is the code:

```r
# Computing Welch t-test & fold change.
# Step 1 - Create 'data.long' tibble:
data.long <- 
  data %>%
  select(-`Sample Name`) %>%
  filter(Label != "PAN") %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations")

# Step 2 - Compute Welch's t-test and adjust p-values (rstatix):
Welch.test <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH")

# Step 3 - Compute mean concentrations for all lipids within healthy controls (N):
Lipids.N <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "N")

# Step 4 - Compute mean concentrations for all lipids within patients with PDAC (T):
Lipids.T <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "T")

# Step 5 - Compute fold-change and store it in the "Welch.test" as `Fold change`:
Welch.test$`Fold change` <- Lipids.T$mean/Lipids.N$mean

# Step 6 - Using mutate(), compute log2 of fold change:
Welch.test <-
  Welch.test %>%
  mutate(log.2.FC = log2(`Fold change`),
         minlog10.pval = -log10(p))
         
# Step 7 - export the data as a .csv file:
write.csv(Welch.test, "Lipid_map_matrix.csv")
```

This file can be found in your working directory. Usually, if you have not changed it, it is in the Document file. To check it, run:

```r
# My working directory:
getwd()
```

Open the file in a spreadsheet program of your choice (e.g., Calc from LibreOffice of Microsoft Excel) and remove all columns except for 'Lipids', 'log.2.FC', 'minlog10.pval'.&#x20;

Now, we must prepare a skeleton of the lipid map in a separate spreadsheet file. We will create a network in the Cytoscape software using this second file. The file will essentially contain information for the software about how the nodes are supposed to be connected and what values are to be mapped to colors and sizes. Take a look at the figure below:

<figure><img src="/files/6hZKgri8GYUWwt5SUiBf" alt=""><figcaption></figcaption></figure>

The network is constructed of **nodes (also called vertices)** and **edges (also called links)**. In the figure above (on the right side), the nodes are blue rectangles marked with black arrows, while grey lines connecting nodes marked with red arrows are the network's edges. The **skeleton of each lipid map** is constructed of **source nodes**. You can adjust their look as you prefer. The fold change and p-value for source nodes are set to NA. In the next step, we will add the lipid species as target nodes.

In the spreadsheet program, we create a table with six columns (shown above on the left side of the figure):

1\) The first node name - node\_ID - light blue arrow,

2\) The first node type - node\_type - light green arrow,

3\) The second node name - node\_ID2 - dark blue arrow,

4\) The second node type - node\_type2 - dark green arrow,

5\) Fold change - FC - red arrow,

6\)  p-value from the statistical test - p-val - red arrow.&#x20;

The spreadsheet contains the plan - how the nodes building up the skeleton are supposed to be connected. All lipid classes, i.e., GL (**gl**ycerolipids), SP (**sp**hingolipids), ST (**st**erols), and GPL (**g**lycero**p**hospho**l**ipids), are connected to a node called "C" - from the **c**entral node of the map. Next, the nodes containing lipid class are connected with nodes containing lipid subclasses, e.g., SP will be connected to Cer (ceramides), HexCer (hexosyl-ceramides), Hex2Cer (di-hexosyl-ceramides), and SM (sphingomyelins), while GL to TG (triglycerides), DG (diglycerides), and MG (monoglycerides). The nodes with lipid subclass will be connected with lipid species (target nodes) in the next step. The target nodes with lipid species measured will additionally have two target node attributes - fold change (FC) and p-value from the statistical test (p-val). These values can be taken from the R .csv output containing results of the Welch test and fold change calculations (columns: 'log.2.FC', 'minlog10.pval'. in the first .csv file). See the example below on how to connect source nodes (lipid subclasses) with target nodes (lipid species) and add target node attributes (FC, p-val):

<figure><img src="/files/LhdoTuqxXc71nmcSIN8x" alt=""><figcaption><p>Connecting source nodes (lipid subclass - red frame) with target nodes (lipid species - blue frame). The violet frame contains the target node attributes - fold change (FC) and p-value from the statistical test (p-val). The values from attribute columns will be mapped to size and color in the next step. </p></figcaption></figure>

Step by step, we add all lipid species measured to the appropriate subclass of the skeleton. Save the file as .csv. The final .csv file for our exemplary PDAC data set can be downloaded below:

{% file src="/files/LX6couf4zdPt3Ot3uWHN" %}
A .csv file containing complete information for the Cytoscape software about the lipid map we want to create.
{% endfile %}

Now, you need to download and install the Cytoscape software. For producing our lipid map, we used version 3.8.2. It can be downloaded here:

{% embed url="<https://cytoscape.org/download_old_versions.html>" %}
A link to download the Cytoscape software.
{% endembed %}

Once you open the Cytoscape software, navigate to the **"Import network from File System"** button:

<figure><img src="/files/37w4EmRoK6I6mSS8EIUk" alt=""><figcaption><p>Cytoscape software interface - <strong>"Import network from File System"</strong> button.</p></figcaption></figure>

Select the file containing your lipid map (network) and hit "Open". Then, in the next window, specify the source and target nodes and their attributes:

<figure><img src="/files/vXsHumvwT3wWBXR93yq7" alt=""><figcaption><p>Setting the node type and node attributes.</p></figcaption></figure>

The final setting:

<figure><img src="/files/HScwSiMZpqewlVLtN4pv" alt=""><figcaption><p>The final setting of node types and attributes.</p></figcaption></figure>

If Cytoscape added empty columns from your .csv file - **make sure to remove them (or to avoid issues, copy the network scheme&#x20;*****only*****&#x20;to a fresh Excel and save it again as .csv). To remove a column, you can select "Not imported":**

<figure><img src="/files/djrvJfB9gunzCSzlUaEB" alt=""><figcaption><p>If Cytoscape finds empty columns, you can remove them by setting "Not imported". However, to avoid potential issues, it is good to copy the network plan to a fresh spreadsheet, save it again as a .csv, and repeat the import to Cytoscape.</p></figcaption></figure>

After setting node types and attributes, hit "OK". Now, Cytoscape translates the plan into a network scheme. We obtain this raw image:

<figure><img src="/files/m30jmgcvfg5lLKFuPrho" alt=""><figcaption><p>The raw network obtained in Cytoscape from the .csv file plan.</p></figcaption></figure>

Let's customize our lipid map. First, you can change the arrangement of the lipid map. Hold the shift button and click the left mouse button, then drag. A rectangle will be developed to label in yellow nodes you would like to move together:

<figure><img src="/files/xMfOh3vf9hRdJTnStZ2t" alt=""><figcaption><p>Changing the arrangement of lipid map nodes.</p></figcaption></figure>

Until the nodes are highlighted in yellow - you can keep dragging them. Click on the white background to remove the yellow selection. The final lipid map arrangement:

<figure><img src="/files/SqcRbFWypmBnDV2zJDQg" alt=""><figcaption><p>The final arrangement of lipid map nodes.</p></figcaption></figure>

Now, let's change the shape of nodes to circles. To change the plot style, on the left toolbar, select **Styles**, then navigate to **Properties**, develop the list, and select **Show All:**

<figure><img src="/files/XYTJVhwsrWLOaU6lEAfZ" alt=""><figcaption><p>Modifying lipid map style.</p></figcaption></figure>

Find **Shape**, and **click on the rectangle symbol**. From the window, select **Ellipse** and hit **"Apply"**:

<figure><img src="/files/WoBT8erBZDzqsAzZYzEt" alt=""><figcaption><p>Changing shape of the lipid map nodes.</p></figcaption></figure>

Next, find **"Height"** and **click on the gray arrow** to develop a list. Two positions will appear:

1\) **Column** - develop the list again and select **p-val**.

2\) **Mapping type** - develop the list and select **Continuous Mapping:**

<figure><img src="/files/PeFsANzOGf2IuohxOcX0" alt=""><figcaption><p>Changing node size. Mapping the height to the p-value: part 1.</p></figcaption></figure>

<figure><img src="/files/2UxhKpI1lglqy5agRQy8" alt=""><figcaption><p>Changing node size. Mapping the height to the p-value: part 2.</p></figcaption></figure>

Now, two fast clicks on **"Current Mapping"**. In the window, unlock the **"Node Height"** by clicking on the **black arrow**. Set the value to **100** and hit **OK**:

<figure><img src="/files/snYM2LWLfLofykgJM1Ja" alt=""><figcaption><p>Changing node size. Mapping the height to the p-value: part 3.</p></figcaption></figure>

<figure><img src="/files/Xex5srj158TmIITP23Bw" alt=""><figcaption><p>Changing node size. Mapping the height to the p-value: part 3.</p></figcaption></figure>

Now, find in **Style** the position **"Width"** and **repeat the same operations as for height:**

<figure><img src="/files/cqLLhMyAEDOHKuCDeFUp" alt=""><figcaption><p>Changing node size. Mapping the width to the p-value: 1 - navigate to "Width" in Style, 2 - map width to p-value, 3 - select "Continous Mapping" and set a maximum of 100.</p></figcaption></figure>

Now, we will map the fold change value to colors. In **Style**, find the position **"Fill color"**, and develop the list using the **gray arrow. Next, in for Column select "FC":**

<figure><img src="/files/JrRZxlrwfnlb3YGpIzlN" alt=""><figcaption><p>Mapping fold change values to colors - part 1.</p></figcaption></figure>

As **Mapping Type** select **Continuous Mapping**:

<figure><img src="/files/jumE8m7ulGgkxmm3C92J" alt=""><figcaption><p>Mapping fold change values to colors - part 2.</p></figcaption></figure>

Make sure that the minimum (negative fold change value) and maximum (positive fold change value) result in the same absolute value to reflect alterations in a balanced way - like in the example below. Minimum and maximum are mapped to -1.59 and 1.59, respectively. **If that's not the case, and the minimum would be, e.g. -2.5 and maximum - 4.8 - correct it!**&#x20;

<figure><img src="/files/QIkcci0VxQ8KUyKJGaZM" alt=""><figcaption><p>Mapping fold change values to colors - part 3. Balancing the color scale.</p></figcaption></figure>

If you do not like the default palette - select **Current Palette**, **highlight a new one**, and **hit OK. Confirm the change of palette by clicking on "Yes".**

<figure><img src="/files/Ei4pXu2qvJ9xaAyRoQQY" alt=""><figcaption><p>Mapping fold change values to colors - part 4. Changing the color palette.</p></figcaption></figure>

Next, we will change the color of the border of nodes to black. Find **"Border Paint"** in Style. **Click on the rectangle, select black color from the palette, and click OK:**

<figure><img src="/files/2DeugssuFwpkOG0q1tfn" alt=""><figcaption><p>Changing the node border color. </p></figcaption></figure>

Additionally, navigate to **"Border Width"**, click on **"0.0"** and change the value to **"1.0"**:

<figure><img src="/files/ecP82A6FpM8cl34PutjM" alt=""><figcaption><p>Changing the node border width. </p></figcaption></figure>

We will additionally map the label font size to the p-value. Find "**Label Font Size"** in **Style**, develop the list, select as **Column: p-val**, and as **Mapping Type: Continuous Mapping**:

<figure><img src="/files/JJyVsrE2sltRVCy2U1zR" alt=""><figcaption><p>Mapping the Label Font size to p-value.</p></figcaption></figure>

Now, we will modify the lipid map skeleton. **Hold Ctrl and select all the remaining blue nodes except for "C". They should be highlighted in yellow.** Find again "**Shape"** in Style and press the third rectangle (by-pass value) from the left. From the list, select **"Set Bypass..."**. Select **"Round rectangle"**:

<figure><img src="/files/IsWDmwVwOjF3ZBt8uj5x" alt=""><figcaption><p>Changing the shape of skeleton nodes to round rectangle (setting Bypass) - part 1.</p></figcaption></figure>

<figure><img src="/files/RmUPwPT4fUieV47o9hMF" alt=""><figcaption><p>Changing the shape of skeleton nodes to round rectangle (setting Bypass) - part 2.</p></figcaption></figure>

Using **"Set Bypass..."** we can also change the fill color of skeleton nodes to white:

<figure><img src="/files/706hVrzR7UnzalxgKvzF" alt=""><figcaption><p>Changing the fill color of skeleton nodes to white (setting Bypass).</p></figcaption></figure>

And the **Label Font Size** to **18**:

<figure><img src="/files/Qw1m4ttUhXTmUEWC6PmP" alt=""><figcaption><p>Changing the Label Font Size of skeleton nodes to 18 (setting Bypass).</p></figcaption></figure>

We can adjust the central point of the lipid map. **Right-click on the node with "C"**, and a list will develop. Go to **"Edit"** and next to **"Rename Node"**:

<figure><img src="/files/Fpr5kjIXCqc8DNgMuakF" alt=""><figcaption><p>Renaming nodes of lipid map - part 1.</p></figcaption></figure>

A window appears. Keep it empty and click OK:

<figure><img src="/files/r0XUk5UzWkfrWwP5ifo2" alt=""><figcaption><p>Renaming nodes of lipid map - part 1.</p></figcaption></figure>

**We will use the same trick to remove the annotation of all target nodes with lipids of low statistical significance** - but later. Now, in Style navigate to **"Width"** and **"Height"** and in **Bypass set 20 for both**. Then find **Fill Color** in Style and **in Bypass** select **black.** The final look:

<figure><img src="/files/FPtiEpcdnOsNzsnlic90" alt=""><figcaption><p>Partially customized lipid map.</p></figcaption></figure>

Now, we will clean annotations of all lipids with low statistical significance. We will also rename nodes to keep only the sum of carbons and double bonds and adjust the plot. The final result:

<figure><img src="/files/tbzPACye3KISxyQpudMN" alt=""><figcaption><p>Lipid map prepared using the exemplary PDAC data set in Cytoscape software.</p></figcaption></figure>

To prepare the legend:

1\) Export the ready lipid map from Cytoscape software: go to **File**, then **Export**, and **Network to Image**. Save the image as .svg or .pdf (vector graphics - so you can process it) and a directory to save the network.&#x20;

2\) Open software for processing graphics and import the Cytoscape .svg or .pdf output. If you do not have experience working with such software, you can download and gain skills in using Inkscape for starters:

{% embed url="<https://inkscape.org/>" %}
Inkscape - a freely available software for processing graphics.
{% endembed %}

3\) To create the p-value legend - you can add at the end of your lipid map plan three target nodes to \
a random lipid subclass, called, e.g., legend 1, legend 2, legend 3, with log2(FC) = 1 and -log10(p-value) of:&#x20;

* 1.3, which corresponds to -log10(0.05) - most selected threshold,&#x20;
* a middle range of your p-values (in our case, a value around 10),&#x20;
* and maximum p-value (in our a value of 18 or 20 if you prefer rounded values in legend).

<figure><img src="/files/1nHjfcPQo4nRk0ckk6gx" alt=""><figcaption><p>Creating a legend.</p></figcaption></figure>

4\) After importing the lipid map plan into Cytoscape software, find the legend nodes. Press on the edge connecting them with the lipid subclass source node - it will be highlighted in red once selected. **Press "Backspace" on your keyboard to remove the edge connecting the legend target nodes with the source node** and move the legend nodes away. Prepare your plot and then export the plot together with the "bubbles" corresponding to the p-value. Arrange them appropriately in the software for processing graphics and color according to your preferences. **DO NOT CHANGE THE SIZE OF BUBBLES - it is aligned with the size of nodes in the lipid map!**

5\) To create the legend for fill colors (fold changes) - in Cytoscape, navigate to **Options** and then - **"Create Legend"**:&#x20;

<figure><img src="/files/xgNi712lEBjUgtnBYn4I" alt=""><figcaption><p>Creating legend for fill colors.</p></figcaption></figure>

6\) Select the directory and save the image with the Cytoscape legend.&#x20;

7\) Next, open the Cytoscape legend in the software for processing graphics, crop unnecessary parts, and keep the fill color scale bar. Add it to your lipid map.

7\) Describe the size (p-value range) and colors (fold change range) appropriately. Remember that the fold change is as log2 and the p-value as -log10!

8\) Your plot is now ready. Remember to describe all abbreviations used in the lipid maps in the figure caption in your manuscript!

## The ggplot2 acyl-chain plots

Download the new data set we selected for you:

{% file src="/files/bDqmvZR2ETzh4NXpt5X3" %}
The quantitation of lipids in serum samples from patients with PDAC and healthy controls using reversed-phase chromatography (RP) with mass spectrometry detection. The data set was published with the manuscript *Lipidomic profiling of human serum enables detection of pancreatic cancer* by D. Wolrab et al. DOI: 10.1038/s41467-021-27765-9. The data set was trimmed - only lipid classes annotated at lipid species without repetitions in the composition were kept.
{% endfile %}

The acyl-chain plots are excellent for presenting lipids annotated at the lipid species level. Below, please find examples of manuscripts using this particular form of data presentation:

* J. Dusek et al. The hypolipidemic effect of MI-883, the combined CAR agonist/ PXR antagonist, in diet-induced hypercholesterolemia model. DOI: <https://doi.org/10.1038/s41467-025-56642-y> - Fig. 5j, k and l.&#x20;
* A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI: [https://doi.org/10.1186/s13075-023-03204-6 ](<https://doi.org/10.1186/s13075-023-03204-6 >)- Fig. 3.

Here, except for the p-value from a statistical test, we must compute fold change, log2 of fold change, and -log10 of the p-values. Then, we will need a lipid class, total carbon number (CN), and total double bond number (DB) in separate columns. First, we will again compute statistical test, -log10(p-value), fold change, and log2(fold change):

```r
# Call libraries
library(readxl)
library(tidyverse)
library(rstatix)

# Read new data set in R
data <- read_xlsx(file.choose())  # Select the .xlsx file from directory.

# Check column types:
str(data)

# Change Label to factor column:
data$Label <- as.factor(data$Label)

# Creating 'data.long' tibble:
data.long <-
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `AcylCarnitine 10:0`:`TG 60:3`,
               names_to = "Lipids",
               values_to = "Concentrations")
               
# Computing Welch test:
Welch.test <- 
  data.long %>%
  group_by(Lipids) %>%
  t_test(Concentrations ~ Label) %>%
  adjust_pvalue(method = "BH")
  
# Compute mean concentrations for all lipids within healthy controls (N):
Lipids.N <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "N")

# Compute mean concentrations for all lipids within patients with PDAC (T):
Lipids.T <- 
  data.long %>%
  group_by(Lipids, Label) %>%
  summarise(mean = mean(Concentrations)) %>%
  arrange(Label) %>%
  filter(Label == "T")
  
# Compute fold-change and store it in the "Welch.test" as `Fold change`:
Welch.test$`Fold change` <- Lipids.T$mean/Lipids.N$mean

# Using mutate(), compute log2 of fold change:
Welch.test <-
  Welch.test %>%
  mutate(log.2.FC = log2(`Fold change`),
         minlog10.pval = -log10(p))
```

The statistics are ready. We can move to operations on strings - our lipid shorthand annotations. We need to separate from every lipid annotation:&#x20;

* Lipid class, e.g., PC, PC P-, SM, Hex2Cer, etc.,
* Total carbon number (CN),
* Total double bond number (DB).

We will use regular expressions for this purpose. To ease building regular expressions (regex), you can use online tools, e.g. regex101.com:

{% embed url="<https://regex101.com/>" %}
The regex101 - supports bulding, testing, and debugging regular expressions.&#x20;
{% endembed %}

Let's try to build a regular expression to extract lipid class names from the shorthand notations. We have three types of shorthand notations for the lipid class:

* Two letters, e.g., PC, SM, PE, etc.
* Two letters, white space, single letter, e.g., PC P, PE P, etc.,
* Letters with digits, e.g., Hex2Cer, GM3, etc.

We put the examples of strings in the regex101:

<figure><img src="/files/JFwDYqpeq0YQfRO7fEY0" alt=""><figcaption><p>Examples of strings from which we want to extract specific elements.</p></figcaption></figure>

You have a quick reference to support building regex in the lower right corner.&#x20;

1\) We will start with the most complicated case: **LPC O-/P-, LPE O-/P-,** **PC O-/P-, PE O-/P-.**&#x20;

2\) We certainly want to start extracting letters from the beginning of each string. In regex: **^**

2\) Then, we can easily extract the letters from the beginning of shorthand notation; in regex - one letter from the range A to Z would be **\[A-Z]**, and one or more letters is **\[A-Z]+. In total,** **we have: ^\[A-Z]+, meaning extract from the beginning of the shorthand notation one or more letters. It covers PC in our example:**

<figure><img src="/files/5JSBhwQMvv3F1uiuw8DT" alt=""><figcaption><p>Extracting lipid class names from shorthand notations using regular expressions - ether/plasmenyl lipids - PC O- or PC P-, etc.</p></figcaption></figure>

But it would also cover PE, LPC, LPE, etc.

3\) We already have a regex for extracting the first letters. Now, we need to extract \<white space> O- or \<white space> P-. We can add **\s** to extract any white space character. Overall we cover in our example **PC\<white space>** now and the regex **^\[A-Z]+\s.**&#x20;

**Important:** To communicate to R that we need to use backslash "\\" in the string, we use "\\\\", so overall: **^\[A-Z]+\\\s** in R, but as a regular string simply: **^\[A-Z]+\s:**

<figure><img src="/files/5Dx1nO0xYKBUQ58UGR5F" alt=""><figcaption><p>Extracting lipid class names from shorthand notations using regular expressions - ether/plasmenyl lipids - PC O- or PC P-, etc.</p></figcaption></figure>

4\) Finally, we can extract one more letter (O or P) and the dash (-). As above, one letter from the range between A and Z is **\[A-Z],** and the dash in regex is just -**.** \
**In total, our regex looks like this: ^\[A-Z]+\s\[A-Z]-**&#x20;

**It means - from the beginning of the string: ^, extract one or more letters from range A to Z: \[A-Z]+, white space: \s, one letter from range A to Z: \[A-Z], and the dash: -**

**To execute it in R, we need to change \ into \\\\, so:  ^\[A-Z]+\\\s\[A-Z]-.**

<figure><img src="/files/WMlyfC43fZ3aZBzrI23F" alt=""><figcaption><p>Extracting lipid class names from shorthand notations using regular expressions - ether/plasmenyl lipids - PC O- or PC P-, etc.</p></figcaption></figure>

5\) Now, we can extract all lipid class names for ether/plasmenyl lipids from shorthand notations. What about regular lipid class names, e.g., double-letter-based (PC, PE, ...) or combinations of lipids and digits (GM3, Hex2Cer, ...)? We want our regex to work for all types of shorthand lipid notations. We can achieve it by adding the condition **OR (or - |)** to our regex. **In total: ^\[A-Z]+\s\[A-Z]-|**&#x20;

6\) We need to extract characters from the beginning of the string, so we add **^** first. \
**In total:** **^\[A-Z]+\s\[A-Z]-|^**

7\) We must find a regex that would preferably cover all remaining annotations: PC, PE, GM3, Hex2Cer. The trick here is **\S** (which **means any non-whitespace character**). We add plus to have **\S+ one or more non-whitespace characters from the beginning of the string.**&#x20;

**In total:  ^\[A-Z]+\s\[A-Z]-|^\S+**&#x20;

**Or in R: ^\[A-Z]+\\\s\[A-Z]-|^\\\S+**

<figure><img src="/files/EFPGfuqbBzs5HmIuAABh" alt=""><figcaption><p>Extracting lipid class names from shorthand notations using regular expressions - regex covering all lipid class names.</p></figcaption></figure>

Using str\_extract() from stringr (tidyverse collection), we can now extract our lipid class names from the shorthand notations:

```r
# Extracting lipid class names from the shorthand notations:
Welch.test$Lipid.class <- str_extract(pattern = "^[A-Z]+\\s[A-Z]-|^\\S+", Welch.test$Lipids)
```

In this line of code, we add a new column to the 'Welch.test' tibble containing lipid class names. The strings with shorthand notations are available in the column `Lipids`.&#x20;

We need to extract the total carbon number (CN) and double bond number (DB), too. In the shorthand notations, these values are separated by a colon, e.g., PC 36:2. We can use it to build regex to extract one or more digits (**\d+**) before and after colon **(:)**. It would be:

<figure><img src="/files/js8HbjMCRyDUonkuzJBw" alt=""><figcaption><p>Extracting total carbon number (CN) and total double bond number (DB) from shorthand notations using regular expressions.</p></figcaption></figure>

The R function to extract CN and DB separated by colon is:

```r
# Extract CN and DB in the column named "composition":
Welch.test$composition <- str_extract(pattern = '\\d+[:]\\d+', Welch.test$Lipids)
```

Now, using **cSplit()** functions from the **splitstackshape package,** we can split the `composition` column into two columns, indicating the colon as a separator. We will name them CN and DB, respectively. Take a look at the code block below:

```r
# Splitting the `composition` column into `CN` and `DB` columns:
# Installing and calling library:
install.packages("splitstackshape")
library(splitstackshape)

# Reading the function documentation:
?cSplit()

# Splitting `composition` column:
Welch.test <- cSplit(Welch.test, splitCols = "composition", sep = ":")

# Changing names of the new columns to `CN` and `DB`:
names(Welch.test)[15] <- "CN"
names(Welch.test)[16] <- "DB"
```

Our 'Welch.test' tibble is now ready to create the first acyl-chain plot. A quick glimpse:

<figure><img src="/files/akCJ4c3T2ZzMCXE452Ny" alt=""><figcaption><p>Glimpse at final 'Welch.test' tibble.</p></figcaption></figure>

**Note:** Make sure that `CN` and `DB` columns are **\<int>!**&#x20;

**If you find regular expressions too complicated at this step - use spreadsheet software to add the missing columns. Export from R a .csv file with results of hypothesis testing and fold change computations. Open it in spreadsheet software and add columns that you need to create acyl-chain plots, i.e., lipid class, CN, and DB. Read the new data in R and continue to obtain the plot.**

**Time for the plot code.** In fact, it is a simple geom\_point()-based ggplot2 chart. The **-log10(p-values)** are mapped **to size** and **log2(FC)** to the **fill color**, respectively. The x-y axes have a continuous scaling covering the range of double bonds from 0 to 20 (y-axis) and the total number of carbons from 0 to 100 (x-axis). A specific scaling of point areas is applied to avoid overlapping points, and scale\_fill\_gradientn() is used to fill the points with selected colors stored in the vector 'colors'. To create a separate panel for every lipid class, facet\_wrap() across `Lipid.class` is used with free scales. Take a look at the code:

```r
# Acyl-chain plot code.
# Step - 1: Create a vector with fill colors:
colors <- c("#002060", "#0d78ca", "#00e8f0", "white", "#FF4D4D", "red", "#600000")

# Step - 2: Create plot:
Welch.test %>% 
  ggplot() + 
  geom_point(aes(CN,DB), color = 'black', size = 1, alpha = 0.4) + # Black points indicating measured, but not stat. significant lipids.
  geom_point(aes(CN,DB, size = -log10(p.adj), fill = log.2.FC), shape = 21, color = 'black') + # Stat. significant lipids.
  theme_bw() +
  labs(title = "Acyl-chain plots",
       subtitle = "Alterations in serum lipidome in patients with PDAC",
       x='Number of carbon atoms in side chain',
       y='Saturation')+ 
  scale_y_continuous(breaks=seq(0,20,1)) + 
  scale_x_continuous(breaks=seq(0,100,2)) +
  scale_size_area(breaks = c(1.3,2,3,4,5,6,7,8,9),
                  max_size = 6, 
                  limits = c(-log10(0.05),-log10(min(Welch.test[,'p.adj'])))) +
  scale_fill_gradientn(limits = c(-1.5,1.5), colors = colors) +
  facet_wrap(~ Lipid.class, scales = "free")
```

The final plot:

<figure><img src="/files/OF5oozXUDpSx8JXPtSNX" alt=""><figcaption><p>Acyl-chain plots presenting alterations in serum lipidome in patients with PDAC compared with healthy controls. </p></figcaption></figure>

&#x20;We encourage you to customize the plot according to your preferences!&#x20;


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://laboratory-of-lipid-metabolism-a.gitbook.io/omics-data-visualization-in-r-and-python/metabolites-and-lipids-univariate-statistics-in-r/graphical-representation-of-univariate-statistics/lipid-maps-and-acyl-chain-plots.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
