# Basic plotting

## Required packages

This section requires pandas, matplotlib and seaborn, plus openpyxl, which pandas uses to read .xlsx files. Install them by running the following command in the Command Prompt (Windows) or Terminal (Mac):

```
pip install pandas matplotlib seaborn openpyxl
```

## Loading the data

The most straightforward way to load and manipulate tabular data in Python is through the pandas library. Pandas can load data from a range of formats, of which Excel and .csv files are the most commonly used. In this tutorial we will use the lipidomics demo dataset (in Excel format), which you can download with the link below.

{% file src="/files/O8GcvqGYOzpXcfU7NayN" %}

Place the downloaded Lipidomics\_dataset.xlsx file in the same folder as your JupyterLab script. Then run the following code in Jupyter:

```python
import pandas as pd
df = pd.read_excel("Lipidomics_dataset.xlsx", decimal=",")
df.head()
```

The first line imports the pandas library and makes it accessible through the alias "pd". On the second line we use the read\_excel function of the pandas library, to which we pass, between quotes, the name of the file we want to load, or the full path to the file if it is not stored in the same folder as the notebook. The decimal="," argument tells pandas to treat the comma as the decimal separator when it converts cells stored as text to numbers. The loaded data is stored as a DataFrame in the df variable.

On the third line we call the head() method of the DataFrame object that holds our data, which displays the first 5 rows of the table. It should look like this:

<figure><img src="/files/SaSXd6vPtQWdXOnEjnRA" alt=""><figcaption></figcaption></figure>

The rows of the table correspond to the samples and the columns to lipid species, except for the first two columns, which contain the unique sample IDs and the sample labels (the group to which each sample belongs). A handy improvement that lets us access the data of individual samples by their unique ID is to use the column "Sample Name" as the index column:

```python
df.set_index("Sample Name", inplace=True)
```

Now if we want to access the data of, for example, sample 1a2, this can be done with:

```python
df.loc["1a2"]
```
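
As a small illustration of index-based access, the sketch below builds a toy table with the same layout as the demo dataset (all values are invented, and the name `toy` is only used here) and shows how `.loc` returns either a whole row or a single value once "Sample Name" is the index. It uses the non-inplace form of `set_index`, which returns a new DataFrame instead of modifying the existing one.

```python
import pandas as pd

# Toy table mimicking the layout of the demo dataset; all values are invented
toy = pd.DataFrame({
    "Sample Name": ["1a1", "1a2", "1a3"],
    "Label": ["N", "N", "T"],
    "CE 16:1": [100.0, 120.0, 250.0],
    "TG 50:2": [30.0, 35.0, 60.0],
})
toy = toy.set_index("Sample Name")  # non-inplace alternative to inplace=True

row = toy.loc["1a2"]                # all columns of sample 1a2 (a Series)
value = toy.loc["1a2", "CE 16:1"]   # a single value: row label, column label
print(row)
print(value)
```

Passing a row label and a column label together is a convenient way to look up one concentration for one sample.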

The complete code for loading the data:

```python
import pandas as pd

df = pd.read_excel("Lipidomics_dataset.xlsx", decimal=",")
df.set_index("Sample Name", inplace=True)
```

## Barplots

The pandas package has a number of built-in functions for the plotting of basic graphs. Under the hood pandas is relying on the matplotlib package for plotting, and to have more control over how to plot the data, we will also work with matplotlib directly. Now that the data is loaded and ready, let's load matplotlib:

```python
import matplotlib.pyplot as plt
```

Next, we define which lipid we want to plot, group our DataFrame by the Label variable (which defines the group to which each sample belongs) and calculate the mean and standard deviation of each group. If the standard error of the mean is preferred, std() can be replaced with sem().

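```python
lipid = "CE 16:1"
# Group the samples by their label and select the lipid of interest
df_group = df.groupby("Label")[lipid]
means = df_group.mean()   # mean per group
errors = df_group.std()   # standard deviation per group
```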
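
To see how the standard deviation and the standard error relate, the following sketch computes both for a toy table with invented values (not the demo dataset). It assumes the same "Label" and lipid column names as above and uses `agg` to obtain all three statistics in one table; the standard error equals the standard deviation divided by the square root of the number of samples in the group.

```python
import pandas as pd

# Invented concentrations for two groups of three samples each
toy = pd.DataFrame({
    "Label": ["N", "N", "N", "T", "T", "T"],
    "CE 16:1": [10.0, 12.0, 14.0, 20.0, 24.0, 28.0],
})

# One row per group, one column per statistic
summary = toy.groupby("Label")["CE 16:1"].agg(["mean", "std", "sem"])
print(summary)
```

Either the `std` or the `sem` column of such a summary can be passed as `yerr` when plotting.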

Next, we create the fig and ax objects with plt.subplots(), which allows us to customize the titles (among many other parameters such as colors and line thickness, for which we refer to the matplotlib documentation). Finally, we call the plot.bar() method of the means Series and pass in the errors (yerr), the customized axes with its titles (ax), the size of the error bar caps (capsize=4), the rotation of the tick labels (rot=0) and a list of color names.

```python
fig, ax = plt.subplots()
ax.set_ylabel("Concentration (nmol / mL)")
ax.set_title(lipid)
means.plot.bar(yerr=errors, ax=ax, capsize=4, rot=0, 
                color=["royalblue","crimson", "orange"]);
plt.show()
```

We should get something that looks like this:

<figure><img src="/files/AIhFfzcVp8auuT8wdVhy" alt=""><figcaption></figcaption></figure>

The complete code to make a barplot from a loaded pandas DataFrame:

```python
import matplotlib.pyplot as plt

lipid = "CE 16:1"
# Mean and standard deviation of the lipid per sample group
df_group = df.groupby("Label")[lipid]
means = df_group.mean()
errors = df_group.std()

fig, ax = plt.subplots()
ax.set_ylabel("Concentration (nmol / mL)")
ax.set_title(lipid)
means.plot.bar(yerr=errors, ax=ax, capsize=4, rot=0,
               color=["royalblue", "crimson", "orange"]);
plt.show()
```
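
The same kind of figure can also be drawn with matplotlib alone. The sketch below is one possible way to do this with `ax.bar`; the group means and standard deviations are invented placeholders, standing in for the `means` and `errors` Series computed above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented group means and standard deviations, for illustration only
means = pd.Series({"N": 12.0, "PAN": 18.0, "T": 24.0})
errors = pd.Series({"N": 2.0, "PAN": 3.0, "T": 4.0})

fig, ax = plt.subplots()
# x positions come from the group names, bar heights from the means
bars = ax.bar(means.index, means.values, yerr=errors.values, capsize=4,
              color=["royalblue", "crimson", "orange"])
ax.set_ylabel("Concentration (nmol / mL)")
ax.set_title("CE 16:1")
plt.show()
```

Working on the axes directly can be convenient when several plot elements need to be combined in one figure.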

## Boxplots

For boxplots, the built-in plotting capabilities of the pandas package are quite powerful. We can simply call the plot.box() function on our DataFrame. In this function we pass a list of the lipid IDs that we want to have plotted, and the y-axis title:

```python
df.plot.box(
    column=['CE 16:1','TG 50:2', 'SM 42:2;O2'], 
    ylabel="Concentration (nmol / mL)");
```

The results should look like this:

<figure><img src="/files/bfxwy7NkIcNwEVsYIw61" alt=""><figcaption></figcaption></figure>

By passing "Label" to the "by" argument, we can plot the species separately for the different Labels of the samples:

```python
plot = df.plot.box(
    column=['CE 16:1','TG 50:2'], 
    ylabel="Concentration (nmol / mL)", by="Label");
```

<figure><img src="/files/Sw9sk256Ekq4JoyURhkl" alt=""><figcaption></figcaption></figure>

And to get more control over the appearance of the box plot, we can switch to seaborn and define our preferred colors:

```python
import seaborn as sns
my_pal = {"N": "royalblue", "PAN": "orange", "T":"red"}
sns.boxplot(data=df, x="Label", y='CE 16:1', palette=my_pal);
plt.show()
```
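
If the groups should appear in a particular sequence, seaborn's `order` argument can be used. The sketch below assumes the three labels N, PAN and T from the palette above and uses invented concentrations instead of the demo dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented concentrations for three groups (not taken from the demo dataset)
toy = pd.DataFrame({
    "Label": ["T", "T", "T", "PAN", "PAN", "PAN", "N", "N", "N"],
    "CE 16:1": [240.0, 260.0, 250.0, 180.0, 170.0, 190.0, 110.0, 120.0, 100.0],
})

my_pal = {"N": "royalblue", "PAN": "orange", "T": "red"}
fig, ax = plt.subplots()
# order fixes the left-to-right sequence of the groups on the x-axis
sns.boxplot(data=toy, x="Label", y="CE 16:1", palette=my_pal,
            order=["N", "PAN", "T"], ax=ax)
ax.set_ylabel("Concentration (nmol / mL)")
plt.show()
```

Passing `ax=ax` lets the axis titles be customized in the same way as for the pandas plots.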

<figure><img src="/files/19GcdqHaImvABwPVuJ9m" alt=""><figcaption></figcaption></figure>

## Histograms

Creating a histogram with pandas is straightforward as well. We just need to create the fig and ax objects with matplotlib again to customize the axis titles:

```python
fig, ax = plt.subplots()
ax.set_xlabel("Concentration (nmol / mL)")
ax.set_title("CE 16:1")
df["CE 16:1"].plot.hist(ax=ax);
plt.show()
```
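
The number of bars in the histogram can be adjusted with the `bins` argument, which pandas uses with a default of 10. The sketch below uses randomly generated values as a stand-in for the demo dataset; the mean, spread and sample size are arbitrary assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented concentrations drawn from a normal distribution (not real data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({"CE 16:1": rng.normal(loc=200, scale=40, size=100)})

fig, ax = plt.subplots()
ax.set_xlabel("Concentration (nmol / mL)")
ax.set_title("CE 16:1")
# bins sets the number of bars; edgecolor outlines each bar
toy["CE 16:1"].plot.hist(ax=ax, bins=20, edgecolor="white")
plt.show()
```

More bins show finer detail but become noisy when the number of samples is small.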

<figure><img src="/files/0QQZtjUp2NE3CY4KZBL9" alt=""><figcaption></figcaption></figure>

And to show multiple groups with custom colors we can use seaborn:

```python
import seaborn as sns
pal = {"N": "royalblue", "PAN": "orange", "T":"red"}
sns.displot(df, x="CE 16:1", hue="Label", binwidth=50, palette=pal);
plt.show()
```

<figure><img src="/files/9w1MS7QaWoN1PDDeoKTc" alt=""><figcaption></figcaption></figure>

## Density plots

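Creating density plots is very similar to creating histograms. Note that the plot.kde() method of pandas relies on the SciPy package, which can be installed with pip install scipy.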

```python
fig, ax = plt.subplots()
ax.set_xlabel("Concentration (nmol / mL)")
ax.set_title("CE 16:1")
df["CE 16:1"].plot.kde(ax=ax);
plt.show()
```

<figure><img src="/files/J2hKY5gQOemAVvqzJKjC" alt=""><figcaption></figcaption></figure>

Or with seaborn:

```python
import seaborn as sns
pal = {"N": "royalblue", "PAN": "orange", "T":"red"}
sns.displot(df, x="CE 16:1", hue="Label", palette=pal, kind="kde");
plt.show()
```

<figure><img src="/files/Q78oGc75afAqjDse61iG" alt=""><figcaption></figcaption></figure>

## Saving plots to image file

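Plots made with pandas, seaborn or matplotlib can all be saved by running the following command after the creation of the plot. Run it in the same cell as the plotting code and before plt.show(), otherwise the saved image may be blank: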

```python
plt.savefig('figure.png', dpi=300, bbox_inches='tight')
```

This requires that pyplot from matplotlib is imported:

```python
import matplotlib.pyplot as plt
```

Instead of .png, vector formats such as .svg or .pdf can also be used, so that the figures can be edited in programs like Inkscape or Illustrator.

```python
plt.savefig('figure.svg', dpi=300, bbox_inches='tight')
plt.savefig('figure.pdf', dpi=300, bbox_inches='tight')
```

