Basic plotting

Required packages

This section requires the pandas, matplotlib and seaborn packages. Reading .xlsx files with pandas additionally requires the openpyxl package. All of these can be installed with the following command in the command window (Windows) or terminal (Mac):

pip install pandas matplotlib seaborn openpyxl

Loading the data

The most straightforward way to load and manipulate tabular data in Python is through the pandas library. Pandas can load data from a range of formats; the most commonly used are Excel and .csv files. In this tutorial we will use the lipidomics demo dataset (in Excel format), which you can download with the link below.

Lipidomics_dataset.xlsx (348KB)

Place the downloaded Lipidomics_dataset.xlsx file in the same folder as your JupyterLab script. Then run the following code in Jupyter:

import pandas as pd
df = pd.read_excel("Lipidomics_dataset.xlsx", decimal=",")
df.head()

The first line imports the pandas library and makes it available through the alias "pd". On the second line we call the read_excel function of pandas and pass it, between quotes, the name of the file we want to load, or the full path to the file if it is not stored in the same folder as the Jupyter notebook. The decimal="," argument tells pandas to treat a comma as the decimal separator when it has to parse numbers that are stored as text. The loaded data is stored as a DataFrame in the df variable.
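
Since .csv files are the other commonly used format, here is a minimal sketch of loading one with read_csv. The file content and the semicolon separator are made up for illustration, assuming a European-style export with decimal commas; they are not part of the demo dataset.

```python
import io

import pandas as pd

# Made-up CSV content: semicolon-separated, with decimal commas.
csv_text = (
    "Sample Name;Label;CE 16:1\n"
    "s1;N;12,5\n"
    "s2;T;20,25\n"
)

# io.StringIO stands in for a file on disk; a file path can be passed instead.
demo = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# The lipid column is parsed as floating-point numbers rather than as text.
print(demo.dtypes)
```

Without decimal=",", values such as "12,5" would be read as text, and the plotting steps later in this section would not work on them.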

On the third line we call the head() method on the DataFrame object that holds our data, which displays the first 5 rows of the table. It should look like this:

The rows of the table correspond to the samples and the columns to lipid species, except for the first two columns, which contain the unique sample IDs and the sample labels (the group each sample belongs to). A handy improvement that lets us easily access the data of individual samples by their unique ID is to specify that the column "Sample Name" should be used as the index column:

df.set_index("Sample Name", inplace=True)

Now if we want to access the data of, for example, sample 1a2, this can be done with:

df.loc["1a2"]
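
As a small illustration of what label-based access looks like, the sketch below builds a toy table with made-up values (not taken from the demo dataset) and uses loc in three ways.

```python
import pandas as pd

# Made-up values for illustration; the real table has many more columns.
toy = pd.DataFrame(
    {
        "Sample Name": ["1a1", "1a2", "1a3"],
        "Label": ["N", "N", "T"],
        "CE 16:1": [10.0, 12.0, 30.0],
    }
).set_index("Sample Name")

row = toy.loc["1a2"]                # one sample, returned as a Series
value = toy.loc["1a2", "CE 16:1"]   # one value: row label first, then column label
subset = toy.loc[["1a1", "1a3"]]    # several samples, returned as a DataFrame

print(value)
```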

The complete code for loading the data:

import pandas as pd

df = pd.read_excel("Lipidomics_dataset.xlsx", decimal=",")
df.set_index("Sample Name", inplace=True)

Barplots

The pandas package has a number of built-in methods for plotting basic graphs. Under the hood, pandas relies on the matplotlib package for plotting, and to have more control over how the data is plotted, we will also work with matplotlib directly. Now that the data is loaded and ready, let's import matplotlib:

import matplotlib.pyplot as plt

Next, we define which lipid we want to plot, group our DataFrame by the Label column (which defines the group each sample belongs to), and calculate the mean and standard deviation per group. If the standard error is desired, replace std() with sem().

lipid = "CE 16:1"
# Select the lipid column before aggregating, so only this column is summarized
df_group = df.groupby("Label")[lipid]
means = df_group.mean()
errors = df_group.std()
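
To compare the mean, standard deviation and standard error side by side, all three can be requested in one call to agg. This is a minimal sketch on made-up numbers, not on the demo dataset.

```python
import pandas as pd

# Made-up concentrations for two groups of two samples each.
toy = pd.DataFrame(
    {
        "Label": ["N", "N", "T", "T"],
        "CE 16:1": [10.0, 14.0, 30.0, 34.0],
    }
)

# One row per group, one column per statistic.
summary = toy.groupby("Label")["CE 16:1"].agg(["mean", "std", "sem"])
print(summary)
```

The standard error equals the standard deviation divided by the square root of the number of samples in the group, so it is always the smaller of the two.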

Next, we create the fig and ax objects with matplotlib, which allow us to customize the titles (among many other parameters such as colors and line thickness, for which we refer to the matplotlib documentation). Finally, we call the plot.bar() method of the means Series and pass in the errors (yerr), the customized axis with its titles (ax), the size of the error bar caps (capsize=4), the rotation of the labels (rot=0) and a list of color names.

fig, ax = plt.subplots()
ax.set_ylabel("Concentration (nmol / mL)")
ax.set_title(lipid)
means.plot.bar(yerr=errors, ax=ax, capsize=4, rot=0,
               color=["royalblue", "crimson", "orange"]);
plt.show()

We should get something that looks like this:

The complete code to make a barplot from a loaded pandas DataFrame:

import matplotlib.pyplot as plt

lipid = "CE 16:1"
# Select the lipid column before aggregating, so only this column is summarized
df_group = df.groupby("Label")[lipid]
means = df_group.mean()
errors = df_group.std()

fig, ax = plt.subplots()
ax.set_ylabel("Concentration (nmol / mL)")
ax.set_title(lipid)
means.plot.bar(yerr=errors, ax=ax, capsize=4, rot=0);
plt.show()
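
When a list of colors is passed to plot.bar(), the colors are assigned to the bars by position, so the result depends on the order of the groups. The sketch below shows one way to tie colors to group names instead. The toy values and the palette dictionary are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up concentrations for three groups.
toy = pd.DataFrame(
    {
        "Label": ["N", "N", "PAN", "PAN", "T", "T"],
        "CE 16:1": [10.0, 14.0, 20.0, 26.0, 30.0, 34.0],
    }
)
lipid = "CE 16:1"
grouped = toy.groupby("Label")[lipid]
means = grouped.mean()
errors = grouped.std()

# Look up each bar's color by its group name rather than by its position.
palette = {"N": "royalblue", "PAN": "orange", "T": "crimson"}
colors = [palette[label] for label in means.index]

fig, ax = plt.subplots()
ax.set_ylabel("Concentration (nmol / mL)")
ax.set_title(lipid)
means.plot.bar(yerr=errors, ax=ax, capsize=4, rot=0, color=colors)
```

In a script, follow this with plt.show() to display the figure.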

Boxplots

For boxplots, the built-in plotting capabilities of the pandas package are quite powerful. We can simply call the plot.box() method on our DataFrame. To this method we pass a list of the lipid IDs that we want plotted, and the y-axis title:

df.plot.box(
    column=['CE 16:1','TG 50:2', 'SM 42:2;O2'], 
    ylabel="Concentration (nmol / mL)");

The results should look like this:

By passing "Label" to the "by" argument, we can plot the species separately for the different Labels of the samples:

plot = df.plot.box(
    column=['CE 16:1','TG 50:2'], 
    ylabel="Concentration (nmol / mL)", by="Label");
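
With default settings, each box spans the first to the third quartile, with a line at the median. To check the numbers behind such a plot, the per-group quartiles can be computed directly. The following is a minimal sketch on made-up values; the column names Q1, median and Q3 are chosen here for readability.

```python
import pandas as pd

# Made-up concentrations: five samples per group.
toy = pd.DataFrame(
    {
        "Label": ["N"] * 5 + ["T"] * 5,
        "CE 16:1": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# Quartiles per group; unstack() turns the quantile levels into columns.
quartiles = toy.groupby("Label")["CE 16:1"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["Q1", "median", "Q3"]
print(quartiles)
```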

And to get more control over the appearance of the box plot, we can switch to seaborn and define our preferred colors:

import seaborn as sns
my_pal = {"N": "royalblue", "PAN": "orange", "T":"red"}
sns.boxplot(data=df, x="Label", y='CE 16:1', palette=my_pal);
plt.show()

Histograms

Creating a histogram with pandas is straightforward as well. We just need to create the ax object with matplotlib again to customize the axis titles:

fig, ax = plt.subplots()
ax.set_xlabel("Concentration (nmol / mL)")
ax.set_title("CE 16:1")
df["CE 16:1"].plot.hist(ax=ax);
plt.show()
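
By default, plot.hist() divides the data into 10 bins. The number of bins can be changed with the bins argument, as in this minimal sketch on made-up values (the edgecolor argument is passed through to matplotlib and only separates the bars visually).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up concentrations forming two clusters.
values = pd.Series([1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.5, 10.0], name="CE 16:1")

fig, ax = plt.subplots()
ax.set_xlabel("Concentration (nmol / mL)")
ax.set_title(values.name)
# Three equal-width bins between the smallest and largest value.
values.plot.hist(ax=ax, bins=3, edgecolor="white")
```

Fewer bins give a smoother but coarser picture; more bins show more detail but become noisy when there are few samples.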

And to show multiple groups with custom colors we can use seaborn:

import seaborn as sns
pal = {"N": "royalblue", "PAN": "orange", "T":"red"}
sns.displot(df, x="CE 16:1", hue="Label", binwidth=50, palette=pal);
plt.show()

Density plots

Creating density plots is very similar to creating histograms:

fig, ax = plt.subplots()
ax.set_xlabel("Concentration (nmol / mL)")
ax.set_title("CE 16:1")
df["CE 16:1"].plot.kde(ax=ax);
plt.show()

Or with seaborn:

import seaborn as sns
pal = {"N": "royalblue", "PAN": "orange", "T":"red"}
sns.displot(df, x="CE 16:1", hue="Label", palette=pal, kind="kde");
plt.show()

Saving plots to image file

Plots made with pandas, seaborn or matplotlib can all be saved by running the following command after the plot has been created, in the same cell and before plt.show() is called:

plt.savefig('figure.png', dpi=300, bbox_inches='tight')

This requires that pyplot from matplotlib is imported:

import matplotlib.pyplot as plt

Instead of .png, vector formats such as .svg or .pdf can also be used, such that the figures can be edited in programs like Inkscape or Illustrator.

plt.savefig('figure.svg', dpi=300, bbox_inches='tight')
plt.savefig('figure.pdf', dpi=300, bbox_inches='tight')
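
When several figures are open, it can be clearer to call savefig on the figure object itself. The sketch below saves one example figure in three formats. The temporary output folder and the plotted values are placeholders for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Example figure")

# Placeholder output folder; replace with a folder of your choice.
out_dir = Path(tempfile.mkdtemp())

saved = []
for ext in ("png", "svg", "pdf"):
    path = out_dir / f"figure.{ext}"
    # The file format is inferred from the file extension.
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved.append(path)

plt.close(fig)  # release the figure once it has been written to disk
print(saved)
```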

