# Correlation analysis

## Required packages

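This section requires pandas and seaborn; installing seaborn also pulls in matplotlib and numpy, which are used below. Reading the .xlsx file with pandas additionally needs an Excel engine such as openpyxl, so add openpyxl to the command if it is not installed yet. The packages can be installed with the following command in the command prompt (Windows) or terminal (Mac):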

```
pip install pandas seaborn
```

## Loading the data

As in the other sections, we will use the lipidomics demo dataset:

{% file src="/files/O8GcvqGYOzpXcfU7NayN" %}

```python
import pandas as pd
import numpy as np
df = pd.read_excel("Lipidomics_dataset.xlsx", decimal=",")
df.set_index("Sample Name", inplace=True)
```

## Correlation plots

Creating a basic correlation plot is again very simple with Pandas and Seaborn:

<pre class="language-python"><code class="lang-python">import seaborn as sns
import matplotlib.pyplot as plt
<strong>sns.heatmap(df.corr(numeric_only=True));
</strong>plt.show()
</code></pre>

We call the corr function on the DataFrame (with numeric\_only=True to ignore the non-numeric Label column in our DataFrame) and pass the resulting correlation matrix as an argument to the heatmap function from seaborn. The result will look something like this:

<figure><img src="/files/BFdfKAKuTrKPiNomVwT5" alt=""><figcaption></figcaption></figure>
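To get a feeling for what the corr function returns before it is plotted, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical and are not taken from the demo dataset; they are only chosen so that the expected correlations are obvious:

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not the lipidomics dataset)
toy = pd.DataFrame({
    "Lipid A": [1.0, 2.0, 3.0, 4.0],
    "Lipid B": [2.0, 4.0, 6.0, 8.0],   # rises together with Lipid A
    "Lipid C": [4.0, 3.0, 2.0, 1.0],   # falls while Lipid A rises
    "Label": ["N", "N", "T", "T"],     # non-numeric column
})

# Pearson correlation is the default; numeric_only=True skips the Label column
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

The result is a square DataFrame with one row and one column per numeric variable, which is the matrix that the heatmap function colors.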

There are several problems with this plot that we'll address one by one. For starters, there are many more lipids in this plot than there is space for axis labels. We can fix this by setting the figure size and the tick label size on the ax object that we create with the subplots function from matplotlib (you may need to further adjust the figsize and labelsize parameters):

```python
fig, ax = plt.subplots(figsize=(20,20))
ax.tick_params(axis='both', which='major', labelsize=6)
sns.heatmap(df.corr(numeric_only=True), ax=ax);
plt.show()
```

<figure><img src="/files/uWA6aF6Xncl7gSK9auxe" alt=""><figcaption></figcaption></figure>

Next, correlation values lie in the interval \[-1, 1], so it makes more sense to use a diverging color scale that is white at zero, moves to a different color for positive and for negative values, and is fixed at -1 for the minimum and +1 for the maximum:

```python
# Create the figure and axes first, then draw the heatmap on them
fig, ax = plt.subplots(figsize=(20,20))
ax.tick_params(axis='both', which='major', labelsize=6)
sns.heatmap(df.corr(numeric_only=True), ax=ax, cmap='vlag', vmin=-1, vmax=1);
plt.show()
```

<figure><img src="/files/T6nUdBj3YrEo8sUOU5uL" alt=""><figcaption></figcaption></figure>

Next, we'll remove the redundant half of the symmetric matrix. We'll create a mask by first creating a matrix of ones with the same dimensions as the correlation matrix (using the numpy ones\_like function, with the data type set to bool) and then setting the values below the diagonal to zero (False) with the numpy triu function. Cells where the mask is True, here the upper triangle including the diagonal, will be hidden in the plot:

```python
mask = np.triu(np.ones_like(df.corr(numeric_only=True), dtype=bool))
```
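To make the shape of the mask concrete, the following sketch applies the same two numpy calls to a small 3×3 matrix. The matrix values are made up for illustration:

```python
import numpy as np

# Hypothetical 3x3 correlation matrix, for illustration only
small_corr = np.array([
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.5],
    [-0.2, 0.5, 1.0],
])

# All-True matrix of the same shape, then keep only the upper triangle
small_mask = np.triu(np.ones_like(small_corr, dtype=bool))
print(small_mask)
# [[ True  True  True]
#  [False  True  True]
#  [False False  True]]
```

If you prefer to keep the diagonal visible, passing k=1 to triu should leave only the cells strictly above the diagonal set to True.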

We'll then pass this mask to the mask parameter of the heatmap function:

```python
# Create the figure and axes first, then draw the masked heatmap on them
fig, ax = plt.subplots(figsize=(20,20))
ax.tick_params(axis='both', which='major', labelsize=6)
sns.heatmap(df.corr(numeric_only=True), ax=ax, cmap='vlag', vmin=-1, vmax=1, mask=mask);
plt.show()
```

<figure><img src="/files/UFUEsUFvjaW4Hnwod8Bb" alt=""><figcaption></figcaption></figure>

Finally, in correlation maps with fewer variables, it can be useful to add "annot=True" to the heatmap parameters, as this will show the actual correlation values on the map.
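As a rough sketch of how an annotated map could look, the example below uses a small random DataFrame in place of the lipidomics data. The column names, the random values, and the fmt=".2f" number format are assumptions made for this illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in data: 20 samples of 4 made-up variables
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.normal(size=(20, 4)),
    columns=["Lipid A", "Lipid B", "Lipid C", "Lipid D"],
)

# Compute the correlation matrix once and reuse it for the mask and the plot
toy_corr = toy.corr()
toy_mask = np.triu(np.ones_like(toy_corr, dtype=bool))

fig, ax = plt.subplots(figsize=(5, 4))
# annot=True writes each visible correlation value into its cell
sns.heatmap(toy_corr, ax=ax, cmap='vlag', vmin=-1, vmax=1,
            mask=toy_mask, annot=True, fmt=".2f")
plt.show()
```

With only a handful of variables the numbers stay readable; on the full lipid matrix they would most likely overlap.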

