Correlation analysis
The required packages for this section are pandas and seaborn. These can be installed with the following command in the command window (Windows) / terminal (Mac).
Like in the other sections we will use the lipidomics demo dataset:
Creating a basic correlation plot is again very simple with Pandas and Seaborn:
We can simply call the corr function on the DataFrame (with numeric_only=True to ignore the non-numeric Label column in our DataFrame) and pass the resulting correlation matrix as an argument to the heatmap function from seaborn. The results will look something like this:
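As a minimal sketch of this step, the example below uses a small random DataFrame as a stand-in for the lipidomics data. The column names (lipid_a, lipid_b, lipid_c) and the Label values are hypothetical and only serve to make the example self-contained.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the lipidomics data:
# three numeric columns plus a text Label column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)),
                  columns=["lipid_a", "lipid_b", "lipid_c"])
df["Label"] = ["control", "treated"] * 10

# numeric_only=True skips the non-numeric Label column.
corr = df.corr(numeric_only=True)

# heatmap draws the matrix and returns the matplotlib Axes it used.
ax = sns.heatmap(corr)
```

In a script, a call to matplotlib's plt.show() would display the figure.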
There are several problems with this plot that we'll address one by one. For starters, there are many more lipids in this plot than there is room for axis labels. We can fix this by setting the canvas size when we create the figure and ax objects with matplotlib, and by setting the axis label size on the ax object (you may need to further adjust the figsize and labelsize parameters):
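One way to set the canvas and label sizes is sketched below. The data is again a random stand-in with hypothetical column names, and the figsize and labelsize values are arbitrary starting points rather than recommended settings.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in: 40 numeric "lipid" columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 40)),
                  columns=[f"lipid_{i}" for i in range(40)])
corr = df.corr(numeric_only=True)

# A larger canvas leaves more room for the axis labels.
fig, ax = plt.subplots(figsize=(12, 10))

# Draw the heatmap on the Axes we just created.
sns.heatmap(corr, ax=ax)

# Shrink the tick labels on both axes.
ax.tick_params(labelsize=6)
```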
Next, correlation values lie in the interval [-1, 1], so it makes more sense to use a color scale that is white at zero, diverges to a different color for positive and negative values, and is fixed at -1 for the minimum value and +1 for the maximum:
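A sketch of such a color scale is shown below. The "bwr" (blue-white-red) colormap is only one possible choice of diverging colormap, and the data is a random stand-in with hypothetical column names.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 4)),
                  columns=["lipid_a", "lipid_b", "lipid_c", "lipid_d"])
corr = df.corr(numeric_only=True)

# vmin/vmax pin the color scale to the full correlation range,
# so the midpoint of the diverging colormap (white) sits at zero.
ax = sns.heatmap(corr, cmap="bwr", vmin=-1, vmax=1)
```

Because the limits are symmetric around zero, the colors are comparable between plots even when the observed correlations do not span the whole range.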
We'll also remove the redundant symmetry: the correlation matrix is symmetric, so the upper triangle repeats the lower one. We'll create a mask by first creating a matrix of ones with the same dimensions as the correlation matrix (using the numpy ones_like function, with the data type set to bool). The numpy triu function then keeps the values on and above the diagonal and sets the values below the diagonal to zero (False):
We'll then pass this mask to the mask parameter of the heatmap function, which hides every cell where the mask is True:
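The masking step can be sketched as follows, again on a random stand-in DataFrame with hypothetical column names. Note that with triu the diagonal itself is part of the mask, so the trivial self-correlations are hidden as well.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 4)),
                  columns=["lipid_a", "lipid_b", "lipid_c", "lipid_d"])
corr = df.corr(numeric_only=True)

# Boolean matrix of ones with the shape of the correlation matrix;
# triu keeps True on and above the diagonal, False below it.
mask = np.triu(np.ones_like(corr, dtype=bool))

# Cells where mask is True are not drawn, leaving the lower triangle.
ax = sns.heatmap(corr, mask=mask, cmap="bwr", vmin=-1, vmax=1)
```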
Finally, in correlation maps with fewer variables, it may be useful to add "annot=True" to the heatmap parameters, as this will show the actual correlation values on the map.
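For a small matrix, the annotated version might look like the sketch below. The three column names are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with only three variables.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)),
                  columns=["lipid_a", "lipid_b", "lipid_c"])
corr = df.corr(numeric_only=True)

# annot=True writes each correlation value into its cell.
ax = sns.heatmap(corr, annot=True, cmap="bwr", vmin=-1, vmax=1)
```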