This section requires pandas, matplotlib and seaborn. Reading .xlsx files with pandas also requires the openpyxl package. All four can be installed with the following command in the Command Prompt (Windows) or Terminal (Mac):
pip install pandas matplotlib seaborn openpyxl
Loading the data
Place the downloaded Lipidomics_missing_values_EXAMPLE.xlsx file in the same folder as your JupyterLab notebook. Then run the following code in a notebook cell:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
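# Read the spreadsheet; decimal="," parses text cells that use a comma as the decimal separator
df = pd.read_excel("Lipidomics_missing_values_EXAMPLE.xlsx", decimal=",")
# Use the sample names as the row index
df.set_index("Sample Name", inplace=True)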
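Before plotting, it can help to confirm that the table has the expected dimensions and to count the missing values. The sketch below uses a small made-up table in place of the real file so that it runs on its own; the sample and lipid names are hypothetical. With the real data, the same calls would be applied to df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded table: 3 samples x 4 lipid species
toy = pd.DataFrame(
    {
        "Sample Name": ["S1", "S2", "S3"],
        "PC 34:1": [1.2, np.nan, 0.9],
        "PE 36:2": [0.4, 0.5, np.nan],
        "TG 52:3": [np.nan, np.nan, 2.1],
        "SM 34:1": [3.3, 3.1, 2.8],
    }
).set_index("Sample Name")

# Rows are samples, columns are species
n_samples, n_species = toy.shape

# isnull() marks each missing cell as True; summing twice counts them all
total_missing = int(toy.isnull().sum().sum())

print(f"{n_samples} samples x {n_species} species, {total_missing} missing values")
```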
We can visualise the missing values across the table as a heatmap (with the default colour map, light cells indicate missing values):
plt.figure(figsize=(24, 30)) # Modify the width and height as needed
sns.heatmap(df.isnull(), cbar=False)
plt.savefig("missing_values_heatmap.png", dpi=200, bbox_inches='tight')
plt.show()
We can plot the percentage of missing values in each sample:
# Calculate percentage of missing values per row (sample)
missing_percentage_per_sample = df.isnull().mean(axis=1) * 100
# Bar plot for missing values per sample
plt.figure(figsize=(30, 6))
missing_percentage_per_sample.sort_values(ascending=False).plot(kind='bar')
plt.title("Percentage of Missing Values per sample")
plt.xlabel("Samples")
plt.ylabel("Percentage Missing")
plt.tight_layout()
plt.show()
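To see what the per-sample calculation does, the sketch below repeats it on a small made-up table (the sample and lipid names are hypothetical) and checks it against a count done by hand. The mean of a row of True/False values is the fraction of missing cells in that row.

```python
import numpy as np
import pandas as pd

# Hypothetical table: 3 samples (rows) x 4 lipid species (columns)
toy = pd.DataFrame(
    {
        "PC 34:1": [1.2, np.nan, 0.9],
        "PE 36:2": [0.4, 0.5, np.nan],
        "TG 52:3": [np.nan, np.nan, 2.1],
        "SM 34:1": [3.3, 3.1, 2.8],
    },
    index=pd.Index(["S1", "S2", "S3"], name="Sample Name"),
)

# Same expression as in the tutorial: average the missing-value mask across each row
per_sample = toy.isnull().mean(axis=1) * 100

# Equivalent by hand: missing cells in the row / number of columns * 100
by_hand = toy.isnull().sum(axis=1) / toy.shape[1] * 100

print(per_sample)  # S1: 25.0, S2: 50.0, S3: 25.0
```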
And for the species:
# Calculate percentage of missing values per column (species)
missing_percentage_per_species = df.isnull().mean(axis=0) * 100
# Bar plot for missing values per species
plt.figure(figsize=(30, 6))
missing_percentage_per_species.sort_values(ascending=False).plot(kind='bar')
plt.title("Percentage of Missing Values per species")
plt.xlabel("Species")
plt.ylabel("Percentage Missing")
plt.tight_layout()
plt.show()
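Per-species percentages come from averaging the missing-value mask down each column (axis=0) instead of across each row (axis=1). A minimal sketch on a made-up table, with hypothetical sample and lipid names, shows the calculation and the ranking that the bar plot would display:

```python
import numpy as np
import pandas as pd

# Hypothetical table: 3 samples (rows) x 4 lipid species (columns)
toy = pd.DataFrame(
    {
        "PC 34:1": [1.2, np.nan, 0.9],
        "PE 36:2": [0.4, 0.5, np.nan],
        "TG 52:3": [np.nan, np.nan, 2.1],
        "SM 34:1": [3.3, 3.1, 2.8],
    },
    index=pd.Index(["S1", "S2", "S3"], name="Sample Name"),
)

# axis=0 averages down each column, giving one value per species
per_species = toy.isnull().mean(axis=0) * 100

# Sort so the species with the most missing values comes first
ranked = per_species.sort_values(ascending=False)

print(ranked.round(1))
```

Here TG 52:3 is missing in two of the three samples, so it ranks first, while SM 34:1 has no missing values.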