Omics data visualization in R and Python

Loading data into Python


Last updated 4 months ago

1. Introduction

Pandas is a powerful library for handling tabular data in Python. It provides simple functions for loading data from CSV and Excel files into DataFrames, which can then be manipulated and analyzed efficiently. The examples in this section use the demo files attached to this page.

Required packages

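The only package required to read CSV files is pandas. Reading .xlsx files additionally requires the openpyxl package, which pandas uses as its Excel engine. Both can be installed with the following command in the Command Prompt (Windows) or the terminal (macOS/Linux).

pip install pandas openpyxl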
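A quick way to confirm that the installation worked is to import the package and print its version. This is only a minimal check; the version printed depends on your environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed in the active environment.
installed_version = pd.__version__
print(f"pandas version: {installed_version}")
```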

2. Loading Data from a CSV File

CSV (Comma-Separated Values) files store tabular data in plain text format, making them widely used for data exchange.
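To make the format concrete, the sketch below parses a small CSV string held in memory instead of a file. The column names and values are invented for illustration and are not taken from the demo file. The `sep` argument is shown because some tools export semicolon-separated files rather than comma-separated ones.

```python
import io

import pandas as pd

# Hypothetical CSV content: one header row, then one row per sample.
csv_text = """Sample,Group,Lipid_A,Lipid_B
s1,control,1.5,20.1
s2,control,1.7,19.4
s3,patient,2.9,25.0
"""

# read_csv accepts file paths and file-like objects alike.
df_comma = pd.read_csv(io.StringIO(csv_text))

# For semicolon-separated exports, pass sep=";".
semicolon_text = csv_text.replace(",", ";")
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(df_comma.shape)   # (rows, columns)
print(df_comma.dtypes)  # numeric columns are inferred automatically
```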

Using an Absolute Path

An absolute path specifies the full directory structure where the file is stored. This method ensures correct file access regardless of the working directory.

import pandas as pd

# Example: Using an absolute path (modify based on your system)
file_path = "/absolute/path/to/your/demo_data_IS.csv"
df = pd.read_csv(file_path)

# Display the first 5 rows
df.head()
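Absolute paths are written differently on each operating system, so one option is to build them with `pathlib` from the standard library. The sketch below is self-contained: it writes a tiny placeholder CSV into a temporary directory (the file name `example.csv` and its contents are made up) and then loads it through its absolute path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a throwaway directory and a placeholder CSV so the example runs anywhere.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "example.csv"
csv_path.write_text("Sample,Value\ns1,1.0\ns2,2.0\n")

# resolve() returns the absolute form of the path.
absolute_path = csv_path.resolve()
print(absolute_path.is_absolute())

# Checking for the file first gives a clearer error than a failed read.
if not absolute_path.exists():
    raise FileNotFoundError(f"No file at {absolute_path}")

df_abs = pd.read_csv(absolute_path)
print(df_abs)
```

On Windows, writing the path as a raw string (prefix `r`) or with forward slashes avoids problems with backslash escape sequences.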

Using a Relative Path

A relative path specifies the file location relative to the current working directory, which is the folder Python is run from and not necessarily the folder containing the script. Relative paths keep code portable across systems.

# Example: Using a relative path where the .csv file
# is in the current working directory
file_path = "demo_data_IS.csv"
df = pd.read_csv(file_path)

# Display the first 5 rows
df.head()
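A "file not found" error with a relative path often means that Python was started from a different folder than expected. The following sketch prints the working directory and the absolute location that a relative path resolves to. The file name is the demo file used above, and the file does not need to exist for the check itself to run.

```python
from pathlib import Path

# Folder that relative paths are resolved against.
working_dir = Path.cwd()
print(f"Working directory: {working_dir}")

# Where Python will look for the relative path "demo_data_IS.csv".
relative_path = Path("demo_data_IS.csv")
resolved_path = relative_path.resolve()
print(f"Resolves to: {resolved_path}")
print(f"File exists: {resolved_path.exists()}")
```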

3. Loading Data from an Excel File

Excel files (.xlsx) are common in data analysis, and pandas reads them with pd.read_excel, which accepts a file path in the same way as pd.read_csv.

Using an Absolute Path

# Example: Using an absolute path for an Excel file
file_path = "/absolute/path/to/your/Matrix_PDAC_16082023.xlsx"
df = pd.read_excel(file_path)  # Default is the first sheet

# Display the first 5 rows
df.head()

Using a Relative Path

# Example: Using a relative path where the .xlsx file
# is in the same folder as the Python script
file_path = "Matrix_PDAC_16082023.xlsx"
df = pd.read_excel(file_path)

# Display the first 5 rows
df.head()
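Workbooks often contain more than one sheet. `pd.read_excel` takes a `sheet_name` argument that accepts either a sheet name or a zero-based position. The sketch below first writes a small two-sheet workbook to a temporary folder so that it can run on its own. The sheet names, columns, and values are invented, and the code assumes the `openpyxl` package is installed for `.xlsx` support.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a placeholder workbook with two sheets (hypothetical names and data).
xlsx_path = Path(tempfile.mkdtemp()) / "example.xlsx"
intensities = pd.DataFrame({"Sample": ["s1", "s2"], "Lipid_A": [1.5, 1.7]})
metadata = pd.DataFrame({"Sample": ["s1", "s2"], "Group": ["control", "patient"]})
with pd.ExcelWriter(xlsx_path) as writer:
    intensities.to_excel(writer, sheet_name="intensities", index=False)
    metadata.to_excel(writer, sheet_name="metadata", index=False)

# List the sheet names without loading any data.
with pd.ExcelFile(xlsx_path) as workbook:
    sheet_names = workbook.sheet_names
print(sheet_names)

# Select a sheet by name, or by position (0 is the first sheet).
df_meta = pd.read_excel(xlsx_path, sheet_name="metadata")
df_first = pd.read_excel(xlsx_path, sheet_name=0)
print(df_meta)
```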

4. Examining the loaded data

After loading the Excel data with the code above, df.head() displays the first rows of the data table.
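Beyond `head()`, a few attributes give a quick overview of any loaded table. The sketch uses a small hand-made DataFrame as a stand-in for the loaded data (its columns and values are invented); the same calls can be applied to `df` from the examples above.

```python
import pandas as pd

# Stand-in for a loaded table; replace with your own DataFrame.
example_df = pd.DataFrame(
    {
        "Sample": ["s1", "s2", "s3"],
        "Group": ["control", "control", "patient"],
        "Lipid_A": [1.5, 1.7, 2.9],
    }
)

print(example_df.shape)    # number of rows and columns
print(example_df.columns)  # column names
print(example_df.dtypes)   # data type of each column
print(example_df.index)    # default index: integers starting at 0
```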

Note that the rows are indexed by integers starting from 0. If the sample names are unique, they can serve as row indices instead. Passing index_col=0 tells pandas to use the first column of the sheet as the index:

# Load the provided Excel file
file_path = "Matrix_PDAC_16082023.xlsx"
df = pd.read_excel(file_path, index_col=0)

# Display the first few rows
df.head()

The sample names now form the row index.

This allows us to access, for example, the first row by its sample name:

df.loc["1a1"] # access first row by sample name
df.iloc[0] # access first row by numerical index
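Label-based access with `.loc` returns a single row only when the label is unique, so it can be worth checking the sample names before using them as the index. The sketch below does this on a small invented table; the column name `Sample`, the lipid columns, and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical table with sample names in the first column.
samples = pd.DataFrame(
    {
        "Sample": ["1a1", "1a2", "1a3"],
        "Lipid_A": [1.5, 1.7, 2.9],
        "Lipid_B": [20.1, 19.4, 25.0],
    }
)

# Confirm the names are unique before promoting them to the index.
names_are_unique = samples["Sample"].is_unique
print(names_are_unique)

indexed = samples.set_index("Sample")

row_by_label = indexed.loc["1a1"]   # select by sample name
row_by_position = indexed.iloc[0]   # select by position (first row)

# Both selections return the same row here.
print(row_by_label.equals(row_by_position))
```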
Demo files:
Lipidomics_dataset.xlsx (348 KB)
demo_data_IS.csv (1 MB)