Loading data into Python

1. Introduction

Pandas is a powerful library for handling tabular data in Python. It provides simple methods for loading data from CSV and Excel files into DataFrames, allowing for efficient data manipulation and analysis. The examples below use the demo files demo_data_IS.csv and Matrix_PDAC_16082023.xlsx.

Required packages

The main package required for this section is pandas. Reading .xlsx files with pd.read_excel additionally relies on the openpyxl package. Both can be installed with the following command in the Command Prompt (Windows) or Terminal (Mac):

pip install pandas openpyxl
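
To confirm that the installation worked, a quick check along the following lines can help. This is only a minimal sketch: the helper name check_packages is made up for illustration, and openpyxl is included because pandas uses it as an optional engine for reading .xlsx files.

```python
import importlib.util


def check_packages(names=("pandas", "openpyxl")):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages()
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

If a package is reported as missing, rerunning the pip command above in the same Python environment is usually the first thing to try.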

2. Loading Data from a CSV File

CSV (Comma-Separated Values) files store tabular data in plain text format, making them widely used for data exchange.
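
To make the plain text format concrete, the sketch below parses a small CSV string directly. The column names and values are invented for illustration and do not come from the demo file.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real demo file has different columns.
csv_text = """sample,group,value
s1,control,1.5
s2,treated,2.0
"""

# read_csv accepts file-like objects as well as file paths.
df_example = pd.read_csv(io.StringIO(csv_text))
print(df_example)
print(df_example.shape)  # (2, 3): two data rows, three columns
```

The first line of the text becomes the column header, and each following line becomes one row.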

Using an Absolute Path

An absolute path specifies the full directory structure where the file is stored. This method ensures correct file access regardless of the working directory.

import pandas as pd

# Example: Using an absolute path (modify based on your system)
file_path = "/absolute/path/to/your/demo_data_IS.csv"
df = pd.read_csv(file_path)

# Display the first 5 rows
df.head()
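
Absolute paths differ between operating systems and machines, so it can help to build them with pathlib and to check that the file exists before loading it. The following is a minimal sketch of that idea; the helper read_csv_checked is a hypothetical name, and a temporary file stands in for the demo data so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_checked(path):
    """Resolve `path` to an absolute path and load it, failing early if missing."""
    full_path = Path(path).expanduser().resolve()
    if not full_path.is_file():
        raise FileNotFoundError(f"No such file: {full_path}")
    return pd.read_csv(full_path)


# Demonstration with a throwaway file so the sketch runs on any system.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.csv"
    demo_path.write_text("sample,value\ns1,1\ns2,2\n")
    df_checked = read_csv_checked(demo_path)

print(df_checked.head())
```

The error message includes the fully resolved path, which makes a mistyped directory easier to spot.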

Using a Relative Path

A relative path specifies the file location relative to the current working directory, i.e. the directory from which the script is run. This keeps the code portable across different systems, as long as the script is started from the expected folder.

# Example: Using a relative path where the .csv file
# is in the folder the Python script is run from
file_path = "demo_data_IS.csv"
df = pd.read_csv(file_path)

# Display the first 5 rows
df.head()
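
Python resolves a relative path against the current working directory, which is not necessarily the folder that contains the script. The sketch below illustrates this with pathlib; resolve_against is a hypothetical helper introduced only for this example.

```python
from pathlib import Path


def resolve_against(base_dir, relative_name):
    """Build an absolute path for `relative_name` inside `base_dir`."""
    return (Path(base_dir) / relative_name).resolve()


# A bare relative path is interpreted against the current working directory.
from_cwd = Path("demo_data_IS.csv").resolve()
print(from_cwd == resolve_against(Path.cwd(), "demo_data_IS.csv"))  # True

# Inside a script, the script's own folder can serve as the base instead:
#     script_dir = Path(__file__).resolve().parent
#     df = pd.read_csv(resolve_against(script_dir, "demo_data_IS.csv"))
```

Anchoring the path to the script's folder, as in the commented lines, is one way to make loading independent of where the script is started from.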

3. Loading Data from an Excel File

Excel files (.xlsx) are common in data analysis, and Pandas provides easy methods to read them.

Using an Absolute Path

# Example: Using an absolute path for an Excel file
file_path = "/absolute/path/to/your/Matrix_PDAC_16082023.xlsx"
df = pd.read_excel(file_path)  # Default is the first sheet

# Display the first 5 rows
df.head()

Using a Relative Path

# Example: Using a relative path where the .xlsx file
# is in the same folder as the Python script
file_path = "Matrix_PDAC_16082023.xlsx"
df = pd.read_excel(file_path)

# Display the first 5 rows
df.head()
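
Since pd.read_csv and pd.read_excel are called in the same way, the choice between them can be made from the file extension. The following is a minimal sketch of such a wrapper; load_table is a hypothetical helper, not part of pandas, and the demonstration uses a temporary CSV file so that it runs without an Excel engine installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path, **kwargs):
    """Load a CSV or Excel file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path, **kwargs)
    if suffix == ".xlsx":
        # Extra keyword arguments such as sheet_name are passed through.
        return pd.read_excel(path, **kwargs)
    raise ValueError(f"Unsupported file type: {suffix!r}")


# Demonstration with a temporary CSV file; an .xlsx path is handled the same
# way, provided an Excel engine such as openpyxl is installed.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "demo.csv"
    csv_path.write_text("sample,value\ns1,1\ns2,2\n")
    df_loaded = load_table(csv_path)

print(df_loaded)
```

For a workbook with several sheets, a call such as load_table("Matrix_PDAC_16082023.xlsx", sheet_name=0) would select the first sheet explicitly, which is also the default.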

4. Examining the Loaded Data

After loading the Excel data using the code above, df.head() displays the first rows of the data table.

Note that the row indices are numbered starting from 0. If the sample names are unique, we can use them as row indices instead by passing index_col=0, which tells pandas to take the index from the first column:

# Load the provided Excel file
file_path = "Matrix_PDAC_16082023.xlsx"
df = pd.read_excel(file_path, index_col=0)

# Display the first few rows
df.head()

The sample names now appear as the row index. This allows us to access, for example, the first row by its sample name:

df.loc["1a1"]  # access first row by sample name
df.iloc[0]  # access first row by numerical index
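
Label-based access is only unambiguous when the index values are unique, so it is worth checking this after loading. The sketch below uses a small in-memory table with invented columns (only the sample name "1a1" is taken from the example above) and pd.read_csv, where index_col behaves the same way as in pd.read_excel.

```python
import io

import pandas as pd

# Hypothetical table; only the sample name "1a1" comes from the text above.
csv_text = """Sample,feature_a,feature_b
1a1,0.5,10
1a2,0.7,12
1b1,0.9,15
"""

df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Label-based lookups are only unambiguous when index values are unique.
print(df_indexed.index.is_unique)  # True

by_label = df_indexed.loc["1a1"]  # row selected by sample name
by_position = df_indexed.iloc[0]  # row selected by integer position
print(by_label.equals(by_position))  # True
```

If is_unique were False, df.loc with a duplicated label would return several rows instead of a single one.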
