Omics data visualization in R and Python

Fundamental data structures

Python has several built-in primitive data types, the most important for data analysis being:

  • Integer (int): Whole numbers, e.g., x = 10

  • Floating Point (float): Decimal numbers, e.g., y = 3.14

  • String (str): Sequence of characters, e.g., text = "Metabolites ID"

  • Boolean (bool): Logical values, True or False. Booleans are useful for flagging whether a sample or metabolite should be selected from a data table.

  • None (NoneType): None is used to define a null value, or no value at all.
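
As a minimal sketch of the types listed above, the snippet below assigns one value of each type and prints its type name. The variable names and values are hypothetical and chosen only for illustration. The last lines show how a list of Booleans could act as a simple selection mask.

```python
# Hypothetical values, chosen only to illustrate each primitive type
n_samples = 10                      # int
mean_intensity = 3.14               # float
metabolite_id = "Metabolites ID"    # str
is_selected = True                  # bool
annotation = None                   # NoneType

for value in (n_samples, mean_intensity, metabolite_id, is_selected, annotation):
    print(type(value).__name__, value)

# Booleans as a selection mask: keep only the samples flagged True
samples = ["S1", "S2", "S3"]
keep = [True, False, True]
selected = [s for s, k in zip(samples, keep) if k]
print(selected)  # ['S1', 'S3']
```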

Lists in Python

A list is Python's most basic container for storing a sequence of values. Unlike R, which uses one-based indexing, Python uses zero-based indexing, so the first element of a list has index 0. Lists can hold heterogeneous data, i.e., values of different types in a single list.

Creating a List

Lists are defined using square brackets [] and elements are separated by commas:

# List containing strings
lipids_list = ["Cholesterol", "PC 34:1", "PC 34:2", "TG 54:2"]

# List containing integers
int_list = [1, 100, 5, 4]

# List containing floating-point numbers (doubles)
float_list = [1.2, 3.5, 5.78, float('inf'), float('-inf'), float('nan')]

# List containing boolean values
bool_list = [True, False, True, True, False]

print(lipids_list)
print(int_list)
print(float_list)
print(bool_list)

Lists are highly flexible and allow modifications, including adding, removing, and modifying elements.
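
The modifications mentioned above can be sketched with the standard list methods append(), insert(), remove(), and pop(). The list name detected_lipids and the added lipid names are hypothetical and serve only as an illustration; a separate list is used so that lipids_list stays unchanged for the examples that follow.

```python
detected_lipids = ["Cholesterol", "PC 34:1", "PC 34:2", "TG 54:2"]

detected_lipids.append("SM 36:1")      # add one element at the end
detected_lipids.insert(1, "PE 36:2")   # insert an element at index 1
detected_lipids.remove("TG 54:2")      # remove the first matching value
last = detected_lipids.pop()           # remove and return the last element

print(detected_lipids)       # ['Cholesterol', 'PE 36:2', 'PC 34:1', 'PC 34:2']
print(last)                  # SM 36:1
print(len(detected_lipids))  # 4

# A single list may mix types (string, float, Boolean, None)
mixed = ["PC 34:1", 759.5778, True, None]
print([type(item).__name__ for item in mixed])  # ['str', 'float', 'bool', 'NoneType']
```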

Accessing Elements in a List

Elements in a list can be accessed using indexing:

first_element = lipids_list[0]  # Accessing first element
last_element = lipids_list[-1]  # Accessing last element
subset = lipids_list[1:4]  # Slicing indices 1 to 3 (the end index 4 is excluded)
print(first_element)
print(last_element)
print(subset)

The element at a given position can also be overwritten with a new value:

lipids_list[0] = "Ergosterol"  # changing the first element in lipids_list
print(lipids_list)

Tuples in Python

A tuple is similar to a list, but it is immutable (cannot be changed after creation). Tuples are defined using parentheses ().

my_tuple = ("a", "b", "c")
num_tuple = (1, 2, 3, 4, 5)

Accessing Elements in a Tuple

Similar to lists, elements in a tuple can be accessed using indexing:

first_element = my_tuple[0]
last_element = my_tuple[-1]
print(first_element)
print(last_element)
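
To illustrate immutability, the sketch below tries to overwrite a tuple element and catches the resulting TypeError. The tuple adduct, a (name, mass shift) pair, is a hypothetical example and not part of the example data sets.

```python
adduct = ("[M+H]+", 1.007276)  # hypothetical (name, mass shift) pair

try:
    adduct[0] = "[M+Na]+"      # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Unpacking assigns each element of the tuple to its own name
name, mass_shift = adduct
print(name, mass_shift)
```

Because a tuple cannot be modified in place, it is a reasonable choice for fixed records that should not change during an analysis.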

Sets in Python

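A set is an unordered collection of unique elements, defined using curly braces {}. Note that empty braces {} create a dictionary; an empty set is created with set().

my_set = {"a", "b", "c", "a"}
print(my_set)  # output, e.g.: {'a', 'b', 'c'} (duplicates are removed; the element order may vary)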
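
Sets also support the usual set operations, which can be handy for comparing which lipids were detected in different measurements. The following is a minimal sketch; the names batch_1 and batch_2 and their contents are made up for illustration. The results are sorted before printing because sets have no defined order.

```python
batch_1 = {"PC 34:1", "PC 34:2", "TG 54:2"}
batch_2 = {"PC 34:1", "TG 54:2", "Cholesterol"}

shared = batch_1 & batch_2         # intersection: present in both sets
all_lipids = batch_1 | batch_2     # union: present in at least one set
only_batch_1 = batch_1 - batch_2   # difference: present only in batch_1

print(sorted(shared))        # ['PC 34:1', 'TG 54:2']
print(sorted(all_lipids))    # ['Cholesterol', 'PC 34:1', 'PC 34:2', 'TG 54:2']
print(sorted(only_batch_1))  # ['PC 34:2']
print("PC 34:2" in batch_1)  # True (membership test)
```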

Dictionaries in Python

A dictionary is a collection of key-value pairs, similar to named lists in R. It is defined using curly braces {} with keys and values separated by colons :.

mass_dict = {
    "PC 34:0": 761.5935,
    "PC 34:1": 759.5778,
    "PC 34:2": 757.5622,
    "PC 34:3": 755.5465
}

Accessing Elements in a Dictionary

Elements in a dictionary can be accessed using keys:

mz_34_0 = mass_dict["PC 34:0"]  # Accessing the value associated with key 'PC 34:0'
mz_34_1 = mass_dict.get("PC 34:1")  # Another way to access a value safely
print(mz_34_0)
print(mz_34_1)

Note that accessing a value with square brackets raises a KeyError if the key is not present in the dictionary. The .get() method, in contrast, does not raise an error for a missing key; it returns None instead.

try:
    mz_36_6 = mass_dict["PC 36:6"]  # raises a KeyError since "PC 36:6" is not present in the dictionary
except KeyError as err:
    print("Missing key:", err)
mz_36_1 = mass_dict.get("PC 36:1")  # no error is raised; None is assigned to the variable
print(mz_36_1)  # output: None
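
Beyond single lookups, dictionaries can be extended, queried with a fallback value, and iterated over. The snippet below is a small sketch under the assumption that a NaN fallback is acceptable for missing lipids; the name mass_by_lipid is hypothetical and a separate dictionary is used so that mass_dict stays unchanged.

```python
mass_by_lipid = {"PC 34:0": 761.5935, "PC 34:1": 759.5778}

mass_by_lipid["PC 34:2"] = 757.5622                     # add a new key-value pair
fallback = mass_by_lipid.get("PC 36:6", float("nan"))   # second argument: default for a missing key

# .items() yields (key, value) pairs in insertion order
for lipid, mass in mass_by_lipid.items():
    print(f"{lipid}: {mass:.4f}")

print("PC 36:6" in mass_by_lipid)  # False (membership test on the keys)
print(fallback)                    # nan
```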

Arrays in Python

Python has no built-in array type comparable to R's vectors. The standard library offers a basic array module, but in data analysis arrays are usually created with NumPy:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])

Arrays are useful for numerical operations and are more efficient than lists for large datasets. Similar to lists, they can be indexed with square brackets to access or overwrite the value at a specified position:

first_element = arr[0] #access the first element of the array
print(first_element)

arr[0] = 10 #overwrite the first element of the array
print(arr)
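
What makes arrays convenient for numerical work is that operations apply to all elements at once. The sketch below uses made-up intensity values; the names intensities and matrix are hypothetical. It shows an element-wise transformation, a Boolean mask, and a small 2D array.

```python
import numpy as np

intensities = np.array([1200.0, 850.0, 4300.0, 0.0])  # made-up signal intensities

log_intensities = np.log2(intensities + 1)   # element-wise operation, no loop needed
above = intensities[intensities > 1000]      # Boolean mask keeps values above 1000

print(log_intensities)
print(above)                # [1200. 4300.]
print(intensities.mean())   # 1587.5

# A 2D array (matrix): rows could be lipids, columns samples
matrix = np.array([[12.5, 15.3, 18.1],
                   [7.2, 6.9, 8.4]])
print(matrix.shape)         # (2, 3)
print(matrix.mean(axis=1))  # mean of each row
```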

Other Data Types Used in Python for -Omics Analysis

  • NumPy Arrays: NumPy arrays can be multi-dimensional, for example 2D, to represent matrices.

  • Pandas DataFrames: Similar to R's data frames or tibbles, used for tabular data.

  • Pandas Series: Equivalent to a single column in a DataFrame, similar to a vector in R.

import pandas as pd

# Creating a DataFrame
data = {
    "Sample": ["S1", "S2", "S3"],
    "Lipid_Concentration": [12.5, 15.3, 18.1],
    "Group": ["Healthy", "Patient", "Healthy"]
}
df = pd.DataFrame(data)
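
The relationship between a DataFrame, a Series, and a NumPy array can be sketched as follows. The snippet rebuilds the same small table under the hypothetical name df_demo so that it runs on its own.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Sample": ["S1", "S2", "S3"],
    "Lipid_Concentration": [12.5, 15.3, 18.1],
    "Group": ["Healthy", "Patient", "Healthy"],
})

conc = df_demo["Lipid_Concentration"]   # selecting one column returns a Series
print(type(conc).__name__)              # Series
print(df_demo.shape)                    # (3, 3): rows, columns

healthy = df_demo[df_demo["Group"] == "Healthy"]   # filter rows with a Boolean mask
print(healthy["Sample"].tolist())       # ['S1', 'S3']

values = conc.to_numpy()                # the column as a NumPy array
print(values.mean())
```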

Factors in Python (Categorical Data)

In Python, categorical data is handled using Pandas' Categorical type, similar to factors in R.

df["Group"] = df["Group"].astype("category")

Categorical variables are useful for grouping and statistical analysis.
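
As a brief sketch of how a categorical column can be inspected and used for grouping, the snippet below rebuilds the example table under the hypothetical name df_demo, lists the categories and their integer codes, and counts the samples per group.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Sample": ["S1", "S2", "S3"],
    "Lipid_Concentration": [12.5, 15.3, 18.1],
    "Group": ["Healthy", "Patient", "Healthy"],
})
df_demo["Group"] = df_demo["Group"].astype("category")

print(list(df_demo["Group"].cat.categories))   # ['Healthy', 'Patient']
print(df_demo["Group"].cat.codes.tolist())     # [0, 1, 0]

# Number of samples per category
counts = df_demo.groupby("Group", observed=True).size()
print(counts)
```

The categories correspond to the levels of a factor in R, and the integer codes are the values stored internally for each row.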

Summary

In Python, lists, tuples, sets, and dictionaries are the core data structures. For handling large datasets, NumPy arrays and Pandas DataFrames are commonly used, especially in bioinformatics and -omics research. Categorical data can be represented using Pandas' Categorical type, aiding in statistical analysis and grouping.

Further reading on data structures in Python:

  • Python Lists and Tuples

  • NumPy for Scientific Computing

  • Pandas Documentation

Last updated 4 months ago