Omics data visualization in R and Python

Getting started with Python

Installation of Python, PyCharm, Jupyter Notebook, Google Colab, and Python repositories and libraries


Last updated 8 months ago

Python

Python is an open-source, interpreted programming language. It has become immensely popular for data analysis because its extensive ecosystem of libraries lets data analysts process, explore, and present data. Thanks to the open-source approach, Python has an active community that fosters continuous innovation and offers solutions and support for a wide range of data-related challenges.

Installing Python

The latest version of Python can be downloaded from the official Python website:

Installing Python is straightforward: download the installer for your operating system from the official Python download page and run it. The Python documentation describes the installation process and the first steps in more detail:
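Once the installer has finished, it can be useful to confirm which interpreter actually runs your code. The following is a minimal sketch using only the standard library; the helper name `describe_interpreter` is an illustrative choice, not part of Python itself.

```python
import platform
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this code."""
    return {
        "version": platform.python_version(),                 # version string of this interpreter
        "executable": sys.executable,                         # path of the interpreter in use
        "implementation": platform.python_implementation(),   # e.g. CPython
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the reported version or path is not the one you just installed, another Python installation is probably found first on your system path.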

Installing Python packages with pip

A lot of functionality is already built into the default installation of Python, but one of the real strengths of Python lies in the huge range of third-party packages that bring new capabilities. For example, standard Python does not have a user-friendly way of dealing with tabular data, but this is gracefully handled by the pandas package.
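To illustrate the difference, the sketch below computes the mean of one column twice: first with the standard-library `csv` module, then with pandas if it happens to be installed. The table, the column names (`PC_34_1`, `TG_52_2`), and all values are made up for illustration only.

```python
import csv
import io
from statistics import mean

# Hypothetical toy table: two samples, two lipid columns (made-up values).
RAW = """sample,PC_34_1,TG_52_2
S1,10.0,4.0
S2,14.0,6.0
"""


def column_mean_stdlib(text, column):
    """Mean of one numeric column using only the standard library."""
    rows = csv.DictReader(io.StringIO(text))
    # Every value arrives as a string and must be converted by hand.
    return mean(float(row[column]) for row in rows)


print("csv module:", column_mean_stdlib(RAW, "PC_34_1"))

try:
    import pandas as pd
except ImportError:
    print("pandas is not installed yet")
else:
    # pandas parses the numeric columns automatically and keeps the table structure.
    df = pd.read_csv(io.StringIO(RAW), index_col="sample")
    print("pandas:", df["PC_34_1"].mean())
    print(df.describe())
```

With the standard library, each column has to be converted and summarized manually; pandas keeps the whole table in one object and summarizes every column in a single call.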

A default Python installation also includes pip, the most popular Python package manager. To install pandas, for example, type the following command in the command window (on Windows) or in the terminal (on Mac) and press Enter:

pip install pandas

If the above command does not work, note that on some operating systems you have to use pip3 instead of pip:

pip3 install pandas

Throughout this guide, we will assume the standard pip command works on your system, but you can always swap pip for pip3 if it does not.
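When both pip and pip3 exist, it is not always obvious which Python installation they belong to. One common way to remove the ambiguity is to call pip as a module of a specific interpreter (`python -m pip install ...`). The sketch below only builds and prints that command for the interpreter running the script; the helper names `is_installed` and `pip_install_command` are illustrative.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command tied to the interpreter that runs this script."""
    return [sys.executable, "-m", "pip", "install", package]


if is_installed("pandas"):
    print("pandas is already available to", sys.executable)
else:
    # Printing instead of executing keeps this sketch free of side effects.
    # Passing the list to subprocess.run(..., check=True) would run it.
    print("Run:", " ".join(pip_install_command("pandas")))
```

Because the command starts with the path of the current interpreter, the package is installed into the same Python that will later import it.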

Exploring Varied Programming Platforms with Python

In the following section, selected programming platforms are presented. Each data analyst usually codes in a chosen environment. Among the solutions most often used for OMICs analysis is the notebook format, e.g., Jupyter Notebook or Google Colab, its online version operated by Google. In advanced OMICs analysis, you may also frequently encounter platforms like PyCharm or Spyder IDE.

PyCharm

PyCharm is a robust integrated development environment (IDE) built specifically for Python development. Created by JetBrains, PyCharm equips Python developers and data scientists with an extensive set of tools for coding, debugging, testing, and project management. Features such as intelligent code completion, advanced code navigation, and a built-in visual debugger significantly boost productivity. Furthermore, the IDE supports version control systems such as Git. With its user-friendly interface and rich array of features, PyCharm is a good solution for both novices and seasoned Python developers, streamlining the development workflow.

The main advantage of the PyCharm IDE for OMICs scientists is that it is very similar to RStudio.

You can easily install PyCharm by following the instructions on their official website:

Upon launching PyCharm, we opt for the "New Project" option in the initial window to create a new project:

In the next window, we set the project location to the folder in which we intend to work.

It is essential to incorporate the previously configured interpreter (virtual environment) into PyCharm.

To achieve this, we choose the "Add Interpreter" option → "Add Local Interpreter". In the left column, we select "Virtual Environment", and on the right side, we select the "Existing" option for the Environment and then add the interpreter (you will find more information in Virtual environments - let's begin).

If you don't see "Add Local Interpreter", you have installed a newer version of the IDE. In that case, set the interpreter type according to the environment you have chosen (more information in Virtual environments - let's begin):
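After the interpreter has been configured, a short script can confirm that the project really runs inside the intended environment. This is a minimal sketch that relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtual_environment` is illustrative. It detects environments created with `venv` or `virtualenv`; a conda environment is typically not detected this way, so for conda the printed interpreter path is the more useful hint.

```python
import sys


def in_virtual_environment():
    """Return True when running inside a venv/virtualenv environment.

    Inside such an environment sys.prefix points at the environment folder,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
print("Inside a virtual environment:", in_virtual_environment())
```

Run the script from within the IDE: the interpreter path should point into the folder of the environment you selected.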

JupyterLab

JupyterLab is a web-based interactive computational environment for creating notebook documents. JupyterLab makes it easy to manage and run your Python code.

Installing JupyterLab is as simple as running the following command:

pip install jupyterlab

And to start JupyterLab:

jupyter lab

Once JupyterLab is running, you should see the following screen in your web browser:

In the menu on the left (red arrow 1 in the screenshot above) you can navigate to the folder where you want to create your project. Once you have navigated to the desired location, click on the Python 3 button in the Launcher window (red arrow 2). A new window, "Untitled.ipynb", will open:

The file "Untitled.ipynb" is now created in the selected folder. By right-clicking on the file in the explorer on the left (red arrow 1 on the screenshot above) the file can be renamed. Python code can be typed into the field indicated by arrow 2. Extra fields can be created by clicking on the button shown by arrow 3.

To get started, let's try loading the pandas library. Type the following command into the code field:

import pandas as pd

Then, while the cursor is still active in the code field, press Shift+Enter on the keyboard to run the code. You should see the number 1 appear before the code field, which indicates that the code has been executed. If no error message appears below the field, the import was successful.

Google Colab

Google Colab, short for Colaboratory, stands out as a cloud-based platform offered by Google, designed to facilitate collaborative and interactive Python programming. It provides a Jupyter Notebook environment that allows users to write and run Python code in a collaborative and shareable manner.

Google Colab integrates seamlessly with Google Drive, enabling easy storage and sharing of notebooks. Another notable feature of Colab is its provision of free GPU access, which speeds up machine learning tasks, a significant benefit in predictive modeling. This platform proves invaluable for researchers, students, and data scientists, and fits well with this GitBook.

If you have a Gmail account, navigate to your Google Drive:

Click on “+ New” button:

In “More”, you will find “Google Colaboratory”:

If you cannot find it, add it using “Connect more apps” → “Install”. Next, close the window and look at the menu again.

Click on Google Colaboratory, and you will see a new ‘.ipynb’ file where you can write your code:

For example, you can load the pandas library, which we will use extensively in the following OMICs analyses:
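In a fresh notebook, it can also help to check which library versions the environment provides before relying on them. The cell below is a minimal sketch based on the standard `importlib.metadata` module; the helper name `installed_versions` and the package list are illustrative (numpy is included only as a second example).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names are examples; adjust the list to the libraries you need.
for name, version in installed_versions(["pandas", "numpy"]).items():
    print(f"{name}: {version if version else 'not installed'}")
```

Any package reported as not installed can then be added with pip before it is imported.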

Spyder IDE

The last Python coding environment mentioned here is the Spyder IDE. Spyder is an integrated development environment (IDE) designed for scientific computing with a focus on Python. Installing Spyder is a straightforward process. Users can typically obtain Spyder by installing the Anaconda distribution (see the previous subchapter, Virtual environments - let's begin).

For the installation, follow the guidelines from the website below:

Every programmer or data analyst has their own preference for a coding environment. Throughout this book, we will frequently use PyCharm, Google Colab, and Jupyter Notebook for Python. We have chosen these tools not only for their availability but also because of their intuitive interfaces, which align with programming standards such as PEP 8 (https://peps.python.org/pep-0008/).

https://peps.python.org/pep-0008/
Download Python (Python.org)
4. Using Python on Windows (Python documentation)
5. Using Python on a Mac (Python documentation)
Install PyCharm (PyCharm Help)
Installation Guide (Spyder 5 documentation)