Omics data visualization in R and Python

Getting started with R

Installation of R, RStudio, Jupyter Notebook, RMarkdown, Google Colab, R libraries and packages


Last updated 3 months ago

R

R is a programming language designed for statistical analysis and the creation of graphics. It is open-source and allows for the generation of elegant, publication-ready charts with a high level of detail and annotation. R also offers a wide range of tools for statistics and data analysis that are particularly useful in OMICs data analysis, including linear and non-linear modeling, hypothesis testing, time-series analysis, data visualization, classification, and clustering. Its main advantage in data analysis and visualization is the flexibility to modify outputs through simple code. R was developed for statistical analysis and, over time, has become very popular among biologists and, subsequently, bioinformaticians because of its simple syntax. For more information about R, follow the link:

Installing R

First, download and install R. All necessary files can be found on the Comprehensive R Archive Network (CRAN):

For Windows: Select 'Download R for Windows'. Next, select 'base', since you are installing R for the first time, and then 'Download R-4.3.1 for Windows'. Open the .exe file and follow the installation wizard's instructions. We advise against customized installations in your first steps with R. You will also find more information on CRAN.

For macOS: Select 'Download R for macOS'. Then, choose between the Apple silicon (arm64) and Intel Mac (x86_64) builds, download the .pkg file, and follow the installation instructions.

Installing RStudio

Next, download RStudio. You will find the latest version here:

For Windows: download the .exe file and follow the installation wizard's instructions.

For macOS: download the .dmg package file. Once downloaded, drag RStudio to the Applications folder.

After you open RStudio you will see this:

Welcome to RStudio! Now, let’s explore R scripts and R coding. Navigate to the 'New file' button and from the list select 'R script':

Here is what you should see next:

And the final interface:

You now see four white areas (panes), which you will use while working with R. Below you will find brief explanations of which pane is used for which purpose:

Let's try to perform the first task in R. First let's save the script so that it can be shared, if necessary. This one will be named 'My first script - October 2023'.

We will try to share as much of the information needed for follow-up OMICs analysis as possible, but you may need to consult other sources to broaden your knowledge of basic R commands and functions.

Below is the example script we prepared together:

The red frame shows the name of your current script. The '*' symbol next to it indicates that changes were made without saving the script, so it is a good idea to save the current version so that it is not lost.

Commenting in R: A '#' is placed before every comment. Comments facilitate the exchange of information, opinions, and explanations within scripts among authors and users. Note that R ignores comments when the script is executed, so nothing from a comment is stored in the global environment.
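As a minimal sketch of how comments behave (the variable name sample_count is only an illustration and does not appear in our scripts), R skips everything after '#' on a line, so only the assignment is executed:

```r
# This whole line is a comment; R ignores it when the script runs.
sample_count <- 5  # a comment can also follow code on the same line

# Only 'sample_count' is created in the global environment.
sample_count
```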

In lines 2 and 3 (blue frame), two vectors are created. The first contains (1, 2, 3) and is stored in the global environment as 'a', while the second contains (4, 5, 6) and is stored as 'b'. In line 6 (orange frame), a new vector, the element-wise sum of a and b, is created and stored in the global environment as 'c'. To see what the vector c contains, call 'c' in the console or run it as a line of the script. To execute one line of code, place the cursor on it (or highlight it with your mouse) and press the 'Run' button (violet frame), or use Ctrl + Enter (Cmd + Return on macOS). You can also highlight and run the entire script.

Congratulations! You have just performed a computation in R. In the next chapters of the Gitbook, we will present the code in the gray frames, like the one below:

# My first script October 2023

a <- c(1,2,3)
b <- c(4,5,6)

c <- a + b

c
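If you would like to check what the script above produced, the following sketch repeats it and inspects the result using base R functions only. It assumes you run it at the top level of a fresh R session, so that ls() lists the global environment:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Vectors of equal length are added element by element.
c <- a + b
print(c)   # prints: [1] 5 7 9

# Check the length and the type of the result.
length(c)
class(c)

# Confirm that the three objects exist in the current environment.
c("a", "b", "c") %in% ls()
```

Naming a vector 'c' works because R still finds the function c() whenever it is called, but in longer scripts a more descriptive name avoids confusion.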

Our Gitbook also includes all scripts, so you can download them and open them in RStudio. Here is our first script:

Installing Jupyter Notebook for R

Jupyter Notebook can also be used for R programming or for executing basic commands. We will cover a basic installation as an example.

We already showed how to install Jupyter Notebook (please see the Getting started with Python section). To run R in Jupyter Notebook, you will need to install IRkernel, the R kernel for Jupyter. Kernels are processes that run code in the selected programming language and return the output to the user. First, find the R console. For example, in Windows: go to Start, type R-4.3.1, and select the R icon with the corresponding version of R (R-4.3.1 as of October 2023). You should see this:

First, install the library 'devtools'. Type the following line in the console (you can also copy it from here):

# First, install devtools. 
install.packages('devtools') 

Select the appropriate CRAN mirror, which corresponds to a location close to you, e.g. for Belgium we go for Belgium (Brussels), and press the OK button. Wait until R finishes installing 'devtools'.
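If you prefer not to pick a mirror interactively, it can also be set in code before installing. This is a minimal sketch, assuming the cloud mirror https://cloud.r-project.org (which redirects to a nearby server) suits you. The installation line is commented out so that nothing is downloaded unless you want it:

```r
# Set the CRAN mirror for the current R session.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which mirror will be used.
getOption("repos")

# Uncomment to install 'devtools' from the selected mirror.
# install.packages("devtools")
```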

# Next, install IRkernel. Type (or just copy from here):
devtools::install_github('IRkernel/IRkernel')

# Don't worry, we will explain the installation and application of packages below

Finally, install the kernel spec. This will allow Jupyter to see the R kernel. Type:

# Use:
IRkernel::installspec()

# If you want to make it system-wide go for: 
IRkernel::installspec(user = FALSE)
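Before moving on, you can check from the R console that the kernel package is in place. The sketch below uses base R functions only. It confirms that the IRkernel package is installed, not that Jupyter already sees the kernel:

```r
# TRUE if the IRkernel package is installed in the current R library.
irkernel_installed <- requireNamespace("IRkernel", quietly = TRUE)

if (irkernel_installed) {
  message("IRkernel ", as.character(utils::packageVersion("IRkernel")),
          " is installed.")
} else {
  message("IRkernel was not found; repeat the installation steps above.")
}
```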

Now, open the command window. In Windows, go to Start, type cmd. In the command window, type:

jupyter lab

This will open JupyterLab in your browser (if it does not open automatically, the output of jupyter lab shows the address of the server). In the Launcher tab, next to Python 3 (ipykernel), you will now have the option to start R:

Open the Notebook with R and type:

install.packages('ggpubr')
library(ggpubr)

Highlight both lines and hit 'Run'. After the library is installed and called, type:

data <- iris
ggboxplot(data, x = 'Species', y = 'Sepal.Length')

Highlight both lines and run them. If you generate the following box plots...

... congratulations! You are using R via Jupyter Notebook!
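Before plotting, it can help to look at the data behind the box plots. The following is a minimal sketch using only base R and the built-in iris data set. The base boxplot() call is an alternative to ggboxplot(), shown here only because it needs no extra package:

```r
data <- iris

# Structure of the data set: column names, types, and first values.
str(data)

# Number of flowers per species.
table(data$Species)

# Median sepal length per species, i.e. the centre line of each box.
species_medians <- tapply(data$Sepal.Length, data$Species, median)
species_medians

# Base R box plot of the same variable, grouped by species.
boxplot(Sepal.Length ~ Species, data = data,
        xlab = "Species", ylab = "Sepal.Length")
```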

In case of issues with the installation of Jupyter Notebook and/or the R kernel, you may also consider installing them via Anaconda. You will find more information here:

Using R Markdown

R Markdown is a possible alternative to using Jupyter Notebook that can produce high-quality and reproducible documents with embedded text, code chunks and code output. Once finished, R Markdown documents can easily be exported to PDF, HTML or even Word documents for inclusion as supporting information in research articles, or as fully polished reports. Furthermore, R Markdown documents also support the use of multiple languages including Python.

To install R Markdown, execute the code below in the RStudio console.

# Install RMarkdown
install.packages("rmarkdown")

A new R Markdown document can then be opened similarly to a new R script in RStudio.

Opening a new R Markdown file will produce the following in the document window.

Once completed, R Markdown documents can be "knitted" to HTML, PDF, or Word documents by selecting "Knit" in the document toolbar. The resulting document can then be opened for viewing.
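Knitting can also be started from the console with rmarkdown::render(). The sketch below is only an illustration: it writes a small example document to a temporary file (the title and contents are our assumptions) and knits it to HTML if both the rmarkdown package and pandoc are available:

```r
# Three backticks mark the start and end of a code chunk in R Markdown.
fence <- strrep("`", 3)

# Write a minimal R Markdown document to a temporary file.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"My first R Markdown document\"",
  "output: html_document",
  "---",
  "",
  "Some text, followed by a code chunk:",
  "",
  paste0(fence, "{r}"),
  "a <- c(1, 2, 3)",
  "b <- c(4, 5, 6)",
  "a + b",
  fence
), rmd_path)

# Knit to HTML only if rmarkdown and pandoc are available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  message("Knitted document: ", html_path)
} else {
  message("rmarkdown or pandoc not found; install them to knit the document.")
}
```

RStudio ships with its own copy of pandoc, so the "Knit" button usually works without a separate pandoc installation.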

For more information on getting started with R Markdown, refer to the following resource.

Google Colab and R

Google Colab, short for Colaboratory, is a cloud-based platform offered by Google and designed to facilitate collaborative and interactive programming. It provides a Jupyter Notebook environment in which users can write, run, and share R code.

Google Colab integrates seamlessly with Google Drive, enabling easy notebook storage and sharing. Another notable feature of Colab is free GPU access, which speeds up machine learning tasks and is a significant benefit in predictive modeling. This makes the platform valuable for researchers, students, and data scientists, and a good fit for our Gitbook.

If you have a Gmail account, navigate to your Google Drive:

Click on “+ New” button:

In “More”, you will find “Google Colaboratory”:

If you cannot find it, add it using “Connect more apps” -> “Install”. Next, close the window and look at the menu again.

Click on Google Colaboratory, and you will see a new ‘.ipynb’ file where you can write your code:

Note!

By default, Colab loads a Python environment. To work with R in Colab, change the runtime type: go to "Runtime -> Change runtime type", choose R instead of Python 3, and click "Save":
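Once the runtime is switched, a quick way to confirm that the notebook runs R rather than Python is to print the R version in the first cell. A minimal sketch using base R only:

```r
# Print the version of R used by the current runtime.
R.version.string

# Show the operating system the runtime is running on.
Sys.info()[["sysname"]]
```

If the cell returns an error instead of a version string, the notebook is most likely still using the Python runtime.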

Installing R packages

For the statistical analysis and visualization of your data, you will need to use functions that are contained in R packages (libraries). First, packages containing these functions must be downloaded, installed, and called in R. After that, you can use functions for producing graphs, computations, data manipulation, and other tasks. Detailed information about the installation of individual packages in R can usually be found in their vignettes (package documentation). We will present the installation of all packages used in this Gitbook at the beginning of every script. You will also find two examples in this subchapter.

Many R packages are available at The Comprehensive R Archive Network (CRAN). Packages from CRAN can usually be installed by typing the command install.packages("library_name") and executing it:

# The installation of R packages via install.packages("...") or install.packages('...')
# Let's try installing one library from the CRAN repository:

install.packages("rstatix")

# or

install.packages('rstatix')
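Running install.packages() again for a package that is already present downloads it again. If you want to avoid that, one option is a small helper like the one sketched below. The function name install_if_missing is hypothetical (it is not part of R or of any package) and relies only on base R functions:

```r
# Hypothetical helper: install a CRAN package only when it is not yet available.
install_if_missing <- function(package_name) {
  if (requireNamespace(package_name, quietly = TRUE)) {
    message("'", package_name, "' is already installed.")
    return(invisible(FALSE))
  }
  install.packages(package_name)
  invisible(TRUE)
}

# 'stats' ships with R, so nothing is downloaded here.
install_if_missing("stats")

# Uncomment to install 'rstatix' if it is missing.
# install_if_missing("rstatix")
```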

You may need to select the CRAN mirror again or update dependencies. For the package from this example, the console shows the following information after the installation:

Alternatively, packages for -omics data analysis are also available from Bioconductor. As an example, please check the following R package for preparing volcano plots:

You can find the installation instructions directly under the short description of this package. In this case, you will need to execute the following three lines of code:

# Downloading and installing R packages from Bioconductor. Example: 'EnhancedVolcano'

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("EnhancedVolcano")

# To install another package from Bioconductor, change the package name in BiocManager::install()

If a package does not work correctly or there are issues with updates, you can simply remove that package and install it again. The newest version will be downloaded and installed, which in most cases fixes the problem.

To remove a package, use remove.packages("library_name"):

# Removing R packages

remove.packages("rstatix")

# or

remove.packages('rstatix') 
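Before removing and reinstalling a package, it can be useful to check which version is currently installed. The sketch below uses base R only. The helper names installed_version and reinstall_package are hypothetical, and the reinstall call is commented out so that nothing is removed unless you want it:

```r
# Report the installed version of a package, or NA if it is not installed.
installed_version <- function(package_name) {
  if (!requireNamespace(package_name, quietly = TRUE)) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(package_name))
}

# Hypothetical helper: remove a package and install it again from CRAN.
reinstall_package <- function(package_name) {
  remove.packages(package_name)
  install.packages(package_name)
}

# 'utils' ships with R, so this works without installing anything.
installed_version("utils")

# Uncomment to reinstall 'rstatix'.
# reinstall_package("rstatix")
```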

After a package is installed, to use functions from this package, you have to call it using the command library():

# Calling rstatix library in R (example)
library(rstatix)

# Alternatively, if you need only one function from a package, you can call it
# without loading the whole package:
rstatix::cor_plot()

# The library_name:: prefix selects one function from a library
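The two ways of calling a function can be compared without installing anything. The sketch below uses the 'stats' package, which ships with R, instead of rstatix. The vector values is made up for illustration:

```r
values <- c(2.1, 3.4, 1.8, 4.0)

# Option 1: attach the package, then call its functions by name.
library(stats)
median(values)

# Option 2: call one function through its namespace without attaching the package.
stats::median(values)

# Both calls run the same function, so the results are identical.
identical(median(values), stats::median(values))
```

The namespace form is also handy when two loaded packages contain functions with the same name, because it states explicitly which one is meant.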

Bioconductor

Bioconductor (https://bioconductor.org/) is an open-source and open-development software project that provides tools and resources for the analysis and comprehension of high-throughput genomic data in the R programming language. Specifically tailored for bioinformatics and computational biology, Bioconductor offers a vast collection of packages that cover various aspects of genomics, including data preprocessing, statistical analysis, and visualization.

Developed by a collaborative community of researchers and developers, Bioconductor provides R packages for tasks such as microarray analysis, RNA sequencing, pathway analysis, and more. The platform has become a cornerstone of the bioinformatics community, helping scientists analyze complex biological data efficiently and advance our understanding of genomics.
My first script - October 2023.R
The first script created in R.
RStudio interface.
Creating a new script.
Creating a new script.
Final RStudio interface with a new script tab open.
RStudio interface - explanations.
Saving your first script.
Contents of 'My first script - October 2023'.
Executing your first script.
R console - for R-4.3.1
The selection of CRAN mirror.
Jupyter Notebook with R kernel available.
Box plots generated in Jupyter Notebook using R.
Creating an R Markdown file
R Markdown document in RStudio.
Knitted R Markdown document.
The download and installation of an exemplary package from the CRAN repository - the R console after executing 'install.packages()'.
R: What is R?
Basic information about R from CRAN.
The Comprehensive R Archive Network
R can be downloaded from CRAN
Posit
Link to download the latest version of RStudio
Anaconda | The World's Most Popular Data Science Platform
Anaconda - an alternative solution for R and Python for data science.
Introduction
Getting started with RMarkdown.
EnhancedVolcano (Bioconductor)
EnhancedVolcano package available at Bioconductor.