Getting started with R
Installation of R, RStudio, Jupyter Notebook, RMarkdown, Google Colab, R libraries and packages
R is a programming language designed for statistical analysis and the creation of graphics. It is open-source and can generate elegant, publication-ready charts with detailed annotations. R also offers a wide range of tools for statistics and data analysis, which are particularly useful in OMICs data analysis: linear and non-linear modeling, hypothesis testing, time-series analysis, data visualization, classification and clustering techniques, and others. The main advantage of R for data analysis and visualization is the flexibility to modify outputs with a few lines of code. Although R was developed for statistical analysis, over time it has become very popular among biologists and, subsequently, bioinformaticians because of its simple syntax. For more information about R follow the link:
First, download and install R. All necessary files can be found on the Comprehensive R Archive Network (CRAN):
Regarding Windows: Select 'Download R for Windows'. Next, select 'base', since you are installing R for the first time, and then 'Download R-4.3.1 for Windows'. Open the .exe file and follow the installation wizard's instructions. We advise against customized installations in your first steps with R. You will also find more information on CRAN.
Regarding macOS: Select 'Download R for macOS'. Then, choose the build for your processor, Apple silicon (arm64) or Intel Mac (x86_64), download the .pkg file, and follow the installation instructions.
In the next step, download RStudio. You will find the latest version here:
Regarding Windows: download the .exe file and follow the installation wizard's instructions.
Regarding macOS: download the .dmg package file. Once downloaded, drag RStudio to the Applications folder.
After you open RStudio you will see this:
Welcome to RStudio! Now, let’s explore R scripts and R coding. Navigate to the 'New file' button and from the list select 'R script':
Here is what you should see next:
And the final interface:
Now you see four white areas (panes), which you will use while working with R. Below you will find brief explanations of which pane is used for which purpose:
Let's try to perform the first task in R. First let's save the script so that it can be shared, if necessary. This one will be named 'My first script - October 2023'.
We will share as much of the information needed for the follow-up OMICs analysis as possible, but you may need to consult other sources to broaden your knowledge of basic R commands and functions.
Below is the example script we prepared together:
The red frame shows the name of your current script. The '*' symbol next to it indicates that changes have been made since the script was last saved, so it is a good idea to save the current version so that it is not lost.
Commenting in R: A '#' is placed before every comment. Comments let authors and users exchange information, opinions, and explanations within a script. Note that comments are ignored when the script is executed, so nothing from them is stored in the global environment.
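As a small illustration of how comments behave, here is a minimal sketch (the variable names 'radius' and 'area' are our own and do not come from the example script):

```r
# A full-line comment: R skips this line when the script is executed.
radius <- 2             # a trailing comment after code is skipped as well
area <- pi * radius^2   # only 'radius' and 'area' are stored, not this text
area
```

After running these lines, only 'radius' and 'area' appear in the global environment.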
In lines 2 and 3 (blue frame), two vectors are created. The first contains (1, 2, 3) and is stored in the global environment as 'a'; the second contains (4, 5, 6) and is stored as 'b'. In line 6 (orange frame), a new vector, the sum of vectors a and b, is created and stored in the global environment as 'c'. To see what the vector c contains, call 'c' in the console or run it as a line of the script. To execute one line of code, highlight it with your mouse and press the 'Run' button (violet frame) or use Ctrl + Enter. You can also highlight and run the entire script.
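For readers who cannot see the screenshot, the script described above can be sketched as follows. This is a reconstruction from the description, so the comments and the use of the '<-' assignment operator are our assumptions:

```r
# My first script
a <- c(1, 2, 3)  # first vector, stored in the global environment as 'a'
b <- c(4, 5, 6)  # second vector, stored in the global environment as 'b'

# Element-wise sum of the two vectors, stored as 'c'
c <- a + b
c
```

Running the last line prints `[1] 5 7 9`, because R adds the vectors element by element. Naming a vector 'c' does not break the function c(): R still finds the function when it is called with parentheses.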
Congratulations! You have just performed a computation in R. In the next chapters of the Gitbook, we will present the code in gray frames, like the one below:
Our Gitbook also includes all scripts, so you can download them and open them in RStudio. Here is our first script:
Jupyter Notebook can also be used for R programming or for executing basic commands. We will cover a basic installation as an example.
We already showed how to install Jupyter Notebook (please see the Getting started with Python section). To run R in Jupyter Notebook, you will need to install IRkernel, which is the R kernel for Jupyter Notebook. Kernels are processes in Jupyter Notebook that run code in the selected programming language and return the output to the user. First, find the R console; for example, in Windows, go to Start, type R-4.3.1, and select the R icon with the corresponding version of R (R-4.3.1 as of October 2023). You should see this:
First, install the library 'devtools'. Type the following line in the console (you can also copy it from here):
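The command is not reproduced in this text. The usual way to install a CRAN package such as 'devtools' from the R console looks like this (a sketch of this step, not necessarily the exact line from the original page):

```r
# Install the 'devtools' package from CRAN.
# R may ask you to choose a CRAN mirror the first time you install a package.
install.packages("devtools")
```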
Select the appropriate CRAN mirror, which corresponds to a location close to you, e.g. for Belgium we go for Belgium (Brussels), and press the OK button. Wait until R finishes installing 'devtools'.
Finally, install the kernel spec. This will allow Jupyter to see the R kernel. Type:
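A minimal sketch of this step is shown below. We assume here that the IRkernel package itself is installed from its GitHub repository with 'devtools'; it is also available on CRAN, so install.packages("IRkernel") is an alternative. Jupyter must already be installed for the last command to succeed.

```r
# Install the IRkernel package (assumption: GitHub version, installed via devtools).
devtools::install_github("IRkernel/IRkernel")

# Register the R kernel with Jupyter for the current user.
IRkernel::installspec()
```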
Now, open the command window. In Windows, go to Start, type cmd. In the command window, type:
This will redirect you to Jupyter Notebook (or you will see how to access the server after executing jupyter lab). In the Launcher tab, next to Python 3 (ipykernel) you will now have the possibility to access R:
Open the Notebook with R and type:
Highlight both lines and hit 'Run'. After the library is installed and called, type:
Highlight both lines and run them. If you generate the following box plots...
... congratulations! You are using R via Jupyter Notebook!
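The notebook commands themselves are not reproduced in this text. As a stand-in that needs no additional package, the sketch below draws box plots with base R and the built-in 'iris' dataset; the dataset and labels are our choice and not necessarily those of the original example.

```r
# Box plots of sepal length for each species in the built-in 'iris' dataset.
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal length by species",
        xlab = "Species", ylab = "Sepal length (cm)")
```

If three box plots appear below the notebook cell, the R kernel is working.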
If you run into issues installing Jupyter Notebook and/or the R kernel, you may also consider installing them via Anaconda. You will find more information here:
R Markdown is a possible alternative to using Jupyter Notebook that can produce high-quality and reproducible documents with embedded text, code chunks and code output. Once finished, R Markdown documents can easily be exported to PDF, HTML or even Word documents for inclusion as supporting information in research articles, or as fully polished reports. Furthermore, R Markdown documents also support the use of multiple languages including Python.
To install R Markdown, execute the code below in the RStudio console.
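The command is not reproduced in this text. R Markdown is provided by the 'rmarkdown' package on CRAN, so the installation typically looks like this (a sketch of this step):

```r
# Install the 'rmarkdown' package from CRAN.
install.packages("rmarkdown")
```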
A new R Markdown document can then be opened similarly to a new R script in RStudio.
Opening a new R Markdown file will produce the following in the document window.
Once completed, R Markdown documents can be "knitted" to HTML, PDF or Word documents by selecting "Knit" in the document toolbar. The resulting document can then be opened for viewing.
For more information on getting started with R Markdown, refer to the following resource.
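Knitting can also be done from code with rmarkdown::render(). The sketch below writes a very small R Markdown file to a temporary folder and knits it to HTML. The file name and its contents are hypothetical, and the knitting step is skipped when 'rmarkdown' or pandoc is not available.

```r
# Three backticks open and close a code chunk in an R Markdown file.
fence <- strrep("`", 3)

# Contents of a minimal R Markdown document: header, text, and one R chunk.
rmd_lines <- c(
  "---",
  "title: \"My first R Markdown document\"",
  "output: html_document",
  "---",
  "",
  "This sentence is ordinary text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Save the document to a temporary folder.
rmd_path <- file.path(tempdir(), "my_first_document.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit to HTML; skipped if 'rmarkdown' or pandoc is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(html_path)
}
```

In RStudio, pressing "Knit" performs the same rendering step for the document that is currently open.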
Google Colab, short for Colaboratory, is a cloud-based platform offered by Google designed to facilitate collaborative and interactive programming. It provides a Jupyter Notebook environment in which users can write, run, and share R code.
Google Colab integrates seamlessly with Google Drive, enabling easy notebook storage and sharing. Another notable feature of Colab is its provision of free GPU access, which enhances the efficiency of executing machine learning tasks—a significant benefit in predictive modeling. This platform proves invaluable for researchers, students, and data scientists, aligning well with our Gitbook.
If you have a Gmail account, navigate to your Google Drive:
Click the “+ New” button:
In “More”, you will find “Google Colaboratory”:
If you cannot find it, add it using “Connect more apps” –> “Install”. Next, close the dialog and look at the menu again.
Click on Google Colaboratory, and you will see a new ‘.ipynb’ file where you can write your code:
Note!
By default, Colab loads a Python environment. To work with R in Colab, change the runtime type: go to "Runtime -> Change Runtime Type", choose R instead of Python 3, and click "Save":
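A quick way to confirm that the notebook is now running R is to execute a line that only R understands. A minimal sketch:

```r
# Print the version of R used by the current runtime.
R.version.string
```

If the cell prints a version string rather than a Python error, the runtime was switched successfully.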
For the statistical analysis and visualization of your data, you will need to use functions that are contained in R packages (libraries). First, packages containing these functions must be downloaded, installed, and called in R. After that, you can use functions for producing graphs, computations, data manipulation, and other tasks. Detailed information about the installation of individual packages in R can usually be found in their vignettes (package documentation). We will present the installation of all packages used in this Gitbook at the beginning of every script. You will also find two examples in this subchapter.
Many R packages are available at the Comprehensive R Archive Network (CRAN). In most cases, a package from CRAN can be installed by typing the simple command install.packages("library_name") and then executing it:
You may need to select the CRAN mirror again for the installation, or update dependencies. For the package from this example, the console after the installation shows the following information:
Alternatively, packages for OMICs data analysis are also available from Bioconductor. As an example, please check the following R package for preparing volcano plots:
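The example command is not reproduced in this text. A sketch of a CRAN installation is shown below; 'ggplot2' is used purely as an illustration and may differ from the package in the original example. The check with requireNamespace() avoids reinstalling a package that is already present, which is convenient at the beginning of a script.

```r
# Install 'ggplot2' from CRAN only if it is not installed yet.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
```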
You can find the installation instructions directly under the short description of this package. In this case, you will need to execute the following three lines of code:
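The three lines are not reproduced in this text. Bioconductor package pages generally give the installation code in the form sketched below. The package name 'EnhancedVolcano' is our assumption of a typical volcano plot package; replace it with the name shown on the package page you are following.

```r
# Install the Bioconductor package manager from CRAN if it is missing.
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install the package from Bioconductor (package name assumed for illustration).
BiocManager::install("EnhancedVolcano")
```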
If a package does not work correctly or there are issues with updates, you can simply remove that package and install it again. The newest version will be downloaded and installed, which in most cases fixes the problem.
To remove a package, use remove.packages("library_name"):
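A sketch of removing and reinstalling a package is shown below, again with 'ggplot2' as an illustrative name. Note that remove.packages() reports an error if the package is not installed.

```r
# Remove the installed copy of the package.
remove.packages("ggplot2")

# Install the newest version available on CRAN.
install.packages("ggplot2")
```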
After a package is installed, to use functions from this package, you have to call it using the command library():
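As a minimal sketch, the example below loads 'tools', a package that ships with every R installation, so nothing has to be installed first. The file name is hypothetical.

```r
# Attach the package so that its functions can be called by name.
library(tools)

# file_ext() comes from 'tools'; it returns the extension of a file name.
file_ext("my_first_script.R")

# A single function can also be called without library(), using package::function.
tools::file_path_sans_ext("my_first_script.R")
```

The first call returns "R" and the second returns "my_first_script". The same library() call works for any installed package, for example library(ggplot2).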
Bioconductor is an open-source and open-development software project that provides tools and resources for the analysis and comprehension of high-throughput genomic data in the R programming language. Specifically tailored for bioinformatics and computational biology, Bioconductor offers a vast collection of packages that cover various aspects of genomics, including data preprocessing, statistical analysis, and visualization.
Developed by a collaborative community of researchers and developers, Bioconductor provides a comprehensive suite of R packages for tasks such as microarray analysis, RNA sequencing, pathway analysis, and more. This platform has become a cornerstone of the bioinformatics community, empowering scientists to efficiently analyze complex biological data and advance our understanding of genomics.