Loading data into R
A part of preparing data for analysis and visualization for OMICs analysis
In R, lipidomics and metabolomics data sets can be handled as data frames ("data.frame"). Data frames are composed of rows (horizontally aligned data) and columns (vertically aligned data); they are discussed in more detail in the next chapter. Here, we show how to read your lipidomics or metabolomics data set into R. Data can be imported from a spreadsheet or text file (.xlsx, .csv, etc.) and stored in the data frame format. The example dataset is:
Setting up the working directory is an essential first step for several reasons. It ensures that all your scripts, data files, and output are stored in a structured and consistent location. When you read or save files (read.csv(), write.csv(), etc.), you don’t need to specify long file paths every time. Your scripts will also work across different sessions and computers if they assume the correct working directory from the start.
First, we must verify that we are in the right working directory ("wd") and, if not, specify it. Take a look at the code block below:
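A minimal sketch of checking and setting the working directory with the base R functions getwd() and setwd(). The folder used here is an assumption: tempdir() stands in for your project folder only so that the example runs on any computer, and the path in the comment is a hypothetical placeholder.

```r
# Check which folder R currently treats as the working directory.
getwd()

# Choose the folder to work in. tempdir() is only a stand-in; replace it
# with your own path, e.g. "C:/Users/your_name/Documents/omics_project"
# (hypothetical). On Windows, use forward slashes or doubled backslashes.
project_dir <- tempdir()

# setwd() changes the working directory and invisibly returns the old one.
old_wd <- setwd(project_dir)

# Verify the change and list the files available in the new folder.
getwd()
list.files()

# Go back to the previous working directory (only needed for this demo).
setwd(old_wd)
```

In your own script you would normally keep only the getwd() check and a single setwd() call with your project path.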
Additionally, you can use RStudio's GUI to set the directory. A simple, user-friendly way to find the path to your desired folder is the RStudio "Files" panel. First, (1) use the three dots or the "Go to directory" button to locate and open the appropriate folder. Once the desired folder is open, (2) click the "More" button and choose "Copy Folder Path to Clipboard". The absolute path is now copied to your clipboard, and you can paste it between the quotation marks inside the setwd() function.
You can also set the working directory directly from the same "More" menu by choosing "Set As Working Directory".
To read any kind of file into R, first you must specify the exact path to the file. There are two ways to do this:
Absolute Path specifies the full path to a file, starting from the root directory (or drive letter on Windows). It is an exact, fixed location of the file on your computer.
Relative Path is defined in relation to the current working directory (the folder where your R script is running or where RStudio is set to). It’s more flexible and portable, making it better for reproducibility.
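The difference between the two kinds of path can be sketched in a few lines of base R. The folder name "data" and the file name "lipidomics_data.xlsx" are hypothetical and used only for illustration; the file does not need to exist for this code to run.

```r
# A relative path: interpreted starting from the current working directory.
# "data" and "lipidomics_data.xlsx" are hypothetical names.
relative_path <- file.path("data", "lipidomics_data.xlsx")

# The matching absolute path: the working directory plus the relative path.
absolute_path <- file.path(getwd(), relative_path)

relative_path
absolute_path

# Both paths point to the same location, so this check gives the same
# answer for either of them (TRUE only if the file is really there).
file.exists(relative_path)
file.exists(absolute_path)
```

Using file.path() instead of pasting strings by hand lets R insert the path separators for you.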
For reading Excel files (.xlsx and .xls), we can use the read_excel() function from the readxl package, which automatically detects the file type. We can also state the file type explicitly by using read_xlsx() or read_xls(). The data are loaded in the data frame format and stored here in the 'data' variable in the global environment:
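A minimal sketch of such a call, assuming the readxl package is installed. Because the original example file is not reproduced here, the code uses "datasets.xlsx", an example workbook that ships with readxl, as a stand-in; replace the path with the path to your own file.

```r
# install.packages("readxl")  # run once if the package is not installed
library(readxl)

# Stand-in file: an example workbook bundled with readxl.
# For your own data, use e.g. path <- "lipidomics_data.xlsx" (hypothetical).
path <- readxl_example("datasets.xlsx")

# read_excel() detects from the file whether it is .xls or .xlsx.
data <- read_excel(path)

# Equivalent call that states the file type explicitly:
# data <- read_xlsx(path)

# Inspect the result: readxl returns a tibble, which is a kind of data frame.
class(data)
head(data)
```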
And a glimpse at RStudio after executing these lines of code:
Now, let's take a look at the arguments of the read_xlsx() function. Type:
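For instance, the help page can be requested as sketched below (assuming readxl is installed). The args() call is an optional extra that prints the argument list directly in the console.

```r
library(readxl)

# Open the documentation page for read_xlsx().
?read_xlsx

# Equivalent, more explicit form:
help("read_xlsx", package = "readxl")

# Print the argument names and their default values in the console.
args(read_xlsx)
```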
This opens the R documentation for this function in the Help tab:
As you can see, the read_xlsx() function has multiple arguments, which become very useful once you are more experienced in R. A detailed description of the input arguments can be found in the package documentation:
The functions from the readxl package need the path to the .xlsx file containing the data of interest; alternatively, you can use file.choose() to select the file interactively. If your .xlsx file contains more than one sheet, you can select the sheet to load into R by its number or name. You can also read only specific cells of the Excel file by giving a range. The col_names argument, set to TRUE by default, treats the first row of data as column names. The col_types argument sets the type of data stored in each column. By default, col_types is NULL, which means that the column types are guessed and may require adjustment in later steps. By default, all blank cells are interpreted as missing entries ('NA' values). Below is shown how to use some of these arguments:
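A sketch of these arguments in use, assuming readxl is installed. The workbook "datasets.xlsx" bundled with readxl and its "mtcars" sheet serve only as a stand-in for your own file; the cell range and column types are chosen purely for illustration.

```r
library(readxl)

# Stand-in workbook shipped with readxl; replace with your own file path,
# or pick a file interactively with: path <- file.choose()
path <- readxl_example("datasets.xlsx")

# List the sheet names available in the workbook.
excel_sheets(path)

# Select a sheet by number or by name.
data_by_number <- read_excel(path, sheet = 1)
data_by_name   <- read_excel(path, sheet = "mtcars")

# Read only a rectangle of cells; the first row of the range (A1:D1)
# is used as column names because col_names = TRUE.
data_range <- read_excel(path, sheet = "mtcars", range = "A1:D5",
                         col_names = TRUE)

# Set the column types by hand instead of letting readxl guess them.
data_typed <- read_excel(path, sheet = "mtcars", range = "A1:D5",
                         col_types = c("numeric", "numeric",
                                       "numeric", "text"))

# Treat both blank cells and the text "NA" as missing values.
data_na <- read_excel(path, sheet = "mtcars", na = c("", "NA"))

str(data_typed)
```

Setting col_types explicitly is worth considering when a numeric column contains a few text entries, since the guessed type may otherwise not be the one you expect.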
To view your whole data frame in a separate tab next to your script, go to the global environment and click on the name of the variable you created ('data'). You can also type in the console:
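A short sketch of the console commands for inspecting a data frame. The small 'data' object built here is a hypothetical stand-in (invented sample and lipid names and values) so that the example runs on its own; in practice 'data' is the object you loaded from your file.

```r
# Hypothetical stand-in for the data set loaded earlier.
data <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  lipid_A   = c(1.2, 0.8, 1.5),
  lipid_B   = c(10.4, 9.7, NA)
)

# Open the data frame in a separate viewer tab (RStudio); View() needs an
# interactive session, so it is skipped when the script runs in batch mode.
if (interactive()) View(data)

# Console alternatives: first rows, structure, and dimensions.
head(data)
str(data)
dim(data)
```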
Your data set is now ready for the next steps. The complete script for introducing data into R is attached below: