Loading data into R

A part of preparing data for analysis and visualization for OMICs analysis

In R, lipidomics and metabolomics data sets can be handled as data frames ("data.frame"). Data frames are composed of rows (horizontally aligned data) and columns (vertically aligned data) and are discussed in more detail in the next chapter. Here, we will show you how to read your lipidomics or metabolomics data set into R. Data stored as a table in an Excel workbook (.xlsx, .xls) or in a text file (.csv, etc.) can be imported into R as a data frame. The example dataset is:

Setting up the working directory

Setting up the working directory is an essential first step for several reasons. It ensures that all your scripts, data files, and output are stored in a structured and consistent location. When you read or save files (read.csv(), write.csv(), etc.), you don’t need to specify long file paths every time. Your scripts will work across different sessions and computers if they assume the correct working directory from the start.

First, we must verify that we are in the right working directory ("wd") and, if not, specify it. Take a look at the code block below:

# Working directory (wd) is a default location on your computer used by R for reading files
# Frequently, wd is simply the 'Documents' folder. To check your wd, type:
getwd()

# To change wd to the path you would prefer more than 'Documents', use setwd(), e.g.:
setwd("D:/.../.../...")

# or
setwd('D:/.../.../...')  

# For example, if you want your data to be read from a folder 'Data analysis' (D: drive), type:
setwd('D:/Data analysis/')

# The strings with a working directory inside of setwd() can be:
# 1) in double quotation marks ("..."), 
# 2) or single quotation marks ('...').
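
Because setwd() stops with an error when the folder does not exist, it can help to check the path first. The following is a minimal sketch rather than a required step: it uses a folder inside R's temporary directory as a stand-in for your own project folder, and the names project_dir and old_wd are only illustrative.

```r
# A stand-in project folder inside R's temporary directory
# (hypothetical; replace with your own path, e.g. "D:/Data analysis")
project_dir <- file.path(tempdir(), "Data analysis")
dir.create(project_dir, showWarnings = FALSE)

# Check that the folder exists before switching to it
if (dir.exists(project_dir)) {
  old_wd <- setwd(project_dir)  # setwd() invisibly returns the previous wd
} else {
  stop("Folder not found: ", project_dir)
}

# Confirm the current working directory
getwd()

# Return to the previous working directory when finished
setwd(old_wd)
```

Storing the previous working directory in old_wd makes it easy to switch back, so the rest of your session is not affected.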

Using RStudio's GUI to set the directory

Additionally, you can use RStudio's GUI to set the directory. A simple user-friendly way to find a path to your desired folder is to use the RStudio "Files" panel. First, (1) use the three dots or the "Go to directory" button to locate and open the appropriate folder. Once the desired folder is open, (2) click on the "More" button, and choose "Copy Folder Path to Clipboard". Now the absolute path is copied in your clipboard and you can easily paste it in the setwd() function inside of the quotation marks.

You can also set the working directory directly from the same "More" menu by choosing "Set As Working Directory".

Reading data into R

Using absolute and relative paths

To read any kind of file into R, first you must specify the exact path to the file. There are two ways to do this:

Absolute Path specifies the full path to a file, starting from the root directory (or drive letter on Windows). It is an exact, fixed location of the file on your computer.

# Reading data using absolute path
read.csv("D:/Data analysis/Lipidomics_data.csv")  # read.csv() reads .csv files, not .xlsx

Relative Path is defined in relation to the current working directory (the folder returned by getwd()). It is more flexible and portable, which makes it better for reproducibility.

# Reading data using relative path
setwd("D:/Data analysis")
read.csv("Lipidomics_data.csv")  # read.csv() reads .csv files, not .xlsx
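
To see how the two kinds of path relate, the sketch below writes a tiny CSV file into a temporary folder and reads it back both ways. The folder, the file name example_lipids.csv, and the values in it are made up for illustration.

```r
# Create a small example table (made-up values) and save it as CSV
demo_dir <- file.path(tempdir(), "path_demo")
dir.create(demo_dir, showWarnings = FALSE)
example <- data.frame(Lipid = c("PC 34:1", "TG 52:2"),
                      Concentration = c(12.5, 48.1))
write.csv(example, file.path(demo_dir, "example_lipids.csv"), row.names = FALSE)

# Absolute path: works regardless of the current working directory
abs_path <- file.path(demo_dir, "example_lipids.csv")
data_abs <- read.csv(abs_path)

# Relative path: resolved against the current working directory
old_wd <- setwd(demo_dir)
file.exists("example_lipids.csv")   # check that R can see the file from here
data_rel <- read.csv("example_lipids.csv")
setwd(old_wd)                       # restore the previous working directory

# Compare the two results; both calls read the same file
identical(data_abs, data_rel)
```

The relative path only works while the working directory points at the folder that holds the file, which is why the working directory is checked or set first.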

Reading Excel files using readxl package

To read Excel files (.xlsx and .xls), we can use the function read_excel(), which detects the file type automatically. If we know the format, we can call read_xlsx() or read_xls() directly. These functions return the data as a tibble, a type of data frame; here it is stored in the 'data' variable in the global environment:

# Loading lipidomics data into R
# Step 1 - install the package (needed only once per R installation).
# See more info in chapter: Getting started with R.
# The package readxl can be downloaded from CRAN.
install.packages("readxl")

# Step 2 - call library / load library
library(readxl)

# Step 3 - set your wd:
setwd("D:/Data analysis/")

# Step 4 - load data into R, e.g.:
data <- read_xlsx(file.choose())
# In some cases, the pop-up window remains hidden behind the RStudio interface!

And a glimpse at RStudio after executing these lines of code:

Now, let's take a look at the arguments of the read_xlsx() function. Type:

# Check help for information about arguments of read_xlsx() function
?read_xlsx

This opens the R documentation for this function in the Help tab:

As you can see, the read_xlsx() function has multiple arguments, which become very useful once you are more experienced in R. A detailed description of the arguments can be found in the package documentation:

The functions from the readxl package need the path to the .xlsx file containing your data; alternatively, you can pick the file interactively with file.choose(). If the file contains more than one sheet, use the sheet argument to select one by its number or name. The range argument restricts the import to a block of cells, e.g. "A1:I228". The col_names argument, TRUE by default, treats the first row as column names. The col_types argument sets the type of data stored in each column; its default, NULL, means that the types are guessed from the data and may need adjusting in later steps. By default, blank cells are read as missing values (NA). The examples below show how to use some of these arguments:

# Loading data into R from a defined path (examples): 
data <- read_xlsx(path = "D:/Data analysis/Lipidomics_dataset.xlsx")

# or
data <- read_xlsx(path = 'D:/Data analysis/Lipidomics_dataset.xlsx')

# If you put your data in the working directory, you can import it using just a file name
data <- read_xlsx("Lipidomics_dataset.xlsx")

# or 
data <- read_xlsx('Lipidomics_dataset.xlsx')

# Recap from above:
# To change wd to the path you would prefer more than 'Documents', use setwd(), e.g.:
setwd("D:/.../.../...")

# or
setwd('D:/.../.../...')  

# For example, if you want your data to be read from a folder 'Data analysis' (D: drive), type:
setwd('D:/Data analysis/')

# Import data from a specific sheet of the Excel file (if your data are stored in wd):
data <- read_xlsx('Lipidomics_dataset.xlsx', sheet = 2) # using sheet number

# or 
data <- read_xlsx('Lipidomics_dataset.xlsx', sheet = 'sheet_name') # using sheet name

# Import data from a specific sheet of the Excel file (your data are not stored in wd):
# Using file.choose()
data <- read_xlsx(file.choose(), sheet = 2) # using sheet number

# or 
data <- read_xlsx(file.choose(), sheet = "sheet_name") # using sheet name

# or by providing a path: 
data <- read_xlsx(path = "D:/Data analysis/Lipidomics_dataset.xlsx", sheet = 2) # using sheet number

# or 
data <- read_xlsx(path = "D:/Data analysis/Lipidomics_dataset.xlsx", sheet = "sheet_name") # using sheet name

# Reading in data from a selected range of cells if your data are in the wd:
data <- read_xlsx('Lipidomics_data.xlsx', range = "A1:I228")

# or - if we want to select data from a defined sheet & range of cells:
data <- read_xlsx('Lipidomics_data.xlsx', sheet = 2, range = 'A1:I220')

# or - if we want to select data from a sheet defined by name & range of cells:
data <- read_xlsx('Lipidomics_data.xlsx', sheet = 'sheet_name', range = 'A1:M50')
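
If you would like to try these arguments before using your own file, the readxl package ships with small example workbooks. The sketch below assumes readxl is installed and uses its bundled datasets.xlsx file (located with readxl_example()) instead of the lipidomics file; the sheet name, cell range, and variable names are chosen only for illustration.

```r
library(readxl)

# Path to an example workbook bundled with readxl
example_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook before choosing one
excel_sheets(example_path)

# Read a cell range from a sheet selected by name,
# setting the column types explicitly instead of letting readxl guess
subset_data <- read_xlsx(example_path,
                         sheet = "iris",
                         range = "A1:C4",
                         col_names = TRUE,
                         col_types = c("numeric", "numeric", "numeric"))

# Treat chosen strings as missing values (blank cells are NA by default)
data_na <- read_xlsx(example_path, sheet = "iris", na = c("", "NA"))

# Check the size of the imported range
dim(subset_data)
```

Calling excel_sheets() first is a convenient way to find the exact sheet name to pass to the sheet argument.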

To view your whole data frame in a separate tab next to your script, go to the Environment pane and click on the name of the variable you created, 'data'. You can also type in the console:

View(data)
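
Besides View(), a few base R functions give a quick summary of an imported table directly in the console. The sketch below uses a small made-up data frame as a stand-in for the 'data' object created by read_xlsx(); the column names and values are hypothetical.

```r
# Stand-in for the imported data set (made-up values)
data <- data.frame(Sample = c("S1", "S2", "S3"),
                   `PC 34:1` = c(10.2, 11.8, NA),
                   check.names = FALSE)

dim(data)              # number of rows and columns
head(data)             # first rows of the table
str(data)              # column types, useful for spotting wrongly guessed types
colSums(is.na(data))   # number of missing values per column

# readxl returns a tibble; convert it if a plain data frame is needed
data <- as.data.frame(data)
```

Checking str() right after import is a simple way to notice columns whose guessed type needs adjusting, as mentioned for the col_types argument.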

Your data set is now ready for the next steps. The complete script for introducing data into R is attached below:

