Preferred formats in metabolomics and lipidomics analysis

A part of preparing data for analysis and visualization in OMICs analysis

Two types of objects can be created in R when importing lipidomics/metabolomics data: classic data frames and tibbles. Data frames were partially explained in the first subchapter; the rest of the basic information is given here.

IMPORTANT: In tidy data frames, one column represents one variable (a feature, lipid concentration, metabolite concentration, gender, age, tumor grade, smoking status, etc.), and every row represents one observation (one patient for whom all variables are collected in columns). Values are stored in cells.
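As a small illustration of this layout, the sketch below builds a tidy table by hand. The column names (sample_id, group, age, PC_34_1, SM_36_1) and all values are invented for this example; they are not taken from the data set used in this GitBook.

```r
# Toy tidy table: one row per patient, one column per variable.
# All names and values are made up for illustration only.
tidy_example <- data.frame(
  sample_id = c("P01", "P02", "P03"),
  group     = c("control", "tumor", "tumor"),
  age       = c(54, 61, 47),
  PC_34_1   = c(152.3, 98.7, 110.4),  # hypothetical lipid concentration
  SM_36_1   = c(12.1, 15.8, 14.2)     # hypothetical lipid concentration
)

nrow(tidy_example)           # number of observations (patients)
ncol(tidy_example)           # number of variables
tidy_example[2, "PC_34_1"]   # one cell: the PC_34_1 value for patient P02
```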

Tibbles are simply modern data frames in R. They retain the time-tested, most useful features of data frames and drop those that are now redundant or irritating. Tibbles are handled in R via the tidyverse collection, more precisely via the tibble package. Both will be introduced here. All functionalities of tibbles are summarized on the CRAN repository and the tidyverse project website. The links below also provide a simple comparison of tibbles and data frames with examples:

Information about tibbles - the CRAN repository.
Information about tibbles and comparison of tibbles and classic data frames - the tidyverse project website.

As tibbles are more user-friendly than data frames and offer important new functionalities, we will use them whenever possible in this GitBook. However, some plotting libraries still do not accept tibbles and require data frames.

To use tibbles, you first need to install the tidyverse collection. Remember, a package is installed only once, but its library must be loaded every time you start a new R session in RStudio. We will also install the tidymodels collection in the same line of code. Here are the commands we can use:

# Install tidyverse and tidymodels packages at once
install.packages(c("tidyverse", "tidymodels"))

# or create an object containing the name of packages
packages <- c("tidyverse", "tidymodels")
install.packages(packages)

# Install tidyverse and tidymodels one by one if you don't feel confident enough
# First install tidyverse
install.packages("tidyverse")

# Next, install tidymodels
install.packages("tidymodels")

# Call tidyverse library
library(tidyverse)
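Because installation is needed only once, you may prefer a script that installs a package only when it is missing. The lines below are a minimal sketch of that idea, not a required step. The object names is_installed and missing_packages are our own, and the interactive() check is an assumption added so that nothing is installed when the script runs unattended.

```r
# Packages used in this subchapter
packages <- c("tidyverse", "tidymodels")

# Check which of them can already be found on this computer
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- packages[!is_installed]

# Install only what is missing (skipped in non-interactive runs)
if (interactive() && length(missing_packages) > 0) {
  install.packages(missing_packages)
}

# Names of packages that were not found (empty if all are installed)
missing_packages
```

Loading with library() is still needed in every new session, whether or not anything had to be installed.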

Now, we can test whether the object created by the read_xlsx() function while reading data into R is a classic data frame or a tibble. The tibble package provides the is_tibble() function. It returns TRUE if the object is a tibble and FALSE for classic data frames and all other objects.

# Testing if an object is a classic data frame or tibble
is_tibble(data)

As the function returns TRUE, the tibble was created by read_xlsx(). We can print it. Type:

# Printing tibble (with default 10 rows)
print(data)

# or printing tibble with a selected number of rows
print(data, n = 20)

This way, we obtain in the console the following output:

Output of print() for a 'data' tibble.
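The print() method for tibbles also controls how many columns are shown. The sketch below uses a small invented tibble (the object toy and its values are not part of the data set used in this GitBook) to show the n and width arguments.

```r
library(tibble)

# Toy tibble with 30 rows; values are invented for illustration
toy <- tibble(
  sample_id = sprintf("P%02d", 1:30),
  PC_34_1   = seq(100, 129)
)

print(toy)                       # default: first 10 rows
print(toy, n = 5)                # first 5 rows only
print(toy, n = Inf)              # all rows
print(toy, n = 5, width = Inf)   # all columns, even if the console is narrow
```

With 129 columns, as in our 'data' object, width = Inf prints every column, so it is best combined with a small n.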

Now, let's change the 'data' object into a data frame and apply the is_tibble() function:

# Changing object type from tibble into data frame
data <- as.data.frame(data)

# Checking if 'data' is still a tibble
is_tibble(data)

Now, the is_tibble() function returns FALSE, as 'data' is no longer a tibble. We can print it using:

# Print data frame
print(data)

Output of print() for the 'data' data frame.

A classic data frame can be turned back into a tibble using the as_tibble() function:

# Changing data frame into a tibble
data <- as_tibble(data)

# Testing if the 'data' object is the tibble again
is_tibble(data)
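One detail worth knowing when converting: tibbles do not use row names, so as_tibble() drops them by default. If your sample identifiers are stored as row names, the rownames argument of as_tibble() can move them into a regular column. The sketch below uses an invented data frame; the names df, tb, and sample_id are our own choices.

```r
library(tibble)

# Toy data frame with sample IDs stored as row names (invented values)
df <- data.frame(
  PC_34_1 = c(152.3, 98.7, 110.4),
  SM_36_1 = c(12.1, 15.8, 14.2),
  row.names = c("P01", "P02", "P03")
)

is_tibble(df)                                 # FALSE: classic data frame

# Default conversion drops the row names
tb_no_ids <- as_tibble(df)

# Keep the row names as a regular column instead
tb <- as_tibble(df, rownames = "sample_id")
is_tibble(tb)                                 # TRUE
colnames(tb)

# And back to a classic data frame
df_again <- as.data.frame(tb)
is_tibble(df_again)                           # FALSE
class(df_again)
```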

As you can see in the examples provided above, tibbles are well-arranged after printing, allowing for simpler data inspection compared to a classic data frame. Additionally, the printout shows the object type and its size, as well as the data type of each column (right below the column name). This information is missing when data frames are printed, so additional functions, e.g., str(), have to be used:

# Checking data type for all columns in a data frame
str(data, list.len = 129)

# The list.len argument sets the maximum number of columns (variables) listed
# As we have 129 columns (variables) overall, we want to check them all
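Instead of typing the number of columns by hand, it can be taken from the object itself with ncol(). The sketch below shows this on an invented data frame called wide, which stands in for our 'data' object.

```r
# Toy data frame with 129 numeric columns (invented values)
wide <- as.data.frame(matrix(0, nrow = 3, ncol = 129))

# By default, str() stops listing columns after the first 99
str(wide)

# Use the real column count instead of typing 129 by hand
str(wide, list.len = ncol(wide))
```

This way, the command keeps working if columns are added or removed later.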

Alternatively, the glimpse() function, which comes from the pillar package and is re-exported by dplyr, can be used:

# Checking data type for every column via glimpse() from pillar package
glimpse(data)
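With more than a hundred columns, a short summary of how many columns have each type can be easier to read than the full listing. The lines below are one possible sketch in base R, applied to an invented tibble; the names toy and column_types are our own. Here glimpse() is loaded via the tibble package, which also makes it available.

```r
library(tibble)

# Toy tibble with mixed column types (invented values)
toy <- tibble(
  sample_id = c("P01", "P02", "P03"),
  group     = c("control", "tumor", "tumor"),
  PC_34_1   = c(152.3, 98.7, 110.4),
  SM_36_1   = c(12.1, 15.8, 14.2)
)

# One line per column: name, type, and the first values
glimpse(toy)

# Count how many columns of each type there are
column_types <- vapply(toy, function(column) class(column)[1], character(1))
table(column_types)
```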

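Compared with classic data frames, tibbles do less behind the scenes (they do not change column names or types and do not partially match column names) and complain more (for example, when a column does not exist). This 'laziness' and strictness make them a safer and more suitable solution for beginners than the classic data frames used previously in R.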
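Two of these differences can be seen in a short sketch. The objects df and tb and their values are invented for this example.

```r
library(tibble)

# The same invented values stored in both object types
df <- data.frame(concentration = c(152.3, 98.7, 110.4))
tb <- tibble(concentration = c(152.3, 98.7, 110.4))

# Partial matching: the data frame silently accepts an abbreviated name
df$conc
# The tibble returns NULL with a warning, so a typo does not go unnoticed
# (the warning is suppressed here only to keep the output short)
suppressWarnings(tb$conc)

# Selecting one column with [ ]: the data frame simplifies to a vector...
class(df[, "concentration"])
# ...while the tibble always stays a tibble
class(tb[, "concentration"])
```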

Something new for you here is probably the 'data type stored in columns', which our tibble indicates with abbreviations such as <chr> or <dbl>. The next subchapters will briefly present the data types in R.

The script containing all commands from this subchapter can be found here:

Script containing all commands from this subchapter.

In summary, the tibble package provides tibbles: a format that gives a good overview of the data structure and is easily processed by functions for hypothesis testing, visualization, and machine learning. These features make tibbles particularly useful for metabolomics and lipidomics analysis.
