Preferred formats in metabolomics and lipidomics analysis
A part of preparing data for analysis and visualization in OMICs analysis
Last updated
Two types of objects can be created in R when importing lipidomics/metabolomics data: classic data frames and tibbles. Data frames were partially explained in the first subchapter. Here, you will find the rest of the basic information.
IMPORTANT: In tidy data frames, one column represents one variable (a feature, lipid concentration, metabolite concentration, gender, age, tumor grade, smoking status, etc.), and every row represents one observation (one patient for whom all variables are collected in columns). Values are stored in cells.
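As an illustration of this layout, here is a minimal sketch of a tidy data frame built by hand in base R. The object name, column names, and all values are hypothetical and serve only to show the row/column logic:

```r
# Hypothetical tidy data: one row per patient, one column per variable
tidy_example <- data.frame(
  patient_id = c("P01", "P02", "P03"),
  age        = c(54, 61, 47),
  smoking    = c("yes", "no", "no"),
  lipid_conc = c(12.3, 15.8, 9.6)  # made-up lipid concentrations
)

# Number of observations (rows) and variables (columns)
nrow(tidy_example)
ncol(tidy_example)
```

Each cell holds exactly one value, for example the age of patient P02.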
Tibbles are simply modern data frames in R. They retain the time-tested, most useful features of data frames and drop those that have become redundant or irritating. Tibbles are handled in R via the tidyverse collection, specifically via the tibble package. Both will be introduced here. All functionalities of tibbles are summarized on the CRAN repository and the tidyverse project website. You will also find a simple comparison of tibbles and data frames with examples in the links below:
As tibbles are more user-friendly than data frames and offer new important functionalities, we will use them whenever it is possible in this Gitbook. However, some plotting libraries still do not accept tibbles and require data frames.
To use tibbles, you first need to install the tidyverse collection. Remember that packages are installed only once, whereas libraries must be loaded again every time you start a new RStudio session. We will also install the tidymodels collection in the same line of code. Here are the commands we can use:
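A minimal sketch of these commands is given below. The requireNamespace() guard is an addition of this sketch (an assumption, not necessarily part of the original script); it only prevents reinstalling packages that are already available:

```r
# Install both collections in one call; this is needed only once.
# The guard skips the installation if both collections are already present.
if (!requireNamespace("tidyverse", quietly = TRUE) ||
    !requireNamespace("tidymodels", quietly = TRUE)) {
  install.packages(c("tidyverse", "tidymodels"))
}

# Load the libraries; repeat this in every new R session
library(tidyverse)
library(tidymodels)
```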
Now, we can test whether the object created by the read_xlsx() function while reading data into R is a classic data frame or a tibble. The tibble package provides the is_tibble() function. It returns TRUE if the object is a tibble and FALSE for all other objects, including classic data frames.
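A minimal sketch of this check, assuming a small hand-built tibble as a stand-in for the object returned by read_xlsx() (the column names and values are hypothetical):

```r
library(tibble)  # also attached by library(tidyverse)

# Hypothetical stand-in for the object returned by read_xlsx()
data <- tibble(
  Sample  = c("S1", "S2", "S3"),
  Group   = c("control", "control", "patient"),
  Lipid_A = c(1.25, 0.98, 2.10)
)

# Test the object type
is_tibble(data)  # returns TRUE
```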
Since the function returns TRUE, read_xlsx() created a tibble. We can print it. Type:
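A minimal sketch of printing a tibble, again assuming a hypothetical hand-built stand-in for the imported 'data' object:

```r
library(tibble)

# Hypothetical stand-in for the object returned by read_xlsx()
data <- tibble(
  Sample  = c("S1", "S2", "S3"),
  Group   = c("control", "control", "patient"),
  Lipid_A = c(1.25, 0.98, 2.10)
)

# Print the tibble; typing just `data` in the console does the same
print(data)
```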
This way, we obtain in the console the following output:
Now, let's convert the 'data' object into a classic data frame and apply the is_tibble() function:
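A minimal sketch of this conversion, assuming the base R function as.data.frame() is used and that 'data' is the hypothetical stand-in tibble defined in the code:

```r
library(tibble)

# Hypothetical stand-in for the object returned by read_xlsx()
data <- tibble(
  Sample  = c("S1", "S2", "S3"),
  Group   = c("control", "control", "patient"),
  Lipid_A = c(1.25, 0.98, 2.10)
)

# Convert the tibble into a classic data frame
data <- as.data.frame(data)

# Test the object type again
is_tibble(data)  # returns FALSE
```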
Now, the is_tibble() function returns FALSE because 'data' is no longer a tibble. We can print it using:
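A minimal sketch of printing a classic data frame, assuming a hypothetical hand-built 'data' object with made-up values:

```r
# Hypothetical classic data frame standing in for the converted 'data' object
data <- data.frame(
  Sample  = c("S1", "S2", "S3"),
  Group   = c("control", "control", "patient"),
  Lipid_A = c(1.25, 0.98, 2.10)
)

# Print the data frame; no object type or column types are shown
print(data)
```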
A classic data frame can be converted back into a tibble using the as_tibble() function:
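A minimal sketch of this step, assuming a hypothetical hand-built data frame in place of the real 'data' object:

```r
library(tibble)

# Hypothetical classic data frame
data <- data.frame(
  Sample  = c("S1", "S2", "S3"),
  Group   = c("control", "control", "patient"),
  Lipid_A = c(1.25, 0.98, 2.10)
)

# Convert the data frame into a tibble
data <- as_tibble(data)

is_tibble(data)  # returns TRUE
```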
As you can see in the examples above, tibbles are printed in a well-arranged form, which makes data inspection simpler than with a classic data frame. Additionally, the printout shows the object type and its size, as well as the data type of each column (right below the column name). This information is missing when a data frame is printed, and additional functions have to be used in that case, e.g., str():
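A minimal sketch of such a call, assuming a hypothetical hand-built data frame named 'data':

```r
# Hypothetical classic data frame
data <- data.frame(
  Sample  = c("S1", "S2", "S3"),
  Group   = c("control", "control", "patient"),
  Lipid_A = c(1.25, 0.98, 2.10)
)

# Show the object type, its size, and the data type of each column
str(data)
```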
Alternatively, the glimpse() function, which comes from the pillar package and is re-exported by dplyr, can be used:
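A minimal sketch of this alternative, assuming the same kind of hypothetical hand-built data frame named 'data':

```r
library(dplyr)  # re-exports glimpse() from pillar

# Hypothetical classic data frame
data <- data.frame(
  Sample  = c("S1", "S2", "S3"),
  Group   = c("control", "control", "patient"),
  Lipid_A = c(1.25, 0.98, 2.10)
)

# Columns are listed one per line, each with its data type and first values
glimpse(data)
```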
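The behavior of tibbles can be summarized as 'laziness' and 'certainty': they do less on their own (e.g., they never change the names or types of variables and do not partially match column names) and they complain more (e.g., they warn when a requested column does not exist). This makes them a safer and more suitable solution for beginners than the classic data frames used previously in R.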
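The difference in behavior can be illustrated with a short sketch. The objects 'df' and 'tb' and their single column are hypothetical:

```r
library(tibble)

# The same hypothetical column stored in a data frame and in a tibble
df <- data.frame(concentration = c(1.2, 3.4))
tb <- tibble(concentration = c(1.2, 3.4))

# Partial matching of a column name with $
df$conc  # the data frame silently returns the 'concentration' column
tb$conc  # the tibble returns NULL and warns about an unknown column

# Selecting a single column with [ ]
class(df[, "concentration"])  # "numeric": the data frame is dropped to a vector
is_tibble(tb[, "concentration"])  # TRUE: the tibble stays a tibble
```

In an analysis script, the silent partial match of the data frame can hide a typo in a column name, whereas the tibble makes the problem visible immediately.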
The new information for you is probably the so-called 'data type stored in columns', which our tibble indicates in the form of <chr> or <dbl>. The next subchapters will briefly present the data types in R.
The script containing all commands from this subchapter can be found here:
In summary, the tibble package provides modern data frames, so-called tibbles, which give a good overview of the data structure and come in a format easily processed by functions for hypothesis testing, visualization, and machine learning. These features make tibbles particularly useful for metabolomics and lipidomics analysis.