Changing the data frame format with pivot_longer()
Useful tricks and features in OMICs mining
Last updated
In our article, we presented the difference between wide and long data frames. Classic wide data frames are mostly used in Microsoft Excel, where they are handy for basic statistical analysis and for creating plots. Because Excel is so widely used, many users are accustomed to this format. However, converting wide data frames into long ones can simplify R code for computing descriptive statistics, hypothesis testing, and plotting. Take a look at the graphics below to recall the difference between wide and long data frames:
As you can see, after converting the wide data frame into the long format, all numeric values (concentrations of lipids or metabolites) are stored in a single column. The number of columns is significantly reduced at the cost of a much larger number of rows.
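A small illustration of this trade-off, assuming a hypothetical wide tibble with four samples and three lipid columns (the lipid names and values are made up and are not taken from the lipidomics data set used in this article). It already uses pivot_longer(), which is explained step by step in the rest of this article:

```r
library(tidyverse)

# Hypothetical wide tibble: 4 samples, 2 ID columns + 3 lipid columns (made-up values)
data.wide <- tibble(
  `Sample Name` = c("S1", "S2", "S3", "S4"),
  Label = factor(c("Control", "Control", "Cancer", "Cancer")),
  `PC 34:1` = c(10.2, 11.5, 15.3, 14.8),
  `PC 36:2` = c(5.1, 4.9, 7.2, 6.8),
  `SM 34:1` = c(2.3, 2.1, 3.4, 3.0)
)

# Every lipid column is turned into rows of a single Concentration column
data.long <- data.wide %>%
  pivot_longer(cols = -c(`Sample Name`, Label),
               names_to = "Lipids", values_to = "Concentration")

dim(data.wide)  # 4 rows, 5 columns
dim(data.long)  # 12 rows (4 samples x 3 lipids), 4 columns
```

In general, n samples and p lipid columns give n x p rows in the long format, while the number of columns drops to the number of ID columns plus two.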
We will show you how to change the wide format into a long one. For this purpose, we will use the pivot_longer() function from the tidyr package, which is part of the tidyverse collection. First, we load the tidyverse library:
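Loading the library takes a single line. The version check after it is an optional addition of ours, not part of the original script; it simply confirms that pivot_longer() is available, since the function was introduced in tidyr 1.0.0:

```r
# Attach the core tidyverse packages (tidyr, dplyr, tibble, ggplot2, ...)
library(tidyverse)

# Optional: confirm that the installed tidyr provides pivot_longer() (tidyr >= 1.0.0)
packageVersion("tidyr")
exists("pivot_longer")
```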
The pivot_longer() function requires additional arguments to be specified, namely which columns contain the numeric data, and what the new columns of the long data frame, storing the character and numeric data, should be named. If you are interested in which arguments the pivot_longer() function accepts, you can always use:
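Either of the following standard R helpers will do. The first opens the documentation page (in RStudio, the Help pane), while the second only prints the argument list with default values in the console:

```r
library(tidyverse)

# Open the documentation page of pivot_longer()
?pivot_longer

# Print only the function's arguments and their default values
args(pivot_longer)
```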
Alternatively, the tidyverse website also provides examples and explanations of every function and its arguments, including a dedicated reference page for pivot_longer().
We will specify the following arguments in the function:
data - if you use the pipe, the data are supplied through it,
cols - the columns we would like to change into the long format,
names_to - the name of the new column storing the former column names (here, the lipid names, i.e., a character variable),
values_to - the name of the new column storing the values from those columns (here, the numeric concentrations).
And the final code:
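The exact call used on the lipidomics data set is not reproduced here, so the following is only a sketch of what such a call can look like. It assumes a hypothetical wide tibble in which `Sample Name` and Label are the only non-numeric columns; the lipid names and values are made up:

```r
library(tidyverse)

# Hypothetical wide tibble: one row per sample, one column per lipid (made-up values)
data.wide <- tibble(
  `Sample Name` = c("S1", "S2", "S3", "S4"),
  Label = factor(c("Control", "Control", "Cancer", "Cancer")),
  `PC 34:1` = c(10.2, 11.5, 15.3, 14.8),
  `PC 36:2` = c(5.1, 4.9, 7.2, 6.8),
  `SM 34:1` = c(2.3, 2.1, 3.4, 3.0)
)

# data arrives through the pipe; cols selects every column except the two ID columns
data.long.no.1 <- data.wide %>%
  pivot_longer(cols = -c(`Sample Name`, Label),
               names_to = "Lipids",
               values_to = "Concentration")

data.long.no.1
```

Selecting the columns to exclude with -c(...) avoids listing hundreds of lipid names one by one; backticks are needed because `Sample Name` contains a space.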
Additionally, let's check whether the new long data frame is a classic R data frame or a tibble:
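A sketch of such a check, using a small made-up stand-in for data.long.no.1 built with the same kind of pivot_longer() call (hypothetical values, not the lipidomics data). is_tibble() comes from the tibble package, which library(tidyverse) attaches:

```r
library(tidyverse)

# Hypothetical stand-in for data.long.no.1 (made-up values)
data.long.no.1 <- tibble(
  `Sample Name` = c("S1", "S2"),
  Label = factor(c("Control", "Cancer")),
  `PC 34:1` = c(10.2, 15.3),
  `PC 36:2` = c(5.1, 7.2)
) %>%
  pivot_longer(cols = -c(`Sample Name`, Label),
               names_to = "Lipids", values_to = "Concentration")

is_tibble(data.long.no.1)      # TRUE
is.data.frame(data.long.no.1)  # also TRUE: a tibble is a special kind of data frame
class(data.long.no.1)          # "tbl_df" "tbl" "data.frame"

# Printing shows the dimensions and the type of every column
print(data.long.no.1)
```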
The first function returns TRUE, meaning the new long data frame is a tibble. Printing also confirms that the object stored in the global environment is a tibble with dimensions 28,829 x 4 (for our lipidomics data set) and only four columns: Sample Name <chr>, Label <fct>, Lipids <chr>, and Concentration <dbl> (or <num>).
We can also take a glimpse at the new object we created:
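glimpse() lists every column on its own line with its type and first values. The sketch below runs it on a made-up stand-in for data.long.no.1 (hypothetical values):

```r
library(tidyverse)

# Hypothetical stand-in for data.long.no.1 (made-up values)
data.long.no.1 <- tibble(
  `Sample Name` = c("S1", "S2"),
  Label = factor(c("Control", "Cancer")),
  `PC 34:1` = c(10.2, 15.3),
  `PC 36:2` = c(5.1, 7.2)
) %>%
  pivot_longer(cols = -c(`Sample Name`, Label),
               names_to = "Lipids", values_to = "Concentration")

# Transposed overview: one line per column with its type and first values
glimpse(data.long.no.1)
```

glimpse() also returns its input invisibly, so it can be placed in the middle of a pipe without changing the result.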
...which provides essentially the same information as printing the object data.long.no.1, but with the columns listed vertically.
The script containing all code blocks can be downloaded here:
Note!
Remember that you can always recreate a wide tibble by running this simple line of code:
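The original line is not reproduced here. A sketch of the reverse operation with tidyr's pivot_wider(), assuming the long tibble has the Lipids and Concentration columns described above (made-up stand-in data):

```r
library(tidyverse)

# Hypothetical wide tibble and its long version (made-up values)
data.wide <- tibble(
  `Sample Name` = c("S1", "S2", "S3", "S4"),
  Label = factor(c("Control", "Control", "Cancer", "Cancer")),
  `PC 34:1` = c(10.2, 11.5, 15.3, 14.8),
  `PC 36:2` = c(5.1, 4.9, 7.2, 6.8)
)
data.long.no.1 <- data.wide %>%
  pivot_longer(cols = -c(`Sample Name`, Label),
               names_to = "Lipids", values_to = "Concentration")

# Lipid names become column names again, filled with their concentrations
data.wide.again <- data.long.no.1 %>%
  pivot_wider(names_from = Lipids, values_from = Concentration)

data.wide.again
```

pivot_wider() uses the remaining columns (here Sample Name and Label) to identify the rows. If the same sample and lipid pair appeared more than once, it would warn and store the values as list-columns instead.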
Now that you know how the pipe and the pivot_longer() function work, let's begin using tidyverse functions to manipulate the content of tibbles.