Omics data visualization in R and Python

Changing data frames format with pivot_longer()

Useful tricks and features in OMICs mining


In our article, we presented the difference between wide and long data frames. Classic wide data frames are the layout typically used in Microsoft Excel, where they are convenient for basic statistical analysis and plotting. Because Excel is so widely used, many users are accustomed to this form. However, converting wide data frames into long ones can simplify R code for computing descriptive statistics, hypothesis testing, and plotting. Take a look at the graphic below to recall the difference between wide and long data frames:

[Figure: Changing wide data frame commonly used in Excel into long data frame simplifying R code.]

As you can see, after the wide data frame is converted into the long format, all numeric values (concentrations of lipids or metabolites) are stored in one column. The number of columns is greatly reduced at the expense of a larger number of rows.
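To make the change in shape concrete, here is a minimal sketch on a hypothetical toy tibble (three made-up samples, the made-up group labels "N" and "T", and three lipid columns with invented values; none of this comes from our data set). Each sample contributes one row per lipid, so 3 samples x 3 lipids give 9 rows, while the three lipid columns collapse into two new ones.

```r
# Hypothetical toy data: 3 samples, 2 ID columns, 3 lipid columns
library(tibble)
library(tidyr)
library(dplyr)

wide <- tibble(
  `Sample Name` = c("S1", "S2", "S3"),
  Label = factor(c("N", "T", "T")),
  `CE 16:1` = c(1.2, 0.8, 1.5),
  `PC 34:1` = c(10.1, 12.3, 9.7),
  `SM 42:1;O2` = c(3.3, 2.9, 3.8)
)

# Stack the three lipid columns into a single 'Concentration' column
long <- wide %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids",
               values_to = "Concentration")

dim(wide)  # 3 5
dim(long)  # 9 4: 3 samples x 3 lipids = 9 rows; 2 ID + 2 new columns
```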

We will show you how to change the wide format into a long one. For this purpose, we will use the pivot_longer() function from the tidyr package (https://tidyr.tidyverse.org/), which is part of the tidyverse collection. First, we load the tidyverse library:

# Load the tidyverse collection (includes tidyr and the %>% pipe)
library(tidyverse)

# We will use pipes to prepare the long data frame,
# which will be stored in 'data.long'.
# The following pipeline will be used:
# 1. Take the wide 'data' from the global environment,
# 2. Push it through the pipe,
# 3. Change the wide data into long format using pivot_longer().
# Note: this is only a skeleton. pivot_longer() still needs its 'cols'
# argument (specified below), so running it as-is returns an error.
data.long <-
  data %>%
  pivot_longer()

The pivot_longer() function requires additional arguments: which columns should be pivoted (here, the columns with numeric data), and what the new columns of the long data frame, storing the former column names and the numeric values, should be called. If you are interested in all the arguments of pivot_longer(), you can always open its help page:

# Opening the help page for pivot_longer()
?pivot_longer

Alternatively, the tidyverse website also provides examples and explanations of each function and its arguments. Here is the "Pivot data from wide to long" reference page for pivot_longer(), with a detailed description of its arguments on the tidyverse collection website.

We will specify the following arguments in the function:

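  • data - the data frame to reshape; if you use the pipe, it is supplied through the pipe,

  • cols - the columns we would like to change into a long format,

  • names_to - the name of the new column that will store the names of the pivoted columns (here, the lipid names, i.e., character data),

  • values_to - the name of the new column that will store the values of the pivoted columns (here, the numeric concentrations).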
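The short sketch below, again on a hypothetical two-sample, two-lipid tibble with invented values, illustrates two points from the list above: with the pipe, `data` is filled in automatically, so the piped call equals a direct call with `data = ...`; and `names_to` receives the former column names while `values_to` receives the cell values.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical toy data: 2 samples x 2 lipids
wide <- tibble(
  `Sample Name` = c("S1", "S2"),
  Label = factor(c("N", "T")),
  `CE 16:1` = c(1.2, 0.8),
  `PC 34:1` = c(10.1, 12.3)
)

# With the pipe, the 'data' argument is supplied by %>% ...
piped <- wide %>%
  pivot_longer(cols = c(`CE 16:1`, `PC 34:1`),
               names_to = "Lipids",
               values_to = "Concentration")

# ... which is the same as passing it explicitly
direct <- pivot_longer(data = wide,
                       cols = c(`CE 16:1`, `PC 34:1`),
                       names_to = "Lipids",
                       values_to = "Concentration")

identical(piped, direct)  # TRUE

# 'Lipids' holds the former column names, 'Concentration' their values
piped$Lipids         # "CE 16:1" "PC 34:1" "CE 16:1" "PC 34:1"
piped$Concentration  # 1.2 10.1 0.8 12.3
```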

And the final code:

# Specifying arguments of pivot_longer() function:
# Option no. 1: indicate the range of columns by name (in our case - all lipids)
data.long.no.1 <-
  data %>%
  pivot_longer(cols = `CE 16:1` : `SM 42:1;O2`,
               names_to = 'Lipids',
               values_to = 'Concentration')

# The new data frame is stored in the global environment as 'data.long.no.1'

# Option no. 2: indicate the range of columns by their position
# (in our data set, columns 3 to 129 hold the lipids):
data.long.no.2 <-
  data %>%
  pivot_longer(cols = 3:129,
               names_to = 'Lipids',
               values_to = 'Concentration')

# The new data frame is stored in the global environment as 'data.long.no.2'

# Option no. 3: indicate the columns by their type (take all numeric ones);
# this works only if the lipid concentrations are the only numeric columns:
data.long.no.3 <-
  data %>%
  pivot_longer(cols = where(is.numeric),
               names_to = 'Lipids',
               values_to = 'Concentration')

# The new data frame is stored in the global environment as 'data.long.no.3'
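As a quick sanity check, the three ways of selecting columns can be compared on a hypothetical toy tibble shaped like our data (two ID columns followed by lipid columns; values invented). Under that assumption all three selections pick the same columns in the same order, so the outputs are identical.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical toy data: 2 ID columns followed by 3 numeric lipid columns
wide <- tibble(
  `Sample Name` = c("S1", "S2"),
  Label = factor(c("N", "T")),
  `CE 16:1` = c(1.2, 0.8),
  `PC 34:1` = c(10.1, 12.3),
  `SM 42:1;O2` = c(3.3, 2.9)
)

# Option 1: range of columns by name
by_name <- wide %>%
  pivot_longer(cols = `CE 16:1`:`SM 42:1;O2`,
               names_to = "Lipids", values_to = "Concentration")

# Option 2: range of columns by position
by_position <- wide %>%
  pivot_longer(cols = 3:5,
               names_to = "Lipids", values_to = "Concentration")

# Option 3: all numeric columns (Label is a factor, so it is not selected)
by_type <- wide %>%
  pivot_longer(cols = where(is.numeric),
               names_to = "Lipids", values_to = "Concentration")

identical(by_name, by_position)  # TRUE
identical(by_name, by_type)      # TRUE
```

Selecting by name is the most robust choice if columns may be reordered; selecting by type is the most compact, but it would also pick up any numeric metadata columns (for example, a numeric age or BMI column, if your data set had one).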

Additionally, let's check whether the new long data frame is a classic R data frame or a tibble:

# Checking the type of the new object
is_tibble(data.long.no.1)

# or
print(data.long.no.1)

[Figure: From R console: long tibble created from our 'data' lipidomics data set and stored as 'data.long.no.1'.]

The first function returns TRUE, meaning the new long data frame is a tibble. Printing also confirms that the object stored in the global environment is a tibble with dimensions of 28 829 x 4 (for our lipidomics data set), i.e., only four columns:

  • Sample Name <chr>,

  • Label <fct>,

  • Lipids <chr>,

  • Concentration <dbl> or <num>.
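The dimensions can be cross-checked with simple arithmetic: since pivot_longer() keeps missing values unless `values_drop_na = TRUE` is set, the long tibble should have (number of samples) x (number of lipid columns) rows. For our data set, assuming one row per sample and the 127 lipid columns 3:129, 28 829 / 127 = 227 samples. The sketch below runs the same check on a hypothetical toy tibble (invented values, including one deliberate NA) and also shows that a tibble is still a data frame.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical toy data: 3 samples x 3 lipids, one missing value
wide <- tibble(
  `Sample Name` = c("S1", "S2", "S3"),
  Label = factor(c("N", "T", "T")),
  `CE 16:1` = c(1.2, 0.8, NA),
  `PC 34:1` = c(10.1, 12.3, 9.7),
  `SM 42:1;O2` = c(3.3, 2.9, 3.8)
)

long <- wide %>%
  pivot_longer(cols = where(is.numeric),
               names_to = "Lipids",
               values_to = "Concentration")

is_tibble(long)      # TRUE
is.data.frame(long)  # TRUE: a tibble is also a data frame
class(long)          # "tbl_df" "tbl" "data.frame"

# Every sample keeps one row per lipid, including the NA
n_lipids <- ncol(wide) - 2  # all columns except 'Sample Name' and 'Label'
nrow(long) == nrow(wide) * n_lipids  # TRUE (9 == 3 * 3)
sum(is.na(long$Concentration))       # 1
```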

We can also take a glimpse at the new object we created:

# Glimpse at the new object
glimpse(data.long.no.1)

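...which provides the same information as printing the object data.long.no.1, with the columns listed as rows:

[Figure: From R console: glimpse at the long tibble 'data.long.no.1'.]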

The script containing all code blocks, "Using pivot_longer function in R (long data frames).R" (2 KB), can be downloaded here:

Note!

Remember that you can always recreate a wide tibble by running this simple line of code:

# Recreating wide data tibble:
data.wide <- data.long.no.3 %>%
  pivot_wider(names_from = Lipids,
              values_from = Concentration)
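A minimal round-trip sketch on a hypothetical toy tibble (invented values) shows that pivot_wider() undoes pivot_longer(): the ID columns, the column order, and the values all come back. This relies on the assumption that 'Sample Name' and 'Label' together identify each row uniquely; if they did not, pivot_wider() would have to combine several values per cell and would warn instead.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical toy data: 3 samples x 3 lipids
wide <- tibble(
  `Sample Name` = c("S1", "S2", "S3"),
  Label = factor(c("N", "T", "T")),
  `CE 16:1` = c(1.2, 0.8, 1.5),
  `PC 34:1` = c(10.1, 12.3, 9.7),
  `SM 42:1;O2` = c(3.3, 2.9, 3.8)
)

# Wide -> long
long <- wide %>%
  pivot_longer(cols = where(is.numeric),
               names_to = "Lipids",
               values_to = "Concentration")

# Long -> wide again: lipid names become column names once more
wide_again <- long %>%
  pivot_wider(names_from = Lipids,
              values_from = Concentration)

all.equal(wide, wide_again)  # TRUE
```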

Now that you know how the pipe and pivot_longer() work, let's begin using tidyverse functions to manipulate the content of tibbles.
