Data transformation and scaling using mutate()
A part of data transformation & normalization – using available R packages
The mutate() function offers great flexibility in modifying the contents of tibbles. We will use it here for:
log transformation,
and square-root transformation of the lipidomics data set.
Moreover, we will show you how to center and scale the data set in the next step. We will present the following scaling methods here:
Autoscaling (also known as Unit-Variance-Scaling or UV-Scaling),
Pareto scaling,
Range scaling,
Vast scaling,
Level scaling.
We again strongly recommend reading the following manuscript by Robert A. van den Berg et al.:
The manuscript presents data centering, scaling, and transformation for metabolomics data, including the theoretical aspects and consequences of these operations. Here, we rely on this work when preparing the functions that perform data transformation, centering, and scaling.
We need to load the tidyverse collection to use the mutate() function and pipes:
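A minimal sketch of this step, assuming the tidyverse meta-package is already installed:

```r
# Attaches dplyr (which provides mutate() and re-exports the %>% pipe),
# tibble, readr, and the other core tidyverse packages
library(tidyverse)
```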
Logarithmic transformation
Let's begin with the popular logarithmic transformation. Load the complete data set into R again as 'data', check that the created object is a tibble, and adjust the column types if necessary.
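The check and the type adjustment could look roughly as follows. Because the file name and column names of the lipidomics data set are not given here, the sketch builds a small stand-in tibble; the columns `Sample`, `Label`, `PC_34_1`, and `SM_36_1` are hypothetical.

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical stand-in for the lipidomics data set; in practice, `data`
# would be read from the file used in the earlier chapters
data <- tibble(
  Sample  = c("S1", "S2", "S3", "S4"),
  Label   = c("N", "N", "T", "T"),
  PC_34_1 = c(10, 100, 1000, 10000),
  SM_36_1 = c(1, 10, 100, 1000)
)

# Check that the object is a tibble
tibble::is_tibble(data)

# Adjust column types if necessary, e.g. store the group label as a factor
data <- data %>% mutate(Label = as.factor(Label))
```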
The log transformation can be performed in one of the two following ways:
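The original code is not reproduced here, so the following is only a sketch of two equivalent ways to do this with mutate_if(), shown for log10 on a hypothetical stand-in tibble (the column names are placeholders).

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical stand-in for the lipidomics tibble `data`
data <- tibble(
  Sample  = c("S1", "S2", "S3", "S4"),
  PC_34_1 = c(10, 100, 1000, 10000),
  SM_36_1 = c(1, 10, 100, 1000)
)

# Way 1: pass the function itself; only numeric columns are transformed
data.log10 <- data %>% mutate_if(is.numeric, log10)

# Way 2: wrap the call in an anonymous function
data.log10.b <- data %>% mutate_if(is.numeric, function(x) log10(x))
```

In current dplyr versions, mutate_if() is superseded by `mutate(across(where(is.numeric), log10))`, which should give the same result.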

Suppose one would like to use a different logarithm base for this transformation. It can be achieved through an easy modification of the code above:
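As a sketch of such a modification (again on a hypothetical stand-in tibble), the base can be changed either by swapping in another base R logarithm function or through the `base` argument of log():

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical stand-in for the lipidomics tibble `data`
data <- tibble(
  Sample  = c("S1", "S2", "S3"),
  PC_34_1 = c(2, 4, 8),
  SM_36_1 = c(1, 16, 64)
)

# Base 2 via the dedicated function
data.log2 <- data %>% mutate_if(is.numeric, log2)

# Natural logarithm (base e)
data.ln <- data %>% mutate_if(is.numeric, log)

# Any other base through the `base` argument of log()
data.log4 <- data %>% mutate_if(is.numeric, function(x) log(x, base = 4))
```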

Square-root-transformation
A simple change to the code above performs the square-root transformation. We can either apply the sqrt() function or define a function that computes x^0.5.
All of these variants produce the same output.
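A sketch of these variants on a hypothetical stand-in tibble (the column names are placeholders):

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical stand-in for the lipidomics tibble `data`
data <- tibble(
  Sample  = c("S1", "S2", "S3"),
  PC_34_1 = c(4, 9, 16),
  SM_36_1 = c(1, 25, 100)
)

# Variant 1: the built-in sqrt() function
data.sqrt <- data %>% mutate_if(is.numeric, sqrt)

# Variant 2: an anonymous function raising each value to the power 0.5
data.sqrt.b <- data %>% mutate_if(is.numeric, function(x) x^0.5)

# Variant 3: the same idea written as a formula-style lambda
data.sqrt.c <- data %>% mutate_if(is.numeric, ~ sqrt(.))
```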

Mean-centering data in R
Centering means subtracting the column mean from every entry in that column, so each centered column has a mean of 0. It is worth knowing that centering is a built-in step of almost every common scaling method. Centering alone can be performed easily with the mutate_if() function:
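A minimal sketch of mean-centering, assuming a hypothetical stand-in tibble without missing values (with NAs present, `mean(x, na.rm = TRUE)` would be needed):

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical stand-in for the lipidomics tibble `data`
data <- tibble(
  Sample  = c("S1", "S2", "S3", "S4"),
  PC_34_1 = c(1, 2, 3, 6),
  SM_36_1 = c(10, 20, 30, 40)
)

# Subtract each numeric column's mean from its entries
data.centered <- data %>% mutate_if(is.numeric, function(x) x - mean(x))
```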

Now we can test whether the centering worked correctly. In the 'Missing values handling in R' chapter, we showed you the sapply() function, which applies a function to every column of a tibble and returns a vector. We will now recalculate the mean of every column and round it to 10 decimal places using the following line of code:
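A sketch of this check, assuming a hypothetical centered tibble called `data.centered` (the original selects its numeric columns in a way not shown here, so select_if() is used as a stand-in):

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical stand-in data, centered as described above
data.centered <- tibble(
  Sample  = c("S1", "S2", "S3"),
  PC_34_1 = c(0.1, 0.2, 0.4),
  SM_36_1 = c(1.3, 2.7, 3.1)
) %>%
  mutate_if(is.numeric, function(x) x - mean(x))

# Mean of every numeric column, rounded to 10 decimal places
col.means <- round(sapply(select_if(data.centered, is.numeric), mean), 10)
col.means
```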

Additional remark: without rounding, you would find that the means are not exactly 0 but very small values close to it. This is because floating-point arithmetic has limited precision, in R as in other programming languages, so small numerical errors are normal. The values are so small that rounding to 10 or even 15 decimal places still gives 0.
Data scaling in R
We will again apply the mutate_if() function for data scaling. Additionally, we will define the scaling functions separately:
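The definitions below are a sketch that follows the formulas given by van den Berg et al.; the function names are placeholders chosen for this example, and the functions assume columns without missing values.

```r
# Autoscaling (UV scaling): center, then divide by the standard deviation
autoscaling <- function(x) (x - mean(x)) / sd(x)

# Pareto scaling: center, then divide by the square root of the standard deviation
pareto.scaling <- function(x) (x - mean(x)) / sqrt(sd(x))

# Range scaling: center, then divide by the range (max - min)
range.scaling <- function(x) (x - mean(x)) / (max(x) - min(x))

# Vast scaling: autoscaling multiplied by mean / standard deviation
vast.scaling <- function(x) ((x - mean(x)) / sd(x)) * (mean(x) / sd(x))

# Level scaling: center, then divide by the mean
level.scaling <- function(x) (x - mean(x)) / mean(x)
```

Every function centers the column first and differs only in the scaling factor, which is why centering rarely needs to be run as a separate step.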
After executing these lines of code, the scaling functions will appear in the global environment under 'Functions'. We are ready to scale the data set. We will scale the previously log-transformed data:
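As a sketch of this step, the example below log-transforms a hypothetical stand-in tibble and then applies each scaling method to the numeric columns. The list `scalers` and all column names are assumptions made for this illustration; the scaling formulas follow van den Berg et al.

```r
library(dplyr)  # also attached by library(tidyverse)

# Scaling functions, collected in a named list (names are placeholders)
scalers <- list(
  auto   = function(x) (x - mean(x)) / sd(x),
  pareto = function(x) (x - mean(x)) / sqrt(sd(x)),
  range  = function(x) (x - mean(x)) / (max(x) - min(x)),
  vast   = function(x) ((x - mean(x)) / sd(x)) * (mean(x) / sd(x)),
  level  = function(x) (x - mean(x)) / mean(x)
)

# Hypothetical stand-in for the lipidomics tibble `data`
data <- tibble(
  Sample  = c("S1", "S2", "S3", "S4"),
  PC_34_1 = c(10, 100, 1000, 10000),
  SM_36_1 = c(1, 10, 100, 1000)
)

# Log-transform first, then scale the transformed data
data.log10 <- data %>% mutate_if(is.numeric, log10)

# One scaled tibble per method, e.g. scaled$auto or scaled$pareto
scaled <- lapply(scalers, function(f) mutate_if(data.log10, is.numeric, f))

# A single method can also be applied directly
data.auto <- data.log10 %>% mutate_if(is.numeric, scalers$auto)
```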