Omics data visualization in R and Python

Data Normalization – bestNormalize R package

A part of data transformation & normalization – using available R packages


Last updated 1 year ago

The most important part of metabolomic data analysis is normalization of the data - and the selection of the method best suited to our type of data. The bestNormalize R package was created to facilitate the selection of a correct normalization method: an inappropriate normalization increases the risk of detecting false-positive results.

The bestNormalize package provides researchers with a variety of tools and algorithms to assist in the selection of the most suitable normalization method, customized for their specific type of measured data. The package is designed with the goal of offering a user-friendly interface, simplifying the often intricate task of normalization, and enhancing both the accessibility and efficiency of the process.

NOTE: Let's follow these principles, which this GitBook uses throughout:

1) first, install packages,

2) load the required libraries,

3) set the working directory (wd),

4) load the data into R,

5) and perform your analysis.

First, install the package (this only needs to be done once):

# Installation (once only)
install.packages('bestNormalize')

Second, load the library, set the working directory, load the data (see the chapter 'Loading data into R'), and perform the analysis:

# Library:
library(bestNormalize)

# Set the working directory:
setwd("...")

# Load the data into R:
data <- read.csv(file.choose())

# Find the optimal transformation using repeated cross-validation for our input dataset:
bestNormalize(data$CE.16.1, allow_lambert_s = TRUE)

The parameter allow_lambert_s = TRUE in the bestNormalize function indicates whether Lambert's W transformation with type 's' should be considered as a potential normalization method during the estimation process. When set to TRUE, it allows the algorithm to evaluate Lambert's W transformation with type 's' as one of the candidate transformations to determine the best-fit normalization for the given data.
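As a minimal sketch (using a hypothetical right-skewed vector in place of real lipid intensities), the Lambert W x F (type 's') candidate can be switched on explicitly; by default bestNormalize() does not consider it:

```r
library(bestNormalize)

set.seed(123)          # the cross-validation folds are random; fix the seed for reproducibility
x <- exp(rnorm(100))   # hypothetical right-skewed intensities

bn_default <- bestNormalize(x)                          # Lambert W (type 's') not considered
bn_lambert <- bestNormalize(x, allow_lambert_s = TRUE)  # Lambert W (type 's') added as a candidate
```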

Keep in mind that the transformation is performed individually for each metabolite/lipid, and the input must be a vector rather than the entire matrix.
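Because bestNormalize() works on one vector at a time, normalizing several lipids requires looping over the columns. A minimal sketch (the column names below are hypothetical and stand in for the lipids in your data frame):

```r
library(bestNormalize)

lipid_cols <- c("CE.16.1", "CE.18.1")   # hypothetical lipid columns present in 'data'

# Fit a separate bestNormalize model per lipid and collect the transformed vectors
normalized <- lapply(data[lipid_cols], function(x) {
  bn <- bestNormalize(x, allow_lambert_s = TRUE)
  predict(bn)   # transformed values for this lipid
})
data_norm <- as.data.frame(normalized)
```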

The output:

The output from bestNormalize(data$CE.16.1, allow_lambert_s = TRUE) provides information about the best normalization transformation chosen for the specified variable (data$CE.16.1). Here's a breakdown of the output:

1. Estimated Normality Statistics:

The output lists various transformation methods along with their estimated normality statistics (Pearson P / df). Lower values indicate a more normal distribution. Methods include arcsinh(x), Box-Cox, Center+scale, Double Reversed Log_b(x+a), Lambert's W (type s), Log_b(x+a), orderNorm (ORQ), sqrt(x + a), and Yeo-Johnson.

2. Estimation Method:

The estimation method used is mentioned: Out-of-sample via Cross-Validation (CV) with 10 folds and 5 repeats.

3. Best Chosen Transformation:

The final selection by bestNormalize is a Standardized arcsinh(x) transformation. Relevant statistics before standardization are provided: mean = 6.980617 and standard deviation (sd) = 0.5271941.

This package helps confirm that we have selected a suitable method for normalizing our metabolomics/lipidomics data.
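To reuse the chosen transformation later (for example, to back-transform normalized values or model predictions to the original concentration scale), store the fitted object instead of only printing it; a sketch, assuming the same CE.16.1 column as above:

```r
library(bestNormalize)

# Keep the fitted object so the chosen transformation can be reused and inverted
bn <- bestNormalize(data$CE.16.1, allow_lambert_s = TRUE)

x_norm <- predict(bn)                                     # transformed (normalized) values
x_back <- predict(bn, newdata = x_norm, inverse = TRUE)   # back-transform to the original scale

all.equal(x_back, data$CE.16.1)   # round-trip check against the original values
```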

Further reading: the bestNormalize vignette - https://cran.r-project.org/web/packages/bestNormalize/vignettes/bestNormalize.html

Data transformation using bestNormalize().