
Effect size computation and interpretation


Thus far, we have focused on determining statistical significance, namely the p-value, to investigate whether a difference exists between two or more biological groups. At this point, it is important to mention that statistical significance is influenced by sample size: if sample sizes are large, even a small difference between two groups can be found statistically significant.

Now, we would like to introduce an important measure that should be reported together with statistical significance - the effect size. The effect size is the magnitude of a difference between biological groups, showing whether this difference is large enough to be meaningful, e.g., worth further investigation. It is difficult to make such a judgment using p-values alone, considering the influence of sample size on statistical significance. For this reason, many journals expect effect sizes to be reported together with p-values.

Here, you will find information about how effect sizes can be computed in R using the rstatix or ggstatsplot libraries, and which effect size measures can be paired with the statistical tests from the previous subchapters.

Selecting effect size for basic statistical tests

Below you will find example effect size measures that can be computed and reported together with the t-test, Mann-Whitney U test, ANOVA, and Kruskal-Wallis test:

| Statistical test | Effect size | Example of effect size interpretation |
| --- | --- | --- |
| t-test | 1 - Cohen's d, 2 - Hedges' g | 0.2 - small effect, 0.5 - medium effect, 0.8 and more - large effect |
| Mann-Whitney U test | 1 - r value, 2 - rank biserial correlation | For r value: <0.3 - small effect, 0.3-0.5 - moderate effect, 0.5 and more - large effect. For rank biserial correlation: -1 - perfect negative relationship, 0 - no effect, 1 - perfect positive relationship |
| ANOVA | 1 - η² (eta squared) | 0.01 - small effect size, 0.06 - medium effect size, 0.14 and more - large effect size |
| Kruskal-Wallis test | 1 - ε² (epsilon squared), 2 - η² (eta squared) | 0.01 - small effect size, 0.06 - medium effect size, 0.14 and more - large effect size |
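To make the thresholds above more tangible, here is a minimal sketch (not taken from this chapter's example data - the concentration vectors are hypothetical) showing how Cohen's d is obtained by hand: the difference in group means is divided by the pooled standard deviation.

# A minimal sketch with hypothetical concentrations (not from the example data set).
# Cohen's d = (mean of group 1 - mean of group 2) / pooled standard deviation.
group1 <- c(10.2, 11.5, 9.8, 12.1, 10.9)   # e.g., concentrations in the first group
group2 <- c(14.3, 13.8, 15.1, 14.9, 13.2)  # e.g., concentrations in the second group

pooled.sd <- sqrt(((length(group1) - 1) * var(group1) +
                   (length(group2) - 1) * var(group2)) /
                  (length(group1) + length(group2) - 2))

cohens.d.manual <- (mean(group1) - mean(group2)) / pooled.sd
print(cohens.d.manual)

The absolute value of d obtained here is well above 0.8, which would be interpreted as a large effect according to the table above.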

Computing effect size in R

The rstatix library contains dedicated functions for computing effect sizes. You will find examples in the code blocks below:

# Computing effect size in R with rstatix.
# Cohen's d effect size for t-test - via cohens_d().
# Note: the examples below assume the tidyverse and rstatix libraries are loaded.
# Documentation:
?cohens_d()

# Computing effect size:
Cohens.d <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  cohens_d(Concentrations ~ Label, 
           ref.group = "N")

print(Cohens.d)

We obtain a tibble with the Cohen's d effect size computed for each lipid through the cohens_d() function from the rstatix library.
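The tibble returned by cohens_d() also includes a magnitude column with a qualitative interpretation of the effect. As a small follow-up sketch (assuming the Cohens.d object created above), lipids showing a large effect can be extracted like this:

# Keeping only lipids with a large Cohen's d (a sketch based on the Cohens.d tibble above):
Cohens.d %>%
  filter(magnitude == "large") %>%
  arrange(desc(abs(effsize)))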

# Computing effect size in R with rstatix.
# r value effect size for Mann-Whitney U test - via wilcox_effsize().
# Documentation:
?wilcox_effsize()

# Installing the additional library needed:
install.packages("coin")

# Note - wilcox_effsize() may require additional packages - here, the coin package was installed.

# Computing effect size:
r.value <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  wilcox_effsize(Concentrations ~ Label,
                 ref.group = "N")

print(r.value)

We obtain a tibble with the r value effect size computed for each lipid through the wilcox_effsize() function from the rstatix library.
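The rank biserial correlation listed in the table above is not returned by wilcox_effsize(), but it can be derived from the Mann-Whitney U statistic using Wendt's formula, r = 1 - 2U/(n1*n2). Below is a sketch for a single lipid; the column name and the filtering of the 'PAN' group follow the examples in this chapter, and the sign of the result depends on the group order.

# A sketch: rank biserial correlation from the U statistic of a two-group comparison.
two.groups <- data %>% filter(Label != "PAN")            # keep the two compared groups
u <- wilcox.test(`SM 41:1;O2` ~ Label, data = two.groups)$statistic
counts <- table(as.character(two.groups$Label))          # group sizes n1 and n2
rank.biserial <- 1 - 2 * u / prod(counts)
print(unname(rank.biserial))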

NOTE: If you carefully check the tibble with ANOVA test results obtained from the anova_test() function, you will find the last column named 'ges', which stands for generalized eta squared. It is the effect size computed automatically:

# Computing effect size in R with rstatix.
# Eta squared effect size for ANOVA:
ANOVA <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  anova_test(Concentrations ~ Label)

print(ANOVA)

In the resulting tibble, the last column, 'ges', contains the generalized eta squared effect size computed automatically by anova_test() from the rstatix package.

The package also provides a function called eta_squared(). Using it, you can compute the effect size for a base-R ANOVA model built with the aov() function, for instance:

# Using the eta_squared() function from the rstatix library:
aov <- aov(`SM 41:1;O2` ~ Label, data)
eta_squared(aov)

The output in the R console:

> aov <- aov(`SM 41:1;O2` ~ Label, data)
> eta_squared(aov)
    Label 
0.3602876 

You will find exactly the same value in the 'ges' column of the tibble with the ANOVA test results.
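If you want to see where this number comes from, eta squared can also be reproduced by hand from the ANOVA sums of squares, since η² = SS_between / SS_total. A short sketch using the same aov model as above:

# Reproducing eta squared from the sums of squares of the aov model (a sketch):
ss <- summary(aov)[[1]][["Sum Sq"]]   # sums of squares for Label and Residuals
eta.sq.manual <- ss[1] / sum(ss)
print(eta.sq.manual)                  # should match eta_squared(aov) and the 'ges' column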

# Computing effect size in R with rstatix.
# Eta squared effect size for Kruskal-Wallis test - via kruskal_effsize().
# Documentation:
?kruskal_effsize()

# Computing effect size:
KW.effect.size <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  kruskal_effsize(Concentrations ~ Label)

print(KW.effect.size)

We obtain a tibble with the eta squared effect size based on the H-statistic, computed for the Kruskal-Wallis test through the kruskal_effsize() function from the rstatix package.
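The table at the beginning of this subchapter also lists epsilon squared as an effect size for the Kruskal-Wallis test. It is not returned by kruskal_effsize(), but it can be obtained directly from the H statistic, since ε² = H / (n - 1). A sketch for a single lipid:

# A sketch: epsilon squared for the Kruskal-Wallis test, derived from the H statistic.
kw <- kruskal.test(`SM 41:1;O2` ~ Label, data = data)
h <- unname(kw$statistic)                    # Kruskal-Wallis H (chi-squared statistic)
n <- sum(!is.na(data$`SM 41:1;O2`))          # total number of observations
epsilon.sq <- h / (n - 1)
print(epsilon.sq)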

Effect sizes are also computed automatically by the ggstatsplot library. Look at the examples below:

# Creating violin box plots with statistical annotations (ggstatsplot).

# Plot 1:
Welch <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  filter(Label != "PAN") %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'parametric') +
  scale_color_manual(values = c("royalblue", "red2"))
  
# Plot 2:
MW <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  filter(Label != "PAN") %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'nonparametric') +
  scale_color_manual(values = c("royalblue", "red2"))
  
# Plot 3:
ANOVA <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'parametric') +
  scale_color_manual(values = c("royalblue", "orange", "red2"))
  
# Plot 4:
KW <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'nonparametric') +
  scale_color_manual(values = c("royalblue", "orange", "red2"))
  
# Creating a list of plots:
plots <- list(Welch, MW, ANOVA, KW)

# Combining all plots into one image:
combine_plots(plots)

We obtain a panel of ggstatsplot box plots with detailed statistical annotations, including the effect sizes computed automatically by the ggbetweenstats() function. Note that for ANOVA, ggstatsplot reports the omega squared effect size.
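If you would like to work with the numbers behind these plots rather than reading them off the annotations, the ggstatsplot package offers the extract_stats() helper, which returns the statistical details underlying a plot, including the effect size and its confidence interval. A brief sketch, assuming the 'Welch' plot object created above:

# Extracting the statistics (test result and effect size) from a ggstatsplot object:
stats <- extract_stats(Welch)
stats$subtitle_data        # contains the test statistic, p-value, and effect size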


