Effect size computation and interpretation

Metabolites and lipids univariate statistical analysis in R

Thus far, we have focused on determining statistical significance, i.e., the p-value: we investigated whether a difference exists between two or more biological groups. At this point, it is important to mention that statistical significance depends on sample size. Hence, if the sample sizes are large, even a small difference between two groups can turn out to be statistically significant.
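The following base-R sketch illustrates this point. It uses artificial, deterministic data, not the lipidomics data set used in this chapter. Both comparisons have the same mean difference of 0.1, and only the number of observations per group changes. The helpers `make_groups()` and `cohens_d_manual()` are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: two groups of size n with a fixed mean difference of 0.1.
# Values alternate between -1 and 1, so the standard deviation is close to 1.
make_groups <- function(n) {
  a <- rep(c(-1, 1), length.out = n)
  list(a = a, b = a + 0.1)
}

# Hypothetical helper: Cohen's d with a pooled standard deviation.
cohens_d_manual <- function(x, y) {
  nx <- length(x); ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(y) - mean(x)) / pooled_sd
}

small <- make_groups(20)    # 20 observations per group
large <- make_groups(2000)  # 2000 observations per group

# Student's t-test p-values for both sample sizes
p_small <- t.test(small$b, small$a, var.equal = TRUE)$p.value
p_large <- t.test(large$b, large$a, var.equal = TRUE)$p.value

# Effect sizes for both sample sizes
d_small <- cohens_d_manual(small$a, small$b)
d_large <- cohens_d_manual(large$a, large$b)

print(c(p_small = p_small, p_large = p_large))
print(c(d_small = d_small, d_large = d_large))
```

With 20 observations per group, the p-value is well above 0.05. With 2000 per group, it drops below 0.01. Cohen's d stays close to 0.1 in both cases.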

Now, we would like to show you an important measure that should be reported together with statistical significance: the effect size. The effect size is the magnitude of the difference between biological groups. It shows whether this effect is large enough to be meaningful, e.g., worth further investigation. Such a judgment is difficult to make from p-values alone, because statistical significance depends on sample size. As a result, many journals expect authors to report effect sizes together with p-values.

Here, you will find information about how effect sizes can be computed in R using the rstatix or ggstatsplot packages, and which effect sizes can be used with the statistical tests from the previous subchapters.

Selecting effect size for basic statistical tests

Below, you will find examples of effect sizes that can be computed and reported together with the t-test, Mann-Whitney U test, ANOVA, and Kruskal-Wallis test:

| Statistical test | Effect size | Example of effect size interpretation |
|---|---|---|
| t-test | 1. Cohen's d; 2. Hedges' g | 0.2 small effect, 0.5 medium effect, 0.8 and more large effect (absolute values) |
| Mann-Whitney U test | 1. r value; 2. rank-biserial correlation | r value: 0.1 to < 0.3 small effect, 0.3 to < 0.5 moderate effect, 0.5 and more large effect. Rank-biserial correlation: -1 perfect negative relationship, 0 no effect, 1 perfect positive relationship |
| ANOVA | η² (eta squared) | 0.01 small effect, 0.06 medium effect, 0.14 and more large effect |
| Kruskal-Wallis test | 1. ε² (epsilon squared); 2. η² (eta squared) | 0.01 small effect, 0.06 medium effect, 0.14 and more large effect |
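The cut-offs in the table can be applied mechanically. The sketch below assumes the thresholds listed above. The label "negligible" for values under the smallest cut-off is an assumption, since the table does not name that range. The helper names `interpret_effect()`, `interpret_d()`, and `interpret_eta2()` are hypothetical and do not come from rstatix or ggstatsplot.

```r
# Hypothetical helper: label an effect size using the cut-offs from the table.
# right = FALSE makes each interval closed on the left, e.g. [0.2, 0.5).
interpret_effect <- function(value, cutoffs, labels) {
  as.character(cut(abs(value),
                   breaks = c(-Inf, cutoffs, Inf),
                   labels = labels,
                   right = FALSE))
}

# Cohen's d / Hedges' g: 0.2 small, 0.5 medium, 0.8 and more large
interpret_d <- function(d) {
  interpret_effect(d, c(0.2, 0.5, 0.8),
                   c("negligible", "small", "medium", "large"))
}

# Eta squared / epsilon squared: 0.01 small, 0.06 medium, 0.14 and more large
interpret_eta2 <- function(eta2) {
  interpret_effect(eta2, c(0.01, 0.06, 0.14),
                   c("negligible", "small", "medium", "large"))
}

interpret_d(c(0.1, 0.3, -0.6, 1.2))
interpret_eta2(c(0.005, 0.03, 0.10, 0.36))
```

Both calls return "negligible", "small", "medium", "large". Taking the absolute value lets the same helper handle negative Cohen's d values.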

Computing effect size in R

The rstatix library contains dedicated functions for computing effect sizes. You will find examples in the code blocks below:

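# Computing effect size in R with rstatix.
# Packages used here: tidyverse (data wrangling, %>%) and rstatix (effect sizes).
library(tidyverse)
library(rstatix)

# Cohen's d effect size for t-test - via cohens_d().
# Documentation:
?cohens_d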

# Computing effect size:
Cohens.d <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  cohens_d(Concentrations ~ Label, 
           ref.group = "N")

print(Cohens.d)

We obtain:

Cohen's d effect size computed with the cohens_d() function from the rstatix library.
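rstatix computes Cohen's d for you, but the formula itself is short. This base-R sketch works on two toy vectors, not the chapter's data. Cohen's d is computed as the difference in means divided by the pooled standard deviation. Hedges' g multiplies d by the small-sample correction 1 - 3 / (4N - 9), where N = n1 + n2. The function names are hypothetical. The sign of d depends on which group is subtracted from which. rstatix's cohens_d() documents a var.equal argument that controls whether the pooled variance is used, as well as a hedges.correction argument, so check ?cohens_d before comparing numbers.

```r
# Cohen's d: difference in means divided by the pooled standard deviation
cohens_d_pooled <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / s_pooled
}

# Hedges' g: Cohen's d multiplied by a small-sample correction factor
hedges_g <- function(x, y) {
  n <- length(x) + length(y)
  cohens_d_pooled(x, y) * (1 - 3 / (4 * n - 9))
}

x <- c(1, 2, 3, 4, 5)   # toy group 1
y <- c(3, 4, 5, 6, 7)   # toy group 2

cohens_d_pooled(x, y)   # (3 - 5) / sqrt(2.5), about -1.26
hedges_g(x, y)          # about -1.14
```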
# Computing effect size in R with rstatix.
# r value effect size for Mann-Whitney U test - via wilcox_effsize().
# Documentation:
?wilcox_effsize

# Installing additional libraries needed:
install.packages("coin")

# Note - wilcox_effsize() relies on the coin package, so it was installed above.

# Computing effect size:
r.value <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  wilcox_effsize(Concentrations ~ Label, 
                 ref.group = "N")

print(r.value)

We obtain the following tibble:

The r-value effect size computed through wilcox_effsize() from the rstatix library.
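The r value is defined as r = |Z| / sqrt(N), where Z is the standardized Mann-Whitney statistic and N is the total number of observations. wilcox_effsize() obtains Z through the coin package. The base-R sketch below instead recovers |Z| from wilcox.test() run with the normal approximation and no continuity correction. It also computes the rank-biserial correlation as the proportion of pairs where x > y minus the proportion where x < y. That sign convention is an assumption, and the formula is valid only when there are no ties. The data are toy vectors.

```r
x <- c(1, 2, 3, 4, 5)    # toy group 1
y <- c(6, 7, 8, 9, 10)   # toy group 2 (no ties)

# Normal approximation without continuity correction, so |Z| can be recovered from p
wt <- wilcox.test(x, y, exact = FALSE, correct = FALSE)
z <- qnorm(wt$p.value / 2)   # negative by construction; only |Z| is used
N <- length(x) + length(y)

# r value: |Z| / sqrt(N)
r_value <- abs(z) / sqrt(N)

# Rank-biserial correlation from the Mann-Whitney statistic W (reported for x):
# proportion of pairs with x > y minus proportion with x < y (no ties assumed)
W <- unname(wt$statistic)
r_rb <- 2 * W / (length(x) * length(y)) - 1

r_value   # about 0.83
r_rb      # -1: every value in x is smaller than every value in y
```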

NOTE: If you carefully check the tibble with the ANOVA results obtained from the anova_test() function, you will find that the last column is named 'ges', which stands for generalized eta squared. This effect size is computed automatically:

# Computing effect size in R with rstatix.
# Generalized eta squared ('ges') effect size for ANOVA, reported by anova_test():
ANOVA <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  anova_test(Concentrations ~ Label)

print(ANOVA)

Here is an example tibble:

The generalized eta squared effect size computed automatically by anova_test() from the rstatix package.

The package also provides a function called eta_squared(), which computes the effect size for a base ANOVA model built with the aov() function, for instance:

# Using the eta_squared() function from the rstatix library:
aov <- aov(`SM 41:1;O2` ~ Label, data)
eta_squared(aov)

The output in the R console:

> aov <- aov(`SM 41:1;O2` ~ Label, data)
> eta_squared(aov)
    Label 
0.3602876 

You will find exactly the same value in the 'ges' column of the tibble with the ANOVA results. For a one-way, between-subjects ANOVA, generalized eta squared is equal to eta squared.
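Eta squared for a one-way ANOVA is the share of the total sum of squares explained by the grouping factor: η² = SS_between / SS_total. The base-R sketch below computes it directly from the aov() summary table. It uses a small toy data frame rather than the lipidomics data, and the object names are hypothetical.

```r
# Toy data: three groups of three observations (not the lipidomics data)
toy <- data.frame(
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  group = factor(rep(c("A", "B", "C"), each = 3))
)

fit <- aov(value ~ group, data = toy)
ss <- summary(fit)[[1]][["Sum Sq"]]   # c(SS between groups, SS residuals)

# Eta squared: share of total variability explained by the grouping factor
eta2 <- ss[1] / sum(ss)
eta2   # 54 / 60 = 0.9
```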

# Computing effect size in R with rstatix.
# Eta squared effect size for Kruskal-Wallis test - via kruskal_effsize().
# Documentation:
?kruskal_effsize

# Computing effect size:
KW.effect.size <- 
  data %>%
  select(-`Sample Name`) %>%
  pivot_longer(cols = `CE 16:1`:`SM 41:1;O2`,
               names_to = "Lipids",
               values_to = "Concentrations") %>%
  group_by(Lipids) %>%
  kruskal_effsize(Concentrations ~ Label)

print(KW.effect.size)

We obtain:

Eta squared based on the H statistic, the effect size for the Kruskal-Wallis test, computed with kruskal_effsize() from the rstatix package.
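Both Kruskal-Wallis effect sizes from the table can be derived from the H statistic. The rstatix documentation gives eta squared as η²[H] = (H - k + 1) / (n - k), where k is the number of groups and n is the total number of observations. Epsilon squared is ε² = H / ((n² - 1) / (n + 1)), which simplifies to H / (n - 1). The base-R sketch below applies both formulas to toy data (not the chapter's data set). The object names are hypothetical.

```r
toy <- data.frame(
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  group = factor(rep(c("A", "B", "C"), each = 3))
)

kw <- kruskal.test(value ~ group, data = toy)
H <- unname(kw$statistic)        # Kruskal-Wallis H statistic
k <- nlevels(toy$group)          # number of groups
n <- nrow(toy)                   # total number of observations

eta2_H <- (H - k + 1) / (n - k)  # eta squared based on H
eps2   <- H / (n - 1)            # epsilon squared, H / ((n^2 - 1) / (n + 1))

c(H = H, eta2_H = eta2_H, eps2 = eps2)   # 7.2, about 0.867, 0.9
```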

The ggstatsplot package also computes effect sizes automatically. Look at the examples below:

# Creating violin box plots with statistical annotations (ggstatsplot).
library(ggstatsplot)

# Plot 1:
Welch <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  filter(Label != "PAN") %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'parametric') +
  scale_color_manual(values = c("royalblue", "red2"))
  
# Plot 2:
MW <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  filter(Label != "PAN") %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'nonparametric') +
  scale_color_manual(values = c("royalblue", "red2"))
  
# Plot 3:
ANOVA <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'parametric') +
  scale_color_manual(values = c("royalblue", "orange", "red2"))
  
# Plot 4:
KW <-
  data %>%
  select(`Label`,
         `SM 41:1;O2`) %>%
  ggbetweenstats(x = Label, y = "SM 41:1;O2", type = 'nonparametric') +
  scale_color_manual(values = c("royalblue", "orange", "red2"))
  
# Creating a list of plots (named plot.list so that base R's list() is not masked):
plot.list <- list(Welch, MW, ANOVA, KW)

# Combining all plots into one image:
combine_plots(plot.list)

We obtain:

The ggstatsplot violin box plots with detailed statistical annotations. Effect sizes computed automatically by the ggbetweenstats() function are marked with red frames. Note that for ANOVA, ggbetweenstats() reports omega squared as the effect size.
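Omega squared is a less biased alternative to eta squared. For a standard one-way ANOVA, the classical estimator is ω² = (SS_between - df_between × MS_within) / (SS_total + MS_within). The base-R sketch below computes it next to eta squared on toy data. Note that ggbetweenstats() with type = 'parametric' defaults to Welch's ANOVA, so the exact estimator it reports may differ from this classical formula. Treat the sketch as an illustration of the concept, not a reproduction of the ggstatsplot output.

```r
toy <- data.frame(
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  group = factor(rep(c("A", "B", "C"), each = 3))
)

tab <- summary(aov(value ~ group, data = toy))[[1]]

ss_b <- tab[["Sum Sq"]][1]    # sum of squares between groups
df_b <- tab[["Df"]][1]        # degrees of freedom between groups
ms_w <- tab[["Mean Sq"]][2]   # residual (within-group) mean square
ss_t <- sum(tab[["Sum Sq"]])  # total sum of squares

eta2   <- ss_b / ss_t
omega2 <- (ss_b - df_b * ms_w) / (ss_t + ms_w)

c(eta2 = eta2, omega2 = omega2)   # 0.9 and 52 / 61, about 0.852
```

As expected, omega squared is slightly smaller than eta squared for the same data.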
