
Writing functions in R

Useful tricks and features in OMICs mining


Last updated 4 months ago

Functions are among the most frequently used objects in R. As in many programming languages, they make code modular, readable, and reusable through parameterization.

R ships with many simple functions, for instance for computing basic summary statistics: mean(), min(), max(), median(), sd(), and range(). Others include print() for displaying R objects in the console, str() for inspecting the structure of R objects, and nrow()/ncol(), which return the number of rows/columns of a data frame or matrix. All of these are so-called built-in R functions.

Besides base R functions and functions you can access after loading specific packages, you can write your own functions in R.

In lipidomics and metabolomics data analysis, we will frequently write functions for data scaling, transformation, and normalization. You will find examples, e.g., in the chapter Data Transformation, Scaling, and Normalization.

Here, you will learn how to create a basic R function and store it in the global environment. Take a look at the figure below:

A function is stored in the global environment once its definition has been run. Always use concise, simple, unambiguous names for functions, ideally verb-based ones such as compute.mean. Next, define the input parameters (formal arguments): a comma-separated list of symbols, or of 'symbol = expression' pairs that supply a default value. When the function is called, these parameters receive values (the arguments). The function body is where you define the operations to be performed.
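The anatomy described above can be sketched as a minimal example. The function name compute.log and its parameters x and base are hypothetical, chosen only for illustration; the default value for base shows the 'symbol = expression' form:

```r
# Hypothetical example: 'compute.log', 'x', and 'base' are illustrative names.
# The name is on the left of '<-'; the parameters are in parentheses.
compute.log <- function(x, base = 2) {
  # Function body: the operations to be performed.
  # The value of the last evaluated expression is returned.
  log(x, base = base)
}

# 'x' receives the argument 8; 'base' falls back to its default, 2:
compute.log(8)

# Both parameters receive explicit arguments:
compute.log(100, base = 10)
```

The first call should return 3 and the second 2, because the base-2 logarithm of 8 is 3 and the base-10 logarithm of 100 is 2.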

Let's try constructing a function that computes the mean from three repetitions of creatinine level measurements in blood:

# Write in your script.
# We want to store our function as 'compute.mean'
# (placeholder only, not runnable yet):
# compute.mean <- ...

# Next, we write the formal arguments: in parentheses, we add three symbols:
# creatinine1, creatinine2, creatinine3:
compute.mean <- function(creatinine1, creatinine2, creatinine3) {}

# Finally, we add the function body:
compute.mean <- function(creatinine1, creatinine2, creatinine3) {
  (creatinine1 + creatinine2 + creatinine3) / 3
}

# Now, highlight all lines with your function,
# hit 'Run',
# ...and check if the function appeared in the global environment (in 'Functions').

The function is used as follows:

# Using our function. In the console, type:
compute.mean(1.23, 1.15, 1.41)

# We supply three values, one for each of the three arguments,
# and run it.

The console returns:

> compute.mean(1.23,1.15,1.41)
[1] 1.263333
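Arguments can also be matched to parameters by name rather than by position. The sketch below repeats the definition of compute.mean so that it runs on its own; by.position and by.name are hypothetical variable names used only for illustration. Each named value still reaches the same parameter, so the order in which named arguments are written should not change the result:

```r
compute.mean <- function(creatinine1, creatinine2, creatinine3) {
  (creatinine1 + creatinine2 + creatinine3) / 3
}

# Arguments matched by position:
by.position <- compute.mean(1.23, 1.15, 1.41)

# Arguments matched by name, written in a different order:
by.name <- compute.mean(creatinine3 = 1.41, creatinine1 = 1.23, creatinine2 = 1.15)

# Compare the two results:
all.equal(by.position, by.name)
```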

What happens if we supply only two of the three arguments? We can easily check:

# Incorrect usage of the 'compute.mean' function:
compute.mean(1.23,1.15)

We obtain:

> compute.mean(1.23,1.15)
Error in compute.mean(1.23, 1.15) : 
  argument "creatinine3" is missing, with no default

As you can see, the function expects exactly one value for each of its three arguments. It throws an error if too few arguments are supplied, and likewise if there are too many:

> compute.mean(1.23,1.15,1.41,1.04)
Error in compute.mean(1.23, 1.15, 1.41, 1.04) : unused argument (1.04)
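One way to avoid a fixed number of arguments is to accept a single numeric vector instead. This is only a sketch under that assumption, not part of the original example; compute.mean.vec and measurements are hypothetical names, and base R's mean() already does the same job:

```r
# Hypothetical variant: takes one numeric vector of any length.
compute.mean.vec <- function(measurements) {
  # Stop with an error if the input is not a non-empty numeric vector.
  stopifnot(is.numeric(measurements), length(measurements) > 0)
  sum(measurements) / length(measurements)
}

# Three repetitions:
compute.mean.vec(c(1.23, 1.15, 1.41))

# Four repetitions work without changing the function:
compute.mean.vec(c(1.23, 1.15, 1.41, 1.04))
```

The design choice here is to move the repetitions into one argument, so the number of measurements no longer has to match the number of parameters.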

In the next subchapters of this book, we will use functions frequently while preparing our data for OMICs analysis.

If you want to learn more about writing functions in R, take a look at these articles, books, or lectures:

How to Write Functions in R (with 18 Code Examples). Dataquest blog note by Elena Kosourova.
19 Functions | R for Data Science. R for Data Science by Hadley Wickham et al.
Updated version: R for Data Science (2e) - 25 Functions. R for Data Science by Hadley Wickham et al.
Statistics at UC Berkeley - Introduction to R language. Writing functions in R.