Writing functions in R

Useful tricks and features in OMICs mining

Functions are among the most frequently used objects in R. As in many other programming languages, functions in R offer benefits such as modularity, readability, and parameterization.

Among the simplest functions offered by R are those for computing basic summary statistics, such as mean(), min(), max(), median(), sd(), and range(). Others include print() for displaying R objects in the console, str() for inspecting the structure of R objects, and nrow()/ncol(), which return the number of rows/columns of a matrix or data frame. All of these are so-called built-in R functions.
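A minimal sketch of calling some of these built-in functions. The vector below reuses the three creatinine values that appear later in this chapter, and the small matrix is made up purely for illustration. Note that nrow()/ncol() are meant for matrices and data frames; on a plain vector they return NULL.

```r
# Three repeated creatinine measurements (the values used later in this chapter)
creatinine <- c(1.23, 1.15, 1.41)

mean(creatinine)     # 1.263333
median(creatinine)   # 1.23
range(creatinine)    # 1.15 1.41
sd(creatinine)       # sample standard deviation

# An illustrative 2 x 3 matrix filled column-wise with the integers 1 to 6
m <- matrix(1:6, nrow = 2)
nrow(m)              # 2
ncol(m)              # 3
nrow(creatinine)     # NULL, because a plain vector has no dim attribute
str(m)               # compact summary of the object's structure
```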

Apart from base R functions and the functions that become available after loading specific packages, you can also write your own functions in R.

In lipidomics and metabolomics data analysis, we will frequently write functions for data scaling, transformation, and normalization. You will find examples in the chapter Data Transformation, Scaling, and Normalization.

Here, you will learn how to create a basic R function and store it in the global environment. Take a look at the figure below:

Writing functions in R.

To store a function in the global environment, assign it to a name and run (load) that code. Always use concise, simple, unambiguous names for functions; verb-based names such as compute.mean describe what the function does. Next, define the input arguments, i.e., the function parameters: a comma-separated list of symbols, each optionally written as 'symbol = expression' to give it a default value. When the function is called, these parameters receive values (from this moment on, we can call them arguments). The function body is where you define the operations to be performed; the value of the last expression evaluated in the body is what the function returns.
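To make the 'symbol = expression' form concrete, here is a small sketch of a function with one required argument and one argument with a default value. The name log.transform is hypothetical and chosen only for illustration; the log transformation itself is just base R's log().

```r
# Hypothetical example function (not from the original chapter):
# 'x' is required, 'base' has a default value given as 'symbol = expression'.
log.transform <- function(x, base = 2) {
  # The last evaluated expression is returned as the function's value.
  log(x, base = base)
}

log.transform(8)               # 3, uses the default base = 2
log.transform(100, base = 10)  # 2, the default is overridden by the caller
```

Because base has a default, calling log.transform() with only x does not raise a "missing, with no default" error for base.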

Let's construct a function that computes the mean of three repeated measurements of the creatinine level in blood:

# Write in your script.
# We want to store our function as 'compute.mean':
# compute.mean <- ...   (placeholder: the right-hand side is completed in the next steps)

# Next, we write the formal arguments: in parentheses, we add three symbols: 
# creatinine1, creatinine2, creatinine3:
compute.mean <- function(creatinine1, creatinine2, creatinine3) {}

# Finally, we add the function body:
compute.mean <- function(creatinine1, creatinine2, creatinine3) {
  (creatinine1 + creatinine2 + creatinine3) / 3
}

# Now, highlight all lines of your function,
# click 'Run' in RStudio (or press Ctrl+Enter; Cmd+Enter on macOS),
# ...and check that the function appears in the global environment (under 'Functions' in the Environment pane).
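Besides looking at the Environment pane, you can check from code that the function was loaded. A short sketch using base R's exists(), is.function(), formals(), and body():

```r
compute.mean <- function(creatinine1, creatinine2, creatinine3) {
  (creatinine1 + creatinine2 + creatinine3) / 3
}

# Is an object with this name available, and is it a function?
exists("compute.mean")        # TRUE
is.function(compute.mean)     # TRUE

# Names of the formal arguments
names(formals(compute.mean))  # "creatinine1" "creatinine2" "creatinine3"

# The function body, printed as R code
body(compute.mean)
```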

The function is used as follows:

# Using our function. In the console, type:
compute.mean(1.23,1.15,1.41)

# We pass three values, one for each of the three arguments.
# Then press Enter to run it.

The result appears in the console:

> compute.mean(1.23,1.15,1.41)
[1] 1.263333
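As a quick sanity check, the result can be compared with base R's mean() applied to the same three values. all.equal() is used instead of == so that tiny floating-point differences do not cause a false mismatch.

```r
compute.mean <- function(creatinine1, creatinine2, creatinine3) {
  (creatinine1 + creatinine2 + creatinine3) / 3
}

# Our function versus the built-in mean() on the same measurements
all.equal(compute.mean(1.23, 1.15, 1.41), mean(c(1.23, 1.15, 1.41)))  # TRUE
```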

What happens if we pass only two of the three arguments? We can easily check it:

# Incorrect usage of the 'compute.mean' function:
compute.mean(1.23,1.15)

We obtain:

> compute.mean(1.23,1.15)
Error in compute.mean(1.23, 1.15) : 
  argument "creatinine3" is missing, with no default

As you can see, the function expects a value for each of its three arguments. Hence, it returns an error if too few (or too many) arguments are supplied:

> compute.mean(1.23,1.15,1.41,1.04)
Error in compute.mean(1.23, 1.15, 1.41, 1.04) : unused argument (1.04)
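If the number of measurements can vary, one option is to collect the arguments with R's '...' instead of naming exactly three. The sketch below is a hypothetical variant, not part of the original example; in practice, base R's mean(c(...)) does the same job.

```r
# Hypothetical variant: accepts any number of measurements through '...'
compute.mean.flexible <- function(...) {
  values <- c(...)              # combine all supplied values into one vector
  sum(values) / length(values)  # arithmetic mean of the supplied values
}

compute.mean.flexible(1.23, 1.15)              # 1.19
compute.mean.flexible(1.23, 1.15, 1.41, 1.04)  # 1.2075
```

Note that calling it with no arguments returns NaN (0 / 0) rather than an error, so a real helper might add an explicit check on length(values).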

In the next subchapters of this book, we will use functions frequently while preparing our data for OMICs analysis.

If you want to learn more about writing functions in R, take a look at these articles, books, or lectures:

Dataquest blog note by Elena Kosourova.
R for Data Science by Hadley Wickham et al.

Updated version:

R for Data Science by Hadley Wickham et al.
Statistics at UC Berkeley - Introduction to R language.
