Replacing NAs via k-nearest neighbor (kNN) model (VIM library)

Part of the missing value (data imputation) section

Another way to replace missing entries, such as values missing completely at random (MCAR) or missing at random (MAR), is to estimate them using a model. In metabolomics and lipidomics, missing observations are frequently imputed with the k-nearest neighbor (kNN) model. For an example, see the following manuscript:

  • M. Kaleta et al. Patients with Neurodegenerative Proteinopathies Exhibit Altered Tryptophan Metabolism in the Serum and Cerebrospinal Fluid. ACS Chemical Neuroscience (2024). DOI: https://doi.org/10.1021/acschemneuro.3c00611

Some studies also report that kNN is suitable for values missing not at random (MNAR), e.g.:

  • N. Frölich et al. Imputation of missing values in lipidomic datasets. The authors report that kNN is suitable for imputing MNAR in shotgun lipidomics data.

The kNN model estimates each missing value from the most similar samples (data points) in the data set, i.e., its nearest neighbors. kNN imputation can be easily implemented using the VIM package (Visualization and Imputation of Missing Values):

VIM package at CRAN.
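To make the idea of "similarity to neighboring samples" concrete, here is a minimal base-R sketch of kNN imputation. It is a deliberate simplification and not the algorithm used by VIM: the function name knn_impute_sketch, the toy matrix, and the choice of Euclidean distance with a median over the neighbors are all assumptions made for illustration (VIM's kNN() uses a Gower-type distance by default).

```r
# Simplified kNN imputation for a numeric matrix (illustration only, NOT VIM's algorithm).
# knn_impute_sketch is a hypothetical helper name.
knn_impute_sketch <- function(x, k = 3) {
  x <- as.matrix(x)
  imputed <- x
  for (i in seq_len(nrow(x))) {
    missing_cols <- which(is.na(x[i, ]))
    if (length(missing_cols) == 0) next
    observed_cols <- which(!is.na(x[i, ]))
    for (j in missing_cols) {
      # Candidate donors: samples that have an observed value in column j
      donors <- which(!is.na(x[, j]))
      # Root-mean-square (Euclidean-type) distance over the columns observed in sample i
      d <- sapply(donors, function(r) {
        sqrt(mean((x[r, observed_cols] - x[i, observed_cols])^2, na.rm = TRUE))
      })
      # Keep the k closest donors and aggregate their values with the median
      nearest <- donors[order(d)][seq_len(min(k, length(donors)))]
      imputed[i, j] <- median(x[nearest, j])
    }
  }
  imputed
}

# Hypothetical toy data: 5 samples, 3 lipids, one missing concentration
toy <- matrix(
  c(1.0, 2.0, 3.0,
    1.1, 2.1, 3.1,
    0.9, 1.9, NA,
    5.0, 6.0, 7.0,
    5.2, 6.1, 7.2),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("lipid_a", "lipid_b", "lipid_c"))
)

result <- knn_impute_sketch(toy, k = 2)
print(result)
```

In this toy example, the missing lipid_c value of sample s3 is filled with the median of its two closest samples, s1 and s2. Samples s4 and s5 are far away in lipid_a and lipid_b and therefore do not contribute.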

First, we will install this library (this operation is performed only once), then load it, and read the documentation of the function of interest, which in this case is kNN():

# Installing VIM package:
install.packages('VIM')

# Calling VIM package:
library(VIM)

# Reading the documentation of the kNN() function:
?kNN

Applying the kNN() function is quite straightforward, which is one reason why it is among the most frequently used imputation methods in omics analysis. We will set the number of neighbors to 10 (k = 10) and switch the imp_var argument to FALSE, as we do not need to know which entries of our tibble were imputed. With imp_var = TRUE (the default), kNN() would append logical indicator columns flagging the imputed entries:

# Imputing missing values through the kNN model with the VIM package.
# kNN() returns a data frame; as_tibble() (tibble package, loaded with tidyverse) converts it into a tibble:
data.imputed.knn <- as_tibble(kNN(data.missing, k = 10, imp_var = FALSE))

# Printing the imputed tibble:
print(data.imputed.knn)

We obtain the following final output:

Replacing missing values in the 'data.missing' tibble via the kNN model (VIM package).
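If you want to check what kNN() has done, it can help to keep the imputation flags and to compare the NA counts before and after. The sketch below uses a small hypothetical data frame (toy, with made-up column names sample, lipid_a, and lipid_b) instead of 'data.missing'. It also assumes that you want to restrict both the imputed variables (variable) and the variables used for the distance computation (dist_var) to the lipid columns, so that the sample ID column is left out:

```r
library(VIM)

# Hypothetical toy data: one ID column and two lipid concentration columns
toy <- data.frame(
  sample  = paste0("s", 1:6),
  lipid_a = c(1.0, 1.1, NA, 5.0, 5.2, 4.9),
  lipid_b = c(2.0, NA, 1.9, 6.0, 6.1, 5.8)
)

lipid_cols <- c("lipid_a", "lipid_b")

# imp_var = TRUE (the default) appends one logical column per imputed
# variable, named with the "_imp" suffix, flagging the imputed entries
toy_flagged <- kNN(toy, variable = lipid_cols, dist_var = lipid_cols,
                   k = 2, imp_var = TRUE)

# NA counts before and after the imputation
colSums(is.na(toy))
colSums(is.na(toy_flagged[lipid_cols]))

# Column names, including the added indicator columns
names(toy_flagged)
```

The indicator columns can be dropped later, or avoided entirely with imp_var = FALSE as in the call above, once you have verified which entries were replaced.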
