Omics data visualization in R and Python

Replacing NAs via k-nearest neighbor (kNN) model (VIM library)

A part of missing value – data imputation section

Last updated 2 months ago

Another way to handle missing entries, such as those that are MCAR or MAR, is to estimate them using a model. In metabolomics and lipidomics, missing observations are frequently replaced via the k-nearest neighbor (kNN) model. For an example, take a look at the following manuscript:

M. Kaleta et al. Patients with Neurodegenerative Proteinopathies Exhibit Altered Tryptophan Metabolism in the Serum and Cerebrospinal Fluid. ACS Chemical Neuroscience (2024). DOI:

Some studies report kNN suitability for MNAR, too, e.g.:

The kNN model estimates each missing value from the most similar neighboring samples (data points). kNN imputation can be easily implemented using the VIM package (Visualization and Imputation of Missing Values).
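To make the idea of "similarity to neighboring samples" concrete, here is a minimal base R sketch of the principle. It is not how VIM implements kNN(): the helper impute_knn_sketch(), the toy matrix, and the lipid column names are hypothetical, and the sketch uses plain Euclidean distance on the observed columns, whereas VIM's kNN() is documented as using a variation of the Gower distance.

```r
# Toy matrix (hypothetical values): rows are samples, columns are lipids.
x <- rbind(
  s1 = c(1.0, 2.0, 3.0),
  s2 = c(1.1, 2.1, 3.2),
  s3 = c(0.9, 1.9, 2.8),
  s4 = c(5.0, 6.0, 7.0),
  s5 = c(1.0, 2.0, NA)
)
colnames(x) <- c("lipid_a", "lipid_b", "lipid_c")

# Hypothetical helper illustrating the principle of kNN imputation.
impute_knn_sketch <- function(x, k = 3) {
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    obs <- which(!is.na(x[i, ]))  # columns observed in sample i
    for (j in which(is.na(x[i, ]))) {
      # Donors: samples with column j and all columns in `obs` observed.
      donors <- which(!is.na(x[, j]) &
                        rowSums(is.na(x[, obs, drop = FALSE])) == 0)
      # Euclidean distance from sample i to each donor (observed columns only).
      d <- sqrt(rowSums(sweep(x[donors, obs, drop = FALSE], 2, x[i, obs])^2))
      # Keep the k closest donors and take the median of their values.
      nearest <- donors[order(d)][seq_len(min(k, length(donors)))]
      out[i, j] <- median(x[nearest, j])
    }
  }
  out
}

x.imputed <- impute_knn_sketch(x, k = 3)
print(x.imputed)
```

In this toy case, the three samples closest to s5 are s1, s2, and s3, so the missing lipid_c value is filled with the median of their lipid_c values, while the distant sample s4 has no influence.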

First, we will install this library (this operation is performed only once), then load it, and read the documentation of the function of interest, which in this case is kNN():

# Installing VIM package:
install.packages('VIM')

# Calling VIM package:
library(VIM)

# Reading documentation about kNN() function:
?kNN

Applying the kNN() function is quite straightforward, which makes it one of the most widely used imputation methods in omics analysis. We will set the number of neighbors to 10 and switch the imp_var argument to FALSE, as we do not need to know which entries of our tibble had lipid concentrations imputed:

# Imputing missing values through kNN model with VIM package.
# kNN() returns a data frame; as_tibble() (tibble package) changes it into a tibble:
library(tibble)
data.imputed.knn <- as_tibble(kNN(data.missing, k = 10, imp_var = FALSE))

# Printing the imputed tibble:
print(data.imputed.knn)

We obtain the following final output:
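If you do want to track which entries were imputed, imp_var can be left at TRUE. The sketch below uses a hypothetical toy data frame (the lipid column names and values are made up for illustration) and assumes VIM's default behavior of appending logical indicator columns with the "_imp" suffix; check ?kNN for the details of your installed version.

```r
library(VIM)

# Hypothetical toy data: 8 samples, 3 lipid concentrations, 2 missing entries.
toy <- data.frame(
  lipid_a = c(1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, NA),
  lipid_b = c(2.0, 2.1, 1.9, NA, 6.0, 6.1, 5.9, 6.0),
  lipid_c = c(3.0, 3.1, 2.9, 3.0, 7.0, 7.2, 6.9, 7.1)
)

# imp_var = TRUE appends logical indicator columns flagging imputed entries.
toy.imputed <- kNN(toy, k = 3, imp_var = TRUE)

# Counting NAs left in the original columns after imputation:
colSums(is.na(toy.imputed[, names(toy)]))

# Counting imputed entries per variable via the indicator columns
# (assumed to end with "_imp", the default suffix):
flag.cols <- grep("_imp$", names(toy.imputed), value = TRUE)
colSums(toy.imputed[, flag.cols, drop = FALSE])
```

Comparing the indicator counts with colSums(is.na(toy)) is a quick sanity check that every missing entry, and nothing else, was replaced.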
