Replacing NAs via k-nearest neighbor (kNN) model (VIM library)
A part of missing value – data imputation section
Last updated
Another way to handle missing entries, for example values missing completely at random (MCAR) or missing at random (MAR), is to estimate them with a model. In metabolomics and lipidomics, missing observations are frequently replaced this way, e.g., via the k-nearest neighbor (kNN) model. As an example, take a look at the following manuscript:
M. Kaleta et al. Patients with Neurodegenerative Proteinopathies Exhibit Altered Tryptophan Metabolism in the Serum and Cerebrospinal Fluid. ACS Chemical Neuroscience (2024). DOI:
Some studies report kNN suitability for MNAR, too, e.g.:
The kNN model estimates each missing value from the most similar samples (its nearest neighbors among the data points). kNN imputation can be easily implemented using the VIM package (Visualization and Imputation of Missing Values):
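To make the neighbor idea concrete, here is a minimal base R sketch of the principle. It is not how VIM implements kNN(): the toy data, the column names, and the helper impute_knn_simple() are all hypothetical, the distance is plain Euclidean, and the predictor columns are assumed to be complete.

```r
# Toy data: 5 samples, 3 hypothetical lipid features; one value is missing
x <- data.frame(
  lipid_a = c(1.0, 1.1, 5.0, 0.9, 5.2),
  lipid_b = c(2.0, 2.1, 8.0, 1.9, 8.1),
  lipid_c = c(3.0, NA, 9.0, 3.2, 9.1)
)

# Hypothetical helper: impute one column from the k most similar samples
impute_knn_simple <- function(data, target, k = 2) {
  predictors   <- setdiff(names(data), target)
  missing_rows <- which(is.na(data[[target]]))
  donor_rows   <- which(!is.na(data[[target]]))
  for (i in missing_rows) {
    # Euclidean distance from sample i to every donor (predictor columns only)
    diffs <- sweep(as.matrix(data[donor_rows, predictors]), 2,
                   unlist(data[i, predictors]))
    d <- sqrt(rowSums(diffs^2))
    # Take the k closest donors and use the median of their observed values
    nearest <- donor_rows[order(d)[seq_len(k)]]
    data[i, target] <- median(data[[target]][nearest])
  }
  data
}

imputed <- impute_knn_simple(x, target = "lipid_c", k = 2)
imputed
```

In this toy example, the sample with the missing lipid_c value is closest to samples 1 and 4, so it receives the median of their lipid_c values.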
First, we will install this library (this operation is performed only once), then load it, and read the documentation for the function of interest, which in this case is kNN():
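These steps could look roughly as follows. This is a sketch that assumes VIM is installed from CRAN; the install line is commented out so that it is not re-run in every session.

```r
# Install VIM once (uncomment on first use; requires access to CRAN)
# install.packages("VIM")

# Load the package in every new R session
library(VIM)

# Open the help page of the kNN() function
?kNN
```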
Applying the kNN() function is quite straightforward, which makes it one of the most widely used methods in OMICs analysis. We will set the number of neighbors (k) to 10 and switch the imp_var argument to FALSE, as we do not need to know which entries of our tibble of lipid concentrations were imputed:
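A call with these settings might look like the sketch below. The data frame lipids and its column names are hypothetical stand-ins simulated for illustration, not the data set used in this tutorial; replace them with your own tibble.

```r
library(VIM)

# Hypothetical stand-in data: 20 samples, 3 simulated lipid concentrations
set.seed(1)
lipids <- data.frame(
  pc_34_1 = rlnorm(20, meanlog = 2),
  pe_36_2 = rlnorm(20, meanlog = 1),
  tg_52_2 = rlnorm(20, meanlog = 3)
)

# Introduce a few missing values to impute
lipids$pc_34_1[c(3, 11)] <- NA
lipids$tg_52_2[7] <- NA

# kNN imputation with 10 neighbors; imp_var = FALSE suppresses the extra
# logical "<column>_imp" indicator columns that mark imputed entries
lipids_imputed <- kNN(lipids, k = 10, imp_var = FALSE)

# Count the remaining NAs per column
colSums(is.na(lipids_imputed))
```

With imp_var = FALSE, the result keeps the same columns as the input. Leaving imp_var at its default of TRUE instead appends one indicator column per imputed variable, which can be useful for checking where values were filled in.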
We obtain the following final output: