Replacing NAs via random forest (RF) model (randomForest library)
A part of missing value – data imputation section
Besides k-nearest neighbor imputation, the random forest (RF) method is another approach for imputing missing completely at random (MCAR) and missing at random (MAR) entries in lipidomics and metabolomics data frames. The good performance of RF has been demonstrated by, among others, Wei et al. in their highly cited manuscript, which you can read here:
Here, we will use the rfImpute() function from the randomForest library to substitute MAR values in the data frame we created by removing random numeric entries from the lipidomics data (the data can be downloaded from the "Example data sets", Chapter: Introduction).
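As a minimal sketch of this workflow, the code below builds a small simulated table instead of the real lipidomics data, removes random numeric entries, and passes the result to rfImpute(). The object names (toy, toy_na, imputed), the column names (Label, lipid_A, lipid_B, lipid_C) and the simulated values are assumptions made for illustration only. Note that rfImpute() is a supervised method: it needs a response column without missing values (here the hypothetical grouping factor Label) and imputes only the predictors.

```r
library(randomForest)

set.seed(123)

# Toy stand-in for the lipidomics table (hypothetical names and values):
# a grouping factor plus three numeric "lipid" columns.
n <- 60
toy <- data.frame(
  Label   = factor(rep(c("N", "T"), each = n / 2)),
  lipid_A = c(rnorm(n / 2, 10, 1),  rnorm(n / 2, 12, 1)),
  lipid_B = c(rnorm(n / 2, 5, 0.5), rnorm(n / 2, 4, 0.5)),
  lipid_C = rnorm(n, 20, 2)
)

# Remove six random numeric entries per lipid column.
toy_na <- toy
for (col in c("lipid_A", "lipid_B", "lipid_C")) {
  toy_na[sample(n, 6), col] <- NA
}

# Impute the predictors; the response (Label) must be complete.
# iter = number of imputation rounds, ntree = trees grown in each round.
imputed <- rfImpute(Label ~ ., data = toy_na, iter = 5, ntree = 300)

# Compare the number of missing entries before and after imputation.
sum(is.na(toy_na))
sum(is.na(imputed))
```

The returned data frame has the response in its first column, followed by the imputed predictors. With the real data, replace toy_na with the data frame containing the removed entries and Label with the complete grouping column of that data frame.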