Replacing NAs via a random forest (RF) model (randomForest library)
Part of the missing value - data imputation section
# Installation of the randomForest library:
install.packages('randomForest')
# Load the library:
library(randomForest)
# Read the documentation for the function of interest, rfImpute():
?rfImpute
# Read the data into R (read_xlsx() comes from the readxl package), then recheck (adjust) the column types:
library(readxl)
data.missing <- read_xlsx(file.choose())
# Print the data set in the console:
print(data.missing)
# Adjust the column `Label` to be a factor:
data.missing$Label <- as.factor(data.missing$Label)
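Before imputing, it can help to see where the NAs are and to confirm that the response column has none, since rfImpute() does not accept NAs in the response. The sketch below uses a small hypothetical data frame, `toy`, as a stand-in for `data.missing`; its column names and values are made up for illustration.

```r
# Hypothetical toy data standing in for data.missing (names and values are assumptions)
toy <- data.frame(
  Sample = c("s1", "s2", "s3", "s4"),
  Label  = c("A", "B", "A", "B"),
  M1     = c(1.2, NA, 3.1, 4.0),
  M2     = c(NA, 0.5, NA, 0.9),
  stringsAsFactors = FALSE
)
toy$Label <- as.factor(toy$Label)

# Inspect the column types
str(toy)

# Count the NAs in each column
na.per.column <- colSums(is.na(toy))
print(na.per.column)

# The response must be complete before calling rfImpute()
stopifnot(!anyNA(toy$Label))
```

The same two checks, `colSums(is.na(...))` and `anyNA(...)`, can be run on `data.missing` directly.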
# Since random processes are involved here, we need to set a seed for reproducibility:
set.seed(111)
# Imputation of missing values using random forest:
data.imputed.rf <- rfImpute(Label ~ ., data = data.missing[,-1], iter = 10)
# The first argument: Label ~ .
## We want to predict Label from all other columns (~ .).
# The second argument: data = data.missing[,-1]
## The data frame to impute: all columns except the first one, the <chr> column 'Sample Name'.
# The third argument: iter = 10
## The number of imputation iterations (the default is 5). A random forest is grown in each
## iteration, and its proximity matrix is used to update the estimates of the NAs.
## Growing random forests involves random sampling, which is why we set the seed above.
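To illustrate the role of the seed and of `iter`, here is a minimal sketch on the built-in `iris` data with NAs injected into the predictors. It is not the data set used in this section, and `iris.na`, `imp1`, and `imp2` are hypothetical names. The `ntree` argument (trees per forest) is lowered from its default only to keep the run short.

```r
library(randomForest)

# Toy data: iris with 10 NAs injected into each predictor column (illustrative only)
set.seed(111)
iris.na <- iris
for (col in 1:4) {
  iris.na[sample(nrow(iris.na), 10), col] <- NA
}

# Run the same imputation twice, resetting the seed before each call
set.seed(111)
imp1 <- rfImpute(Species ~ ., data = iris.na, iter = 3, ntree = 100)
set.seed(111)
imp2 <- rfImpute(Species ~ ., data = iris.na, iter = 3, ntree = 100)

# With the same seed, both runs should give the same imputed values
print(identical(imp1, imp2))

# No NAs should remain, and the response is returned as the first column
print(anyNA(imp1))
print(names(imp1)[1])
```

Calling rfImpute() again without resetting the seed will generally give slightly different imputed values.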
# Print the imputed data frame in the console (rfImpute() returns the response, Label, as the first column):
print(data.imputed.rf)
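Besides printing, a quick programmatic check can confirm that no NAs remain and that the originally observed values were left untouched. The helper `check.imputation()` below is hypothetical (it is not part of randomForest), and the toy `after` data frame is filled with column medians purely as a stand-in for real rfImpute() output.

```r
# Hypothetical helper: compare a data frame before and after imputation
check.imputation <- function(before, after) {
  # Align the column order, since the imputed data may have the response first
  before <- before[names(after)]
  c(na.before = sum(is.na(before)),
    na.after = sum(is.na(after)),
    # Cells that were observed originally should be unchanged
    observed.changed = sum(before != after, na.rm = TRUE))
}

# Toy example: 'after' is a stand-in for imputed output (median fill, not a random forest)
before <- data.frame(M1 = c(1, NA, 3, 5), M2 = c(NA, 2, 4, 6))
after <- before
after$M1[is.na(after$M1)] <- median(before$M1, na.rm = TRUE)
after$M2[is.na(after$M2)] <- median(before$M2, na.rm = TRUE)

result <- check.imputation(before, after)
print(result)
```

On the data in this section, the analogous call would be `check.imputation(data.missing[,-1], data.imputed.rf)`.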
# And proceed to the next step.