Handling of missing data
Missing items in the data set were imputed using multiple imputation with random forests. A random forest is an ensemble learning method, used primarily for classification and regression, that constructs a multitude of decision trees at training time and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees 10. When applied to data imputation, random forests can capture non-linear relationships and interactions between variables when predicting missing values 10,11. The imputation was implemented using the miceRanger package in R 12. Details of the imputation process can be found in the supplementary material.
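The aggregation rule described above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the miceRanger procedure used in this study: the function names, the one-split trees, the toy data, and the defaults (`n_trees`, `seed`) are all assumptions made for illustration. It produces a single imputed value and omits the repeated draws that multiple imputation requires.

```python
import random
from statistics import mean, mode


def aggregate_regression(tree_predictions):
    """Numeric target: the forest predicts the mean of the per-tree predictions."""
    return mean(tree_predictions)


def aggregate_classification(tree_predictions):
    """Categorical target: the forest predicts the most common class (the mode)."""
    return mode(tree_predictions)


def fit_stump(rows):
    """Fit a one-split regression tree on (x, y) pairs by minimising squared error."""
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The bootstrap sample held a single distinct x: predict a constant.
        constant = mean([y for _, y in rows])
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def impute_with_forest(observed, x_missing, n_trees=25, seed=0):
    """Impute y at x_missing by averaging stumps fitted to bootstrap samples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_trees):
        # Bootstrap: resample the complete cases with replacement.
        sample = [rng.choice(observed) for _ in observed]
        predictions.append(fit_stump(sample)(x_missing))
    return aggregate_regression(predictions)


# Hypothetical complete cases (x, y); y is missing for a record with x = 2.
observed = [(1, 10.0), (2, 10.0), (3, 10.0), (8, 50.0), (9, 50.0), (10, 50.0)]
imputed = impute_with_forest(observed, x_missing=2)
print(imputed)
```

Because each tree is fitted to a different bootstrap sample, the per-tree predictions vary, and averaging them stabilises the imputed value. Production implementations grow deeper trees over many predictors and handle categorical variables through the mode rule shown in `aggregate_classification`.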