Description
Estimation of person-specific risk for adverse health events in medicine has been approached almost exclusively using parametric statistical methods. By contrast, random forest is a machine learning method based on tree ensembles that is completely nonparametric and is better suited for risk prediction. This study outlines a series of steps for constructing a random forest-based predictor. The methodology is described and illustrated among patients undergoing primary total knee replacement. Study data are from a large health-maintenance organization’s total joint replacement registry. The paper illustrates calculation of risk for two adverse events following total knee replacement: 90-day mortality (N = 74,665) and time to mechanical failure of the device (N = 110,796). For the binary outcome of 90-day mortality, regression random forests are used; for mechanical failure, competing-risk survival random forests are applied, with mortality as the competing event. The study found that, for binary risk outcomes, regression random forests achieve better prediction accuracy than classification random forests. Adjusting the terminal node size can improve prediction accuracy significantly, especially when the outcome is a rare event. Additionally, using permutation importance and minimal depth to build a reduced model can achieve prediction accuracy equivalent to, or even better than, that of the full model. Issues related to implementation are discussed.
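To make the idea of a regression random forest for a rare binary outcome concrete, the following is a minimal pure-Python sketch, not the study's implementation (the abstract does not name the software used). Each tree is grown by squared-error splitting on a bootstrap sample, each terminal node stores the observed event proportion, and the forest averages those proportions into a risk estimate. The data are simulated, and every name here (`simulate`, `build_tree`, `fit_forest`, `brier`, `nodesize`, `mtry`) is a hypothetical choice made for illustration. The Brier score is used as one possible accuracy measure; the abstract does not say which measure the study used.

```python
import random

def simulate(n, rng):
    """Synthetic rare-event data (an assumption): event probability between 1% and 9%."""
    X, y = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        p = 0.01 + 0.08 * x[0] * x[1]
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

def build_tree(X, y, idx, nodesize, mtry, rng):
    """Grow one regression tree; a leaf is the mean of y, i.e. an event-rate estimate."""
    mean = sum(y[i] for i in idx) / len(idx)
    if len(idx) < 2 * nodesize or mean in (0.0, 1.0):
        return mean
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # random subset of features
        order = sorted(idx, key=lambda i: X[i][j])
        total = sum(y[i] for i in order)
        left_sum = 0.0
        for k in range(1, len(order)):
            left_sum += y[order[k - 1]]
            if k < nodesize or len(order) - k < nodesize:
                continue  # both children must hold at least `nodesize` cases
            if X[order[k - 1]][j] == X[order[k]][j]:
                continue  # cannot split between tied values
            # Maximizing this quantity is equivalent to minimizing squared error.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (len(order) - k)
            if best is None or score > best[0]:
                best = (score, j, X[order[k - 1]][j])
    if best is None:
        return mean
    _, j, cut = best
    left = [i for i in idx if X[i][j] <= cut]
    right = [i for i in idx if X[i][j] > cut]
    return (j, cut,
            build_tree(X, y, left, nodesize, mtry, rng),
            build_tree(X, y, right, nodesize, mtry, rng))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf's event proportion."""
    while isinstance(tree, tuple):
        j, cut, left, right = tree
        tree = left if x[j] <= cut else right
    return tree

def fit_forest(X, y, ntree, nodesize, mtry, seed):
    """Fit `ntree` trees, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    return [build_tree(X, y, [rng.randrange(n) for _ in range(n)], nodesize, mtry, rng)
            for _ in range(ntree)]

def predict_forest(forest, x):
    """Average the per-tree estimates to obtain the predicted risk."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)

def brier(preds, y):
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

rng = random.Random(0)
X_train, y_train = simulate(600, rng)
X_test, y_test = simulate(300, rng)

results = {}
for nodesize in (1, 50):  # small vs. large terminal nodes
    forest = fit_forest(X_train, y_train, ntree=20, nodesize=nodesize, mtry=2, seed=1)
    preds = [predict_forest(forest, x) for x in X_test]
    results[nodesize] = brier(preds, y_test)
    print(f"nodesize={nodesize:3d}  Brier score={results[nodesize]:.4f}")
```

With a rare outcome, very small terminal nodes tend to hold only non-events, so their event proportions are mostly exactly 0; larger nodes pool more cases and can return small nonzero risks. Running the loop over several `nodesize` values is one simple way to examine that trade-off on a given data set.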