Description
Underage drinking is a serious public health problem, and current studies try to identify its underlying risk factors. Previous research has shown that combining data from multi-modal sources makes it possible to construct highly accurate machine learning or statistical models that predict alcohol misuse in young adults. In particular, (functional) magnetic resonance imaging has proven to provide valuable predictors. We combine data from different sources on 86 adolescents and use various models, such as Logistic Regression, Support Vector Classifiers (with linear, sigmoid and radial kernels) and Random Forests, to select and extract predictors that yield well-performing models (around 90% AUROC). We focus on consolidating the large amount of data into a small number of predictors that are highly relevant to all models regardless of their specific statistical properties. The importance of predictors is assessed in two different ways, one of which is Permutation Importance. We find that functional information on the Basal Ganglia, the size of the third Ventricle (together with the Cerebrospinal Fluid), the size of the Thalamus and the Bilateral Orbitofrontal Thickness are the most important predictors for all considered statistical models. The Inferior Frontal Gyrus seems to contain relevant information too, but we find that functional information on this part of the brain might lead to overfitted models. Our results make it possible to construct highly accurate machine learning and statistical models that depend on only a very small number of predictors. This yields more manageable models that are easier to interpret and that ultimately allow more solid conclusions to be drawn about the underlying risk factors of adolescent alcohol misuse in future projects.
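Permutation Importance, as named above, can be illustrated with a minimal sketch in plain Python. The sketch makes several assumptions that are not taken from the study: the data are synthetic (86 rows are used only to mirror the sample size), `score_fn` is a hypothetical stand-in for any fitted model's decision function, and the function names `auroc` and `permutation_importance` are illustrative. The idea is model-agnostic: shuffle one predictor at a time and record how much the AUROC drops.

```python
import random
from typing import Callable, List, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def permutation_importance(
    score_fn: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUROC when each predictor column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = auroc(y, [score_fn(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other predictors stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = auroc(y, [score_fn(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data (assumption): 3 predictors, label driven by the first two.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(86)]
y = [1 if row[0] + 0.5 * row[1] + data_rng.gauss(0.0, 0.5) > 0 else 0 for row in X]


def score_fn(row: List[float]) -> float:
    """Hypothetical fitted model: a linear score that ignores predictor 2."""
    return 1.0 * row[0] + 0.5 * row[1]


baseline = auroc(y, [score_fn(row) for row in X])
importances = permutation_importance(score_fn, X, y)
print("baseline AUROC:", round(baseline, 3))
print("mean AUROC drop per predictor:", [round(v, 3) for v in importances])
```

Because the procedure only needs a scoring function, the same loop could in principle be applied to a logistic regression, a support vector classifier or a random forest; in practice the scores should be computed on held-out data rather than on the training set.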