Description
This thesis consists of two parts. In the first part, we illustrate variable selection using random forests in R with data from a study designed to identify which factors best predict whether students switch out of or persist in a science, technology, engineering, or mathematics (STEM) major. Throughout this part, we discuss several important practical issues that can emerge when fitting random forests. In the second part, we use the variables selected in the first part to fit generalized linear mixed models. The results suggest that the grades students receive in Calculus I are highly influential in determining whether they switch out of or persist in a STEM major. Several other, less pronounced effects emerged as well, including those related to SAT math and reading scores, gender, and student attitudes toward mathematics. Limitations and directions for future research are discussed.