Abstract
This research aims to predict undergraduate student dropout at a public post-secondary education institution in the southeastern United States. The main data sources are the college's database and the National Student Clearinghouse. Three datasets, DS-57, DS-11, and DS-101, are created from these sources. Suitable machine learning classification models are trained on each dataset, and the experiments follow agile practices. The results indicate that the features most predictive of dropout relate to academic performance and financial aid. Models are evaluated on percent accuracy and F-measure. Random Forest achieved an F-measure of 0.86 and a classification accuracy of 87.04 percent. Further training with ensemble machine learning techniques improved the F-measure to 0.903 and the classification accuracy to 90.8 percent.
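For readers unfamiliar with the two evaluation metrics, the following minimal sketch shows how percent accuracy and F-measure can be computed from a binary confusion matrix. It assumes that dropout is treated as the positive class and that the F-measure is the balanced F1 variant; the abstract specifies neither. The function name and the example counts are hypothetical and are not taken from the study.

```python
def accuracy_and_f_measure(tp, fp, fn, tn):
    """Return (accuracy in percent, F-measure) from confusion-matrix counts.

    tp: dropouts correctly predicted as dropouts
    fp: non-dropouts wrongly predicted as dropouts
    fn: dropouts wrongly predicted as non-dropouts
    tn: non-dropouts correctly predicted as non-dropouts
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Accuracy: share of all students classified correctly, as a percentage.
    accuracy_pct = 100.0 * (tp + tn) / total

    # F-measure (F1): harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f_measure = 2 * tp / denom if denom else 0.0

    return accuracy_pct, f_measure


# Illustrative counts only (assumed, not results from the study).
acc, f1 = accuracy_and_f_measure(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy = {acc:.2f} percent, F-measure = {f1:.3f}")
```

Reporting both metrics is useful when classes are imbalanced: accuracy can stay high when most students persist, whereas the F-measure reflects only how well the dropout class itself is identified.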