| dc.description.abstract |
Predicting student academic outcomes has become a central task in educational data mining, helping institutions identify at-risk students, increase retention, and improve academic performance. In this work, we introduce a new hybrid stacking ensemble model (OutcomeHyX) that predicts three student outcome labels, namely Dropout, Enrolled, and Graduate, using a dataset rich in demographic, socioeconomic, and academic performance features. The work also addresses data quality through preprocessing, feature engineering, and SMOTE-based class balancing to improve the fairness of classification across classes. Several baseline machine learning models, namely KNN, SVM, Random Forest, and XGBoost, were evaluated to establish reference performance. The empirical study shows that the traditional models perform only moderately and that prediction accuracy differs widely across outcome classes, especially for the minority Enrolled class. The proposed OutcomeHyX model, which uses Support Vector Machine and Random Forest as base learners and Logistic Regression as the meta-learner, outperforms the other models with a test accuracy of 87.46% and strong class-wise F1-scores. A ROC-AUC above 0.95 further confirms its discrimination capability. The results show that the hybrid stacking scheme outperforms all standalone classifiers evaluated in exploiting non-linear and intricate structures in the academic dataset. This study presents a reliable predictive model that universities can use as a tool for early warning, progress monitoring, and data-informed decision-making. The model also provides a scalable template for future studies aimed at improving student success and retention. |
en_US |
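
The stacking arrangement described in the abstract (Support Vector Machine and Random Forest as base learners, Logistic Regression as meta-learner) can be illustrated with a minimal sketch. This is not the authors' OutcomeHyX implementation: it assumes scikit-learn, uses a synthetic three-class dataset in place of the student data, picks hyperparameters arbitrarily, and leaves out the SMOTE balancing and feature engineering steps. All variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic, imbalanced three-class data standing in for the
# Dropout / Enrolled / Graduate labels; the real dataset is not used here.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3,
    weights=[0.3, 0.2, 0.5], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners: a scaled SVM and a Random Forest (hyperparameters are
# illustrative choices, not values reported in the abstract).
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Meta-learner: Logistic Regression trained on cross-validated outputs of
# the base learners (SVM decision scores, Random Forest class probabilities).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
per_class_f1 = f1_score(y_test, y_pred, average=None)  # one score per class
```

Reporting `per_class_f1` alongside `accuracy` mirrors the abstract's emphasis on class-wise F1-scores, since overall accuracy alone can hide weak performance on a minority class. Scores obtained on this synthetic data say nothing about the figures reported in the abstract.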