| dc.description.abstract |
This dissertation proposes a fully machine-learning-based solution intended to significantly improve loan default assessment. We conducted a detailed comparative analysis of several machine learning and deep learning models, using XGBoost as a key component to strengthen the overall forecasting framework. The framework is further improved through better pre-processing methods, namely outlier handling by winsorization and data normalization with robust scaling. Multiple resampling methods were thoroughly investigated; we identified the hybrid SMOTE + ENN as the most effective option for balancing imbalanced datasets, achieving 90.49 percent accuracy, 94.61 percent precision, and 92.02 percent recall. We identified 48 optimal predictors via Recursive Feature Elimination with Cross-Validation (RFECV), with interest rate, FICO score, and loan term being the most significant. The framework is built on our novel stacking ensemble model, which combines the predictions of multiple base learners. This ensemble achieved 93.69 percent accuracy, 95.59 percent precision, 95.55 percent recall, and 97.81 percent Area Under the Receiver Operating Characteristic Curve (AUC), all notably higher than the scores of the individual models. Moreover, SHapley Additive exPlanations (SHAP) make the model transparent, and the resulting feature attributions give a practical understanding of the factors behind each default prediction. This powerful, interpretable, and transferable approach offers financial organizations a tool to manage risk, reduce losses, and improve lending decisions across diverse datasets. |
en_US |