Abstract:
Almost every financial institution, such as credit card companies and banks,
relies heavily on credit risk grading systems to decide whether to issue a loan to a
prospective debtor. Applicants are placed into eight categories: Superior, Good,
Acceptable, Marginal, Special Mention, Substandard, Doubtful, and Bad. Institutions
generally depend on traditional judgmental techniques to approve an application, which
is time-consuming. The process can be accelerated by applying machine
learning algorithms, in which models learn patterns from data and then
provide insight. Handling credit risk properly is very important
for banking institutions, because losses arise when a debtor is unable to pay back the
owed money. In this study, a dataset of loan applications
is analyzed. Various popular machine learning algorithms such
as Random Forest, Decision Tree, Naïve Bayes, KNN, Logistic Regression, and SVM
are applied to train different models that predict whether granting a loan to an
applicant is risky. The accuracy, precision, recall, F1-Score,
training time, and testing time of each model are compared. Finally, each model is
evaluated using K-Fold Cross-validation, a confusion matrix, and the AUC-ROC
curve to find the best machine learning model among them.
In this study, Random Forest was observed to be the best model overall, with an
accuracy of 97.35%, precision of 99.84%, recall of 94.80%, F1-Score of 96.77%, and AUC
value of 96.8%, while Logistic Regression was the second-best algorithm for this
problem, with an accuracy of 96.59%.
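
As an illustration of the evaluation measures named above, the following is a minimal
sketch of how accuracy, precision, recall, and F1-Score follow from a binary confusion
matrix, and how K-Fold Cross-validation partitions the samples. It is not the study's
implementation: the function names, the toy labels, and the convention that label 1
means "risky" are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # risky applications predicted as risky
    fp: int  # non-risky applications predicted as risky
    fn: int  # risky applications predicted as non-risky
    tn: int  # non-risky applications predicted as non-risky


def confusion_matrix(y_true, y_pred, positive=1):
    """Count the four outcomes; `positive` is assumed to mark a risky application."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def scores(cm):
    """Return accuracy, precision, recall, and F1 (0.0 when a ratio is undefined)."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    precision = cm.tp / (cm.tp + cm.fp) if (cm.tp + cm.fp) else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if (cm.tp + cm.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy labels, invented for illustration only (not taken from the study's dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(scores(cm))
for train_idx, test_idx in k_fold_indices(len(y_true), k=4):
    print("test fold:", test_idx)
```

In practice each fold's test indices would be scored with a model trained on the
remaining indices, and the per-fold scores averaged. A library such as scikit-learn
provides equivalent utilities, so this sketch is useful mainly for showing the
arithmetic behind the reported figures.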