Abstract:
Technology is advancing rapidly, and many institutions are adapting their business to
new technologies. Institutions hold huge amounts of data about their employees and
clients, and many apply data mining techniques to manage their operations properly and
quickly. In financial institutions such as banks, analyzing and handling customer data is
essential. Credit risk analysis is a primary concern in the banking sector, and many
techniques exist to predict whether a customer is creditworthy and how likely a loan is
to default. In this research, we used a dataset from a
Bangladeshi bank. The dataset is a credit defaulter dataset. We tried to predict which
delinquent customers have the highest probability of short-term credit recovery, applying
several machine learning and deep learning models. Because the dataset is imbalanced, we
first balanced it using SMOTE (Synthetic Minority Over-sampling Technique) and then
performed feature scaling and feature selection. Finally, we applied the machine learning
and deep learning models. Of all the models compared, Random Forest (RF) performed best.
We evaluated the models using both the Train-Test Split and
Stratified K-Fold cross-validation (CV) methods. With the Train-Test Split method, RF
achieves 93% accuracy, and with the Stratified K-Fold CV method it achieves 94% accuracy.
The other evaluation and statistical metrics for this model are also good under both
methods. Among the deep learning models, the best results come from the Artificial Neural
Network (ANN) and the Multilayer Perceptron (MLP), both with 90% accuracy. Overall, RF
performed best and predicts credit recovery more accurately than the other models.
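
The balancing step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. This is not the implementation used in this study. The function names `smote` and `balance`, the neighbour count `k`, and the convention that label 1 marks the minority class are assumptions made purely for illustration, and only the Python standard library is used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows.

    Hypothetical helper: a simplified version of the basic SMOTE idea,
    not the exact procedure used in the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to `base`
        # and keep the k closest ones.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a binary problem where `minority_label` is the rarer class.
    """
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(n_majority - len(minority), 0)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    # Return new lists so the caller's data is left unchanged.
    return X + new_rows, y + [minority_label] * len(new_rows)
```

Called as `X_bal, y_bal = balance(X, y)` on a dataset with 10 majority rows and 3 minority rows, this sketch would return 20 rows, 10 per class. The synthetic rows always lie on line segments between existing minority rows, so they stay within the region the minority class already occupies.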