dc.description.abstract |
Customer churn prediction is a critical aspect of the telecommunications industry, significantly impacting a company's profitability and customer retention strategies. This thesis aims to develop an effective predictive framework using various machine learning algorithms. The study uses the IBM Telco dataset, which has 33 features and is class-imbalanced: 26.54% churned customers versus 73.46% non-churned customers. The research evaluates the performance of seven machine learning techniques: Gradient Boosting Classifier, K-Nearest Neighbors (KNN), Logistic Regression, Naive Bayes, Random Forest, Support Vector Machine (SVM), and XGBoost. Grid search cross-validation, with 5 and 10 folds, is employed to optimize hyperparameters and ensure robust model selection. Data preprocessing techniques such as class-imbalance handling, feature selection, and feature engineering are applied to enhance model accuracy. Experimental results show that Random Forest and XGBoost achieved the highest accuracy of 84% with 5-fold cross-validation, while Naive Bayes exhibited the lowest performance at 77%. Random Forest also recorded the best accuracy of 86% with 10-fold cross-validation. The thesis addresses several challenges, including class imbalance and overfitting. This research contributes to the field by offering a comparative analysis of machine learning techniques and their effectiveness in predicting customer churn, ultimately aiding telecom companies in devising better customer retention strategies. The findings will inform industry practitioners and researchers about the most effective machine learning practices for churn prediction, emphasizing the critical role of accurate and reliable models in maintaining customer loyalty and reducing churn rates. |
en_US |