Abstract:
Customer churn prediction is essential to the telecom industry because retaining existing customers is far less expensive than acquiring new ones. This thesis presents an extensive comparative analysis of machine learning models for predicting telecom customer churn, using advanced feature engineering and enhanced ensemble learning techniques. The publicly available 'Telco Customer Churn' dataset from Kaggle was used for this research. It contains 7,043 customer records and 21 attributes covering demographics, service consumption, billing, and contractual information. A sequential data preprocessing pipeline was implemented, including feature engineering steps such as one-hot encoding for categorical variables and StandardScaler for feature normalization. The dataset was split into training and testing sets in an 80:20 ratio. A diverse set of machine learning models was evaluated, including Support Vector Machine (RBF kernel), Gradient Boosting, CatBoost, Random Forest, K-Nearest Neighbors, Extra Trees, LightGBM, XGBoost, and a Neural Network. A stacked ensemble approach was then applied to improve prediction performance, with Random Forest, CatBoost, XGBoost, and LightGBM as base learners and Logistic Regression (the default) as the meta-learner. Model performance was evaluated using confusion matrices and standard classification metrics. The experimental results show that the Support Vector Machine with RBF kernel achieved the highest predictive accuracy at 80.12%, outperforming both the individual tree-based models and the stacking ensemble. These results suggest that, given appropriate feature engineering and normalization, kernel-based models can forecast churn more accurately than ensembles, even though ensemble learning methods also perform well.
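The pipeline summarized above can be sketched in scikit-learn. This is a minimal illustration, not the thesis code: synthetic records stand in for the Kaggle 'Telco Customer Churn' dataset (column names here are hypothetical), and scikit-learn's Random Forest and Gradient Boosting stand in for the CatBoost, XGBoost, and LightGBM base learners, which require external libraries.

```python
# Sketch of the described workflow: one-hot encoding + StandardScaler,
# an 80:20 train/test split, an RBF-kernel SVM, and a stacking ensemble
# whose meta-learner is Logistic Regression (scikit-learn's default).
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; real work would load the Telco Customer Churn CSV.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),               # numeric attribute
    "MonthlyCharges": rng.uniform(20.0, 120.0, n),  # numeric attribute
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
})
# Toy churn label loosely tied to short tenure on month-to-month contracts.
y = ((df["tenure"] < 12) & (df["Contract"] == "Month-to-month")).astype(int)

# Preprocessing: scale numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])

# 80:20 split, stratified on the churn label.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.20, random_state=42, stratify=y)

svm = Pipeline([("prep", preprocess), ("clf", SVC(kernel="rbf"))])
stack = Pipeline([
    ("prep", preprocess),
    ("clf", StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=42)),
                    ("gb", GradientBoostingClassifier(random_state=42))],
        final_estimator=LogisticRegression())),  # default meta-learner
])

svm_acc = svm.fit(X_train, y_train).score(X_test, y_test)
stack_acc = stack.fit(X_train, y_train).score(X_test, y_test)
print(f"SVM (RBF) accuracy: {svm_acc:.3f}")
print(f"Stacking accuracy:  {stack_acc:.3f}")
```

Wrapping the preprocessing and classifier in a single `Pipeline` ensures the scaler and encoder are fit only on the training fold, avoiding leakage into the held-out 20%.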
The findings of this study offer practical guidance for telecom operators in selecting appropriate machine learning models to manage churn and improve customer retention strategies.