Comparative Evaluation of Machine Learning Models for Telecom Customer Churn Prediction Using Advanced Feature Engineering and Enhanced Ensemble Learning

Jannat, Nure

dc.contributor.author	Jannat, Nure
dc.date.accessioned	2026-04-22T05:55:15Z
dc.date.available	2026-04-22T05:55:15Z
dc.date.issued	2025-11-30
dc.identifier.citation	SWT	en_US
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16971
dc.description	Thesis Report	en_US
dc.description.abstract	Customer churn prediction is essential to the telecom industry because retaining existing customers is much less expensive than acquiring new ones. This thesis presents an extensive comparative analysis of several machine learning models for predicting telecom customer churn, using advanced feature engineering and enhanced ensemble learning techniques. The research uses the publicly available ‘Telco Customer Churn’ dataset from Kaggle, which has 7,043 customer records and 21 attributes related to demographics, service consumption, billing, and contractual information. A sequential data preprocessing pipeline was implemented, including feature engineering steps such as one-hot encoding for categorical variables and StandardScaler for feature normalization. The dataset was split in an 80:20 ratio for training and testing. A diverse set of machine learning models was evaluated, including Support Vector Machine (RBF kernel), Gradient Boosting, CatBoost, Random Forest, K-Nearest Neighbors, Extra Trees, LightGBM, XGBoost, and a Neural Network. I then applied a stacked ensemble approach to improve prediction performance. The base learners of the stacking ensemble were Random Forest, CatBoost, XGBoost, and LightGBM, and the meta-learner was Logistic Regression (the default). To evaluate model performance, I used confusion matrices and standard classification metrics. The experimental results show that the Support Vector Machine with RBF kernel achieved the highest predictive accuracy of 80.12%, outperforming both the individual tree-based models and the stacking ensemble. These results suggest that kernel-based models can predict churn better when combined with appropriate feature engineering and normalization, even though ensemble learning methods also work well.
The results of this study provide substantial recommendations for telecom operators in selecting appropriate machine learning models to manage churn and improve customer retention strategies.	en_US
dc.description.sponsorship	DIU	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Feature engineering	en_US
dc.subject	Customer churn prediction	en_US
dc.subject	Machine learning classification	en_US
dc.subject	Ensemble learning methods	en_US
dc.title	Comparative Evaluation of Machine Learning Models for Telecom Customer Churn Prediction Using Advanced Feature Engineering and Enhanced Ensemble Learning	en_US
dc.type	Thesis	en_US
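
The abstract mentions evaluating models with confusion matrices and standard classification metrics. As a rough illustration of what those quantities are for a binary churn label, here is a minimal sketch in plain Python. The function names and the toy labels are hypothetical and are not taken from the thesis; the record does not say which metrics beyond accuracy were reported or how they were computed.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = churned, 0 = retained); not thesis data.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))
print(metrics)
```

Accuracy alone can look favorable when one class dominates the data, which is one reason precision and recall on the churn class are commonly reported next to it.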