Early Lung Cancer Risk Prediction using Ensemble Machine Learning Models with SHAP for Explainability

Mithila, Nosrat Jahan

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17138

Date: 2025-10-20

Abstract:

Lung cancer remains a global concern and one of the leading causes of cancer death, primarily because of late-stage detection and misdiagnosis. The timing of diagnosis directly affects treatment success and survival rates. This study examines whether machine learning can practically be applied to early lung cancer risk prediction using K-Nearest Neighbors (KNN), Decision Tree, Support Vector Machine (SVM), and Logistic Regression models. A stacking ensemble model, which combines multiple predictive models to increase accuracy, was developed for this purpose. When tested on two separate datasets, it achieved accuracy scores of 99.9% on Dataset-1 and 98% on Dataset-2, clearly surpassing the other models. The predictive models were further combined with deep learning models that rely on SHAP (SHapley Additive Explanations) in order to improve model transparency and identify risk predictors. The results reinforce the ensemble model's capabilities in both accuracy and transparency, positioning it as a supportive resource in clinical settings for improving actionable decision-making and diagnosis of lung cancer.
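
The stacking approach named in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses a synthetic dataset from `make_classification` as a stand-in; it is not the thesis code, and it does not use Dataset-1 or Dataset-2. The choice of Logistic Regression as the meta-learner, the hyperparameters, and the variable names (`base_learners`, `stack`, `accuracy`) are all assumptions made for illustration, since the abstract does not state how the four models were arranged.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the datasets used in the thesis.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners (level 0). KNN and SVM are distance/margin based, so their
# inputs are standardized first; hyperparameters are illustrative only.
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
]

# Meta-learner (level 1). Using Logistic Regression here is an assumption.
# cv=5 means the meta-learner is trained on out-of-fold base predictions,
# which limits leakage from the base learners' training fit.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed by this sketch describes only the synthetic data and says nothing about the figures reported in the abstract. A fitted model of this kind could then be passed to a model-agnostic explainer to attribute predictions to individual input features, which is the role the abstract assigns to SHAP.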