Lung Cancer Prediction Using Machine Learning Techniques

Akash, M.K.

dc.contributor.author	Akash, M.K.
dc.date.accessioned	2025-09-29T06:08:06Z
dc.date.available	2025-09-29T06:08:06Z
dc.date.issued	2024-07-13
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/14758
dc.description	Project report	en_US
dc.description.abstract	In my thesis project, "Lung Cancer Prediction Using Machine Learning Techniques," I aimed to develop a reliable system for predicting lung cancer risk using several machine learning algorithms. The dataset was sourced from Kaggle and originated from an online lung cancer prediction system. It comprises attributes describing individuals' demographics, lifestyle choices, and health symptoms, with a binary target variable indicating the presence or absence of lung cancer. I first preprocessed the dataset, converting certain column values to binary (0 and 1) and handling missing values. During exploratory data analysis, I identified an imbalance in the target distribution and mitigated it with oversampling. I also performed feature engineering, removing irrelevant features and creating new ones to improve predictive capability. To reduce dimensionality, I applied Principal Component Analysis (PCA) before training several machine learning models: Logistic Regression, Decision Tree, K-Nearest Neighbors, Multinomial Naive Bayes, Support Vector Classifier, and Multi-layer Perceptron classifier. Among these models, Logistic Regression performed best, achieving an accuracy of approximately 95%. I then applied Grid Search to Logistic Regression to optimize its hyperparameters, obtaining an accuracy of 94.89%. I also experimented with ensemble techniques such as a Voting Classifier, but Logistic Regression consistently outperformed the other models. Finally, I ran K-Fold cross-validation to check model robustness, and Logistic Regression achieved a higher average accuracy than Decision Tree and Multi-layer Perceptron. In conclusion, my research identifies Logistic Regression as the most effective model for lung cancer risk prediction on the given dataset and features.	en_US
dc.description.sponsorship	DIU	en_US
dc.language.iso	en	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Lung cancer	en_US
dc.subject	Machine Learning	en_US
dc.subject	Computer-aided diagnosis (CAD)	en_US
dc.subject	Medical imaging	en_US
dc.title	Lung Cancer Prediction Using Machine Learning Techniques	en_US
dc.type	Other	en_US
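
The abstract mentions oversampling to correct the class imbalance and K-Fold cross-validation to check robustness, but the record does not say which oversampling method or fold count was used. The following is only a minimal sketch of those two steps, assuming plain random oversampling with replacement and an unstratified shuffle split. The function names `random_oversample` and `k_fold_indices` and the toy data are hypothetical and are not taken from the project report.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class.
    Assumption: simple random oversampling; the report may have used
    a different technique."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample index
    appears in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


# Hypothetical toy data: 8 negative cases and 2 positive cases.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))

for train_idx, test_idx in k_fold_indices(len(rows), k=5):
    print(len(train_idx), len(test_idx))
```

When the two steps are combined, oversampling is normally applied to the training indices of each fold only, so that duplicated rows do not appear in both the training and the test split.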