DSpace Repository

Thyroid Disease Detection Using Machine Learning

dc.contributor.author Ahmed, Md Mostakim
dc.contributor.author Shathy, Shamira Shams
dc.date.accessioned 2026-04-12T09:33:11Z
dc.date.available 2026-04-12T09:33:11Z
dc.date.issued 2025-09-17
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16765
dc.description Project Report en_US
dc.description.abstract Thyroid disorders, such as hypothyroidism and hyperthyroidism, are challenging to diagnose because their symptoms, such as fatigue and weight changes, overlap, and because medical data are often inconsistent. This study applies machine learning to thyroid disease detection using two datasets: the Kaggle Thyroid Disease Dataset (9,172 records, 31 features) and the UCI Thyroid Disease Dataset (2,801 instances, 29 attributes). For the Kaggle dataset, a CatBoost classifier was developed after preprocessing that included data cleaning, zero imputation, one-hot encoding, and SMOTE with undersampling to address class imbalance. The optimized CatBoost model, incorporating L2 regularization and balanced class weights, achieved 98.70% accuracy, 98.79% precision (the share of positive predictions that are correct), and 97% Area Under the Precision-Recall Curve (AU-PRC) for hyperthyroidism, surpassing prior benchmarks by 2-3%. For the UCI dataset, Decision Tree and Random Forest classifiers were built following median/mode imputation, label encoding, feature scaling, and SMOTE. The Decision Tree performed best, with 99.11% accuracy, 99.12% precision, 99.11% recall, 99.07% F1-score, and 98.53% (±0.36%) cross-validation accuracy, outperforming Random Forest (98.04% accuracy, 98.44% ±0.14% cross-validation accuracy) and existing studies. Feature importance, analyzed with Shapley Additive Explanations (SHAP, a method for interpreting model predictions), identified T3, TT4, T4U, FTI, and TSH as critical predictors, offering transparent insights for clinicians. Despite these strengths, limitations include potential dataset biases and the need for real-world validation. These tree-based models combine high accuracy with interpretability, reducing the risk of misdiagnosis and paving the way for ethical deployment in healthcare. SHAP explanations also keep clinical decision support clear and trustworthy. en_US
dc.description.sponsorship Daffodil International University en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Thyroid Disease Detection en_US
dc.subject Hypothyroidism And Hyperthyroidism en_US
dc.subject Machine Learning in Healthcare en_US
dc.subject CatBoost Classifier en_US
dc.subject Thyroid Dataset en_US
dc.title Thyroid Disease Detection Using Machine Learning en_US
dc.type Other en_US
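
The abstract reports accuracy, precision, recall, F1-score, and a cross-validation accuracy with a ± spread. As a reading aid, the following is a minimal sketch of how such figures are commonly computed for a single positive class. It is not the authors' code: the function names `binary_metrics` and `summarize_folds`, the toy labels, and the fold accuracies are hypothetical, and the report does not state how multi-class scores were averaged or whether the ± value is a sample or population standard deviation (the sample standard deviation is assumed here).

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: share of positive predictions that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def summarize_folds(fold_accuracies):
    """Mean and sample standard deviation of per-fold accuracies.

    Assumption: the report's "±" is a standard deviation across folds.
    """
    return mean(fold_accuracies), stdev(fold_accuracies)


# Hypothetical toy labels (1 = disease present, 0 = absent), not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)

# Hypothetical per-fold accuracies, not taken from the report.
cv_mean, cv_std = summarize_folds([0.98, 0.99, 0.985, 0.99, 0.98])
```

With imbalanced classes, as in the thyroid datasets described above, accuracy alone can look high while recall on the minority class is poor, which is why precision, recall, and F1 are reported alongside it.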

