Abstract:
Thyroid disorders, such as hypothyroidism and hyperthyroidism, are challenging to
diagnose due to overlapping symptoms like fatigue and weight changes, compounded
by inconsistent medical data. This study leverages machine learning to enhance
thyroid disease detection using two robust datasets: the Kaggle Thyroid Disease
Dataset (9,172 records, 31 features) and the UCI Thyroid Disease Dataset (2,801
instances, 29 attributes). For the Kaggle dataset, a CatBoost classifier was developed
after rigorous preprocessing, including data cleaning, zero imputation, one-hot
encoding, and SMOTE with undersampling to address class imbalance. The optimized
CatBoost model, incorporating L2 regularization and balanced class weights, achieved
98.70% accuracy, 98.79% precision (the proportion of positive predictions that are
correct), and 97% Area Under the Precision-Recall Curve (AU-PRC) for
hyperthyroidism, surpassing
prior benchmarks by 2-3%. For the UCI dataset, Decision Tree and Random Forest
classifiers were built following median/mode imputation, label encoding, feature
scaling, and SMOTE. The Decision Tree excelled with 99.11% accuracy, 99.12%
precision, 99.11% recall, 99.07% F1-score, and 98.53% (±0.36%) cross-validation
accuracy, outperforming Random Forest (98.04% accuracy, 98.44% ±0.14%
cross-validation accuracy) and existing studies. Feature importance, elucidated by Shapley Additive
Explanations (SHAP, a method for interpreting model predictions), identified T3, TT4,
T4U, FTI, and TSH as critical predictors, offering transparent insights for clinicians.
Despite these strengths, limitations include potential dataset biases and the need for
real-world validation. These tree-based models demonstrate excellent accuracy and
interpretability, reducing the risk of misdiagnosis and paving the way for ethical
deployment in healthcare. SHAP further ensures clear and trustworthy clinical decision
support.