Abstract:
Thyroid diseases are difficult to detect because their initial symptoms are nonspecific and their diagnosis is complicated. This paper presents a systematic study of machine learning models for predicting thyroid disease at an early stage. A thyroid dataset from the UCI Machine Learning Repository, obtained via Kaggle, was used; it contains 9,172 patient records with 31 features and a binary target indicating the presence or absence of disease. Only 11 clinical features, along with 2 categorical features and 1 binary feature, were used to predict thyroid disease. Because the dataset had a significant class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to ensure robust training. Seven different machine learning classifiers were trained and tested. Model performance was evaluated on a stratified hold-out test set using non-parametric bootstrap internal validation with 1,000 iterations to obtain robust estimates and 95% confidence intervals for accuracy, sensitivity, specificity, precision, F1-score, and AUC. The results indicate that the Random Forest classifier provides superior sensitivity (96.5%), making it a reliable tool for early screening.
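The bootstrap evaluation described above can be illustrated with a minimal sketch. It is not the study's code: the function names (`sensitivity`, `bootstrap_ci`), the percentile method for the interval, the fixed seed, and the toy labels are all assumptions made for illustration, and the toy labels are synthetic rather than taken from the thyroid dataset. The sketch resamples the hold-out predictions with replacement 1,000 times and reads the 95% interval from the 2.5th and 97.5th percentiles of the resampled sensitivities.

```python
import random


def sensitivity(y_true, y_pred):
    """Recall on the positive class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_ci(y_true, y_pred, metric, n_iter=1000, alpha=0.05, seed=0):
    """Non-parametric bootstrap with a percentile interval (assumed method).

    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_iter):
        # Draw n test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        score = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if score == score:  # skip NaN from resamples with no positives
            scores.append(score)
    scores.sort()
    lower = scores[int((alpha / 2) * len(scores))]
    upper = scores[min(len(scores) - 1, int((1 - alpha / 2) * len(scores)))]
    return metric(y_true, y_pred), lower, upper


# Synthetic hold-out labels and predictions (illustrative only):
# 40 positives, of which 36 are detected; 160 negatives, of which 10 are flagged.
y_true = [1] * 40 + [0] * 160
y_pred = [1] * 36 + [0] * 4 + [0] * 150 + [1] * 10

point, lower, upper = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"sensitivity = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The same `bootstrap_ci` helper would accept any metric function with the signature `metric(y_true, y_pred)`, so accuracy, specificity, precision, or F1-score could be passed in the same way; AUC would additionally need predicted scores rather than hard labels.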