Abstract:
The aim of this project is to predict hypothyroidism and hyperthyroidism using multiple machine learning (ML) methods. The study uses a clinical dataset containing key thyroid measurements, including Thyroid Stimulating Hormone (TSH), Triiodothyronine (T3), Total Thyroxine (TT4), and the Free Thyroxine Index (FTI), together with patient demographics such as age and gender. The data pass through several preprocessing stages, including handling missing values, encoding categorical variables, and normalizing numerical features, to support reliable model performance. Several machine learning models, namely Decision Trees, Random Forest, Support Vector Machines (SVM), Naive Bayes, and Gradient Boosting, are trained and evaluated using accuracy, precision, recall, and F1-score, and hyperparameter optimisation is applied to improve their effectiveness. The results show that Random Forest outperforms the other methods, achieving the highest overall predictive accuracy and the best balance between precision and recall, which indicates that it can be used to predict thyroid disorders. This study highlights the importance of data preprocessing and model selection in healthcare applications and shows the potential of machine learning as a preliminary tool for thyroid disease diagnosis in clinical settings.
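The abstract does not state which imputation, encoding, normalization, or metric-averaging choices were used. The following Python sketch illustrates one plausible version of the preprocessing and evaluation steps it describes. It assumes median imputation of missing numeric values, a 0/1 encoding of gender, min-max normalization to [0, 1], and macro-averaged precision, recall, and F1 over the class labels. The field names (`age`, `sex`, `TSH`, `T3`, `TT4`, `FTI`) and the class labels in the example are hypothetical, and the model training itself is not shown. This is an illustration under those assumptions, not the authors' implementation.

```python
from statistics import median

# Hypothetical column names; the study's actual field names are not given.
NUMERIC = ["age", "TSH", "T3", "TT4", "FTI"]


def preprocess(records):
    """Impute missing numeric values with the column median, encode sex as 0/1,
    and min-max scale each numeric column to [0, 1].

    Assumption: statistics are computed over all records passed in; in a real
    evaluation they would be fit on the training split only.
    """
    # Medians are taken over observed (non-None) values only.
    medians = {c: median(r[c] for r in records if r[c] is not None) for c in NUMERIC}
    rows = []
    for r in records:
        row = {c: (r[c] if r[c] is not None else medians[c]) for c in NUMERIC}
        # Assumed binary encoding: "F" -> 1, anything else -> 0.
        row["sex"] = 1 if r["sex"] == "F" else 0
        rows.append(row)
    for c in NUMERIC:
        lo = min(row[c] for row in rows)
        hi = max(row[c] for row in rows)
        span = hi - lo
        for row in rows:
            # A constant column (span == 0) is mapped to 0.0 to avoid division by zero.
            row[c] = (row[c] - lo) / span if span else 0.0
    return rows


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    toy = [
        {"age": 30, "sex": "F", "TSH": 1.0, "T3": 2.0, "TT4": 100, "FTI": 110},
        {"age": 50, "sex": "M", "TSH": None, "T3": 1.0, "TT4": 80, "FTI": 90},
        {"age": 70, "sex": "F", "TSH": 5.0, "T3": None, "TT4": 120, "FTI": 130},
    ]
    print(preprocess(toy))
    print(macro_scores(["hypo", "hyper", "normal"], ["hypo", "normal", "normal"]))
```

Macro averaging weights each class equally, which matters when one class (for example, hyperthyroid cases) is much rarer than the others. Computing the medians and scaling bounds on the training split alone and reusing them on the test split avoids leaking test information into preprocessing.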