Thalassemia Prediction Using Machine Learning Model

Rabbani, Md Golam; Zaman, Sharmila; Hemel, Reaz Uddin

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/8505

Date: 2022-01-13

Abstract:

Thalassemia is a genetic blood disease inherited from parents. It is one of the most common and concerning genetic disorders worldwide. Its main symptoms are anemia, ranging from mild to severe, and dependence on blood transfusions. In South Asian countries such as Bangladesh, many children are born with thalassemia traits every year. Among the various types of thalassemia, beta-thalassemia is the most severe; it causes weakness, severe anemia, shortness of breath, and even failure of organs such as the kidneys and heart. This study aims to classify thalassemia based on the values of various hemoglobin (Hb) indices, such as Hb A, Hb B, Hb E, and Hb F, using data collected from a thalassemia center in Bangladesh. The work also aims to depict the epidemiological aspects of thalassemia using data from people across all sections of society in Bangladesh. We applied several machine learning classifiers, including Logistic Regression (LR), Decision Tree, Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN), to classify thalassemia. To evaluate the classifiers, we calculated accuracy, precision, recall, and F1-score, and we plotted ROC curves, which show a large AUC (Area Under the Curve). Among all the algorithms, Random Forest and K-Nearest Neighbors (KNN) achieved the best accuracy, 99.14%. Random Forest achieved 99.00% for both precision and recall, while KNN achieved a precision of 99.00% and a recall of 100%.
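The abstract reports accuracy, precision, recall, F1-score and ROC AUC but does not spell out how they are computed. The minimal Python sketch below (standard library only) shows the standard binary definitions. It assumes a binary task with thalassemia-positive encoded as label 1; the function names and the toy labels and scores are hypothetical illustrations, not the authors' code or data. In practice, scikit-learn's `sklearn.metrics` module (`accuracy_score`, `precision_recall_fscore_support`, `roc_auc_score`) provides equivalent functions.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), assuming binary labels with 1 = thalassemia-positive."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 (harmonic mean of precision and recall)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of true positives that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels, predictions and scores, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    print(classification_metrics(y_true, y_pred))  # all four metrics are 0.75 here
    print(roc_auc(y_true, scores))  # 0.9375
```

The pairwise AUC loop costs O(P×N) comparisons, which is fine for illustration; for large datasets a sort-based rank computation (or `roc_auc_score`) is the more practical choice.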
