Abstract:
Diabetes mellitus is one of the most widespread and life-threatening diseases, both
worldwide and in Bangladesh. It gradually damages health when the human body cannot
produce enough insulin or cannot respond to it properly, which results in abnormally
high blood sugar levels. Numerous complications, including high mortality and damage to
multiple organs, occur if patients go without medical treatment. Identifying the
disease at an early stage and treating it promptly can therefore protect more people
from serious harm. Advances in health sciences have contributed
to a large volume of data. Machine learning algorithms have gained wide
popularity in medical science for diagnosing this disease and predicting its likelihood
from such raw data. The aim of this research is to make a side-by-side
comparison of multiple machine learning classifiers and of how well they predict this
deadly disease in advance. Decision Tree, Logistic Regression, Random Forest, Support
Vector Machine, K-Nearest Neighbours and Naive Bayes have been applied in a supervised
setting to predict the possibility of the disease. The new dataset used here is
imbalanced; it was collected from the UCI repository and has sixteen
features and one outcome class. Pre-processing tasks such as missing or null
value replacement, label encoding, selection of important features, and SMOTE resampling
to balance the classes have therefore been carried out on the data. Scikit-learn, a
free Python module, has been used for analysing and visualizing the experiments. Finally,
the outcomes of the algorithms have been compared, leading to the conclusion that the
Random Forest classifier outperforms the others with an accuracy of 98.38%.
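
As an illustration of the SMOTE resampling step mentioned above, the following is a minimal sketch of the core idea only: each synthetic minority sample is placed on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the implementation used in this work. The function names `smote` and `balance`, the toy data, and the choice of k are hypothetical, and the sketch assumes numeric features and exactly two classes.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority rows.

    Simplified sketch: assumes numeric features and at least two rows.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base row.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a binary problem (hypothetical helper, for illustration only).
    """
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(n_majority - len(minority), 0)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy data (made up): three minority rows (label 1), six majority rows (label 0).
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
     [5.0, 5.0], [5.1, 4.8], [4.9, 5.2], [5.3, 5.1], [4.7, 4.9], [5.2, 5.3]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = balance(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the region already occupied by the minority class. With label-encoded categorical features, interpolation can produce fractional values, so a practical pipeline may need a variant of the technique designed for categorical data.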