Abstract:
Technological advances have strengthened the connection between machine learning and the
medical field. This work uses machine learning to predict diabetes, a disease found worldwide.
The goal is to predict the disease at an early stage so that it is easier to treat or manage.
I worked with a dataset that has nine attributes and 100,000 records. It includes data on
gender, age, hypertension, heart disease, smoking history, BMI, HbA1c level, and blood
glucose level.
These are the principal markers of diabetes. I used the Random Forest Classifier to
predict the disease and compared its results with those of other machine learning
techniques, such as the Decision Tree Classifier, Logistic Regression, and K-Nearest
Neighbors (KNN). Of all these methods, the Random Forest Classifier achieved the highest
accuracy and AUC score, with an accuracy of 95.67% and an AUC of 0.97. KNN has 88% accuracy, decision trees have
94%, and logistic regression has 88%. These results show that the approach predicts
diabetes accurately on this dataset. My research indicates that there is great potential for
integrating computer science and medicine so that dangerous conditions like diabetes can be
identified early.
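
The two metrics reported above, accuracy and AUC, can be illustrated with a minimal sketch in plain Python. This is not the study's code: the function names `accuracy` and `auc_score`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, which is one standard way to define it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc_score(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A pair counts as 1 when the positive case has the higher score
    and as 0.5 when the two scores are tied.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, invented for illustration (not taken from the study's dataset).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"AUC      = {auc_score(y_true, y_score):.4f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank positive cases against negative ones, which is why the two are usually reported together when classifiers are compared.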