Abstract:
Technological advancements have led to a closer relationship between the medical
industry and machine learning. This study employs machine learning to predict the
occurrence of diabetes, a global disease. The aim is to identify the disease in its initial
stages to facilitate easier treatment or management of the condition. I used a
dataset of 100,000 instances with nine features. The dataset contains
information on hypertension, blood glucose level, BMI, age, smoking history, heart
disease, gender, and HbA1c level, which are among the main indicators of diabetes. I used
the Random Forest Classifier to predict the illness and compared its results
with those of other machine learning methods, including Decision Tree
Classifier, Logistic Regression, and KNN. Among these techniques, the Random Forest
Classifier achieved the highest accuracy (95.67%) and AUC score (0.97), indicating that it
is well suited to diabetes prediction. While the findings are promising, the research also
identifies important gaps: limited dataset diversity, lack of interpretability of models,
minimal clinical integration, and low accessibility for end-users. Recognizing these gaps
highlights opportunities for future work, such as incorporating more inclusive datasets,
developing explainable AI approaches, and building user-friendly mobile or clinical
applications. The study therefore demonstrates significant potential for the integration of computer
science and medicine to enable early identification of hazardous conditions.

Diabetes is
one of the most widespread chronic diseases in the world today. Millions of people are
affected by it, and the numbers continue to grow each year. What makes diabetes
especially dangerous is that many people do not know they have it until serious health
problems arise. Early detection is therefore extremely important, as it gives patients a
better chance to manage or even prevent severe complications.

Recent advances in technology, especially in the field of machine learning (ML), have
opened new opportunities for healthcare. By analyzing large amounts of medical data,
machine learning algorithms can identify patterns that are not always visible to doctors
during routine checkups. This study focuses on using machine learning to predict the
occurrence of diabetes, with the hope of assisting healthcare providers and patients in
catching the disease earlier.
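
The models in this study are compared by accuracy and AUC score, so it may help to see how these two metrics can be computed. The following is a minimal sketch in plain Python, not the code used in the study. The function names (accuracy, roc_auc) and the toy labels and scores are hypothetical and chosen only for illustration. Accuracy is the fraction of correct predictions at a fixed threshold (0.5 is assumed here), while AUC is the probability that a randomly chosen diabetic case receives a higher score than a randomly chosen non-diabetic case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study's dataset.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]                                # 1 = diabetic
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]       # model scores
y_pred = [1 if s >= 0.5 else 0 for s in y_score]                 # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")   # accuracy = 0.750
print(f"AUC      = {roc_auc(y_true, y_score):.3f}")   # AUC      = 0.875
```

The pairwise loop is easy to read but compares every positive with every negative, so on a dataset of 100,000 instances a rank-based computation or a library routine such as scikit-learn's accuracy_score and roc_auc_score would be the more practical choice.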