dc.description.abstract |
Elevated blood glucose levels are indicative of diabetes mellitus (DM), a chronic
metabolic disease. Improving patient outcomes and minimizing complications
from diabetes require early detection and management of the condition. In this
work, we propose a dataset-based machine learning method for the early
identification of diabetes. The dataset undergoes preparatory steps that include
missing-value handling, feature scaling, and division into training and testing
sets. Each algorithm is then trained on the processed data, and cross-validation
is used to evaluate its performance. We
investigate how effectively various machine learning algorithms classify
people as either diabetic or non-diabetic based on their clinical and
demographic characteristics, using Random Forest (RF), Extreme Gradient
Boosting (XGB), Logistic Regression (LR), and Gradient Boosting (GB).
Random Forest is an ensemble approach that combines many decision trees to
reduce overfitting and improve predictive accuracy.
Gradient boosting, another ensemble approach, improves model performance by
building trees sequentially, each one correcting the errors of its predecessors.
Logistic regression is a statistical model suited to classification tasks because
it estimates the probability of a binary outcome. The Support Vector Classifier
constructs hyperplanes to separate classes and is well known for its
effectiveness in high-dimensional spaces. Random Forest performed best, with an
accuracy of 85% and an F1 score of 0.86. The proposed machine learning approach
shows encouraging results for the early detection of diabetes, which may help
medical professionals identify those who are at risk. |
en_US |