| dc.description.abstract |
Early identification of diabetes is important for controlling the disease and avoiding complications. In this study, we propose an ensemble feature selection method (EFSM) to improve the accuracy of diabetes prediction on a diabetes dataset from Kaggle. We applied seven models to this problem: Random Forest, Decision Tree, Logistic Regression, XGBoost, AdaBoost (with a decision tree weak learner), K-Nearest Neighbors (KNN), and Support Vector Machines (SVM). Under 5-fold cross-validation, XGBoost was the best-performing model with an accuracy of 98%, which indicates its strong pattern recognition ability on our medical data. EFSM combines the outputs of multiple feature selection techniques and scores each feature according to how often it is selected, thereby improving the performance of our models. These findings highlight the potential of our method for diabetes prediction, yielding a reliable model that could assist with early diagnosis and patient management. |
en_US |
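
The abstract describes EFSM as scoring features by how often multiple selection techniques choose them. The following is a minimal sketch of that vote-counting idea only, not the authors' implementation: the function names, the selector names (`chi2`, `mutual_info`, `rf_importance`), the feature names, and the `min_votes` threshold are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter


def ensemble_feature_votes(selections):
    """Count how many selectors picked each feature.

    `selections` maps a selector name to the features it chose.
    """
    votes = Counter()
    for chosen in selections.values():
        # set() ensures each selector votes at most once per feature
        votes.update(set(chosen))
    return votes


def select_by_votes(selections, min_votes):
    """Keep features chosen by at least `min_votes` selectors (assumed rule)."""
    votes = ensemble_feature_votes(selections)
    # Sort by descending vote count, then by name for a deterministic order
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, count in ranked if count >= min_votes]


# Hypothetical selector outputs; names are illustrative, not from the study
example_selections = {
    "chi2": ["glucose", "bmi", "age"],
    "mutual_info": ["glucose", "bmi", "insulin"],
    "rf_importance": ["glucose", "age", "bmi"],
}

selected = select_by_votes(example_selections, min_votes=2)
```

In this sketch, a feature that only one selector chose (here `insulin`) is dropped, while features with broader agreement are kept and could then be passed to any of the seven classifiers.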