Abstract:
Diabetes is a major health threat all over the world, and its prevalence is growing rapidly.
Diagnosing diabetes accurately and giving proper treatment at the right time remain big challenges.
Today, many machine learning algorithms are used to build software that predicts diabetes
more accurately, so that doctors can give patients proper advice and medication and reduce
the risk of death. The purpose of this thesis is to analyze different machine learning
algorithms to find an efficient way to predict diabetes. We analyze 10 different machine
learning algorithms: Decision Tree, Logistic Regression, Multinomial Naïve Bayes, Gaussian
Naïve Bayes, K-Nearest Neighbors (KNN), Support Vector Classifier, Random Forest, Gradient
Boosting, AdaBoost and Bagging, using a suitable dataset. Our dataset contains 8 features
and records for 2000 patients. We compute the correlation of each attribute using standard
data mining techniques, and the dataset was preprocessed using several preprocessing
methods. We evaluate each of the 10 algorithms with a percentage split, 10-fold
cross-validation and 15-fold cross-validation. Decision Tree achieves the highest accuracy:
84.3% with the percentage split, 87% with 10-fold cross-validation and 87.8% with 15-fold
cross-validation. Machine learning techniques take less time to predict the disease.
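The evaluation protocol above (a percentage split plus 10-fold and 15-fold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration, not the thesis implementation: the 80/20 split ratio, the fixed random seed, the `fit_predict` callable interface and the `majority_class` baseline are all hypothetical assumptions introduced here, since the abstract does not specify them. Any of the ten classifiers could be plugged in as `fit_predict`; switching between 10-fold and 15-fold cross-validation only changes `k`.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def percentage_split(n: int, train_fraction: float = 0.8, seed: int = 0) -> Split:
    """Shuffle row indices once and cut them into a train part and a test part.
    The 0.8 default is an assumption; the abstract does not state the ratio."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Split]:
    """Partition shuffled indices into k folds; each fold is the test set exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most 1
    splits = []
    for i, test in enumerate(folds):
        # Training set = every index that is not in the current test fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


# Hypothetical interface: fit on (X_train, y_train), return predictions for X_test.
FitPredict = Callable[[Sequence, Sequence, Sequence], List]


def cross_val_accuracy(X, y, fit_predict: FitPredict, k: int, seed: int = 0) -> float:
    """Mean test accuracy over the k folds."""
    scores = []
    for train, test in k_fold_splits(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class(X_train, y_train, X_test):
    """Hypothetical baseline: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

For example, `cross_val_accuracy(X, y, majority_class, k=10)` scores the baseline with 10-fold cross-validation, and passing `k=15` gives the 15-fold variant. Comparing every classifier against such a baseline on the same folds keeps the accuracy figures directly comparable.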