Parkinson’s Disease Detection Using Machine Learning: A Comparative Study of Classification Algorithms

Rifat, Samiul Haque

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16941

Date: 2025-05-14

Abstract:

Parkinson’s disease (PD) is a neurodegenerative movement disorder caused by the loss of dopamine-producing neurons. Its cardinal motor symptoms are tremor, bradykinesia, and rigidity, which severely affect patient quality of life. Early and reliable diagnosis of PD is important for successful treatment and management. In this study, we compare machine learning models for PD detection using a dataset of demographic, clinical, and voice features. The report compares six classifiers (Multinomial Naïve Bayes, Logistic Regression, Random Forest Classifier, Gaussian Naïve Bayes, Decision Tree Classifier, and SVC) on the task of separating normal and PD cases. In our experiments, the Random Forest Classifier achieved the best test accuracy of 90.07%, followed by the Decision Tree Classifier at 87.23%. Logistic Regression reached 79.91% test accuracy and Gaussian Naïve Bayes 76.12%. Multinomial Naïve Bayes and SVC achieved lower accuracies of 68.56% and 62.17%, respectively. Notably, the Random Forest and Decision Tree models fit the training data perfectly (training accuracy of 100% for both), whereas the scikit-learn baseline model achieved nearly the same accuracy on the test set. This gap between training and test accuracy raises concerns about overfitting, especially for the Decision Tree. The present work emphasizes the need to choose models that suit the properties of the data and the requirements of the PD detection task. The Random Forest, for instance, has already been applied in this context and performed well, but such ensembles are more complex and computationally expensive than simpler models such as Logistic Regression.
In addition, the results highlight the need for further work on feature scaling and hyperparameter tuning to improve the convergence and generalization of the models. By advancing the application of machine learning to the diagnosis of neurodegenerative disease, this research provides insights into avenues for earlier detection and more tailored treatment of Parkinson’s disease.
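
The overfitting concern raised in the abstract can be made concrete with a little arithmetic on the reported figures. The following is a minimal sketch, not the author's code: it ranks the six classifiers by the test accuracies quoted above and computes the train-test gap for the two models whose training accuracy (100%) is stated. The variable names and the helper `generalization_gap` are illustrative assumptions; training accuracies for the other four models are not given in the abstract, so no gap is computed for them.

```python
# Test accuracies (%) exactly as reported in the abstract.
test_accuracy = {
    "Random Forest": 90.07,
    "Decision Tree": 87.23,
    "Logistic Regression": 79.91,
    "Gaussian Naive Bayes": 76.12,
    "Multinomial Naive Bayes": 68.56,
    "SVC": 62.17,
}

# Training accuracies (%) are stated only for the two tree-based models.
train_accuracy = {"Random Forest": 100.0, "Decision Tree": 100.0}


def generalization_gap(train_pct, test_pct):
    """Train minus test accuracy, in percentage points (hypothetical helper)."""
    return round(train_pct - test_pct, 2)


# Rank models from highest to lowest reported test accuracy.
ranking = sorted(test_accuracy, key=test_accuracy.get, reverse=True)

# Compute the gap only where a training accuracy is actually reported.
gaps = {
    name: generalization_gap(train_accuracy[name], test_accuracy[name])
    for name in train_accuracy
}

for name in ranking:
    line = f"{name}: {test_accuracy[name]:.2f}% test accuracy"
    if name in gaps:
        line += f", train-test gap {gaps[name]:.2f} points"
    print(line)
```

Under these figures the Decision Tree shows the larger gap of the two, which is consistent with the abstract singling it out as the model most likely to be overfitting.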