Abstract:
Parkinson's disease (PD), a progressive neurodegenerative disorder, is difficult to diagnose in its early stages because its symptoms overlap with those of other conditions and its clinical features emerge subtly. Vocal impairments affect up to 90% of PD patients, even in early stages. By leveraging them, this research demonstrates the potential of speech analysis as a non-invasive, cost-effective diagnostic tool. A systematic workflow was employed, comprising data preprocessing, exploratory data analysis, feature selection, and the application of five machine learning models: Logistic Regression, Random Forest Classifier, Support Vector Classifier (SVC), Gradient Boosting Classifier, and XGBoost Classifier. These models were trained and evaluated on a dataset of phonetic features extracted from voice recordings. Performance was measured by accuracy, precision, recall, F1 score, and area under the ROC curve (AUC). The findings highlight the efficacy of ensemble learning models, particularly Gradient Boosting and XGBoost, in accurately classifying PD cases. These results support the use of vocal biomarkers and machine learning to improve diagnostic precision for neurodegenerative diseases.
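The five evaluation metrics named in the abstract can be made concrete with a short sketch. The code below is an illustrative, standard-library-only implementation and not the study's evaluation code. It assumes binary labels with 1 denoting PD, hard predictions already thresholded from classifier scores, and continuous scores for AUC. The function name `binary_metrics` is hypothetical. In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` provide equivalent results.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], y_score: Sequence[float]
) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall, F1 and ROC AUC
    for binary labels in {0, 1}, where 1 is assumed to mean PD."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC AUC equals the probability that a randomly chosen positive is
    # scored above a randomly chosen negative (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}


# Toy example (made-up numbers, not study data): scores thresholded at 0.5.
y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
metrics = binary_metrics(y_true, y_pred, y_score)
print(metrics)
```

On this toy input, accuracy is 0.6, precision, recall, and F1 are each 2/3, and AUC is 5/6. AUC differs from the other four because it uses the raw scores and does not depend on the chosen threshold.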