Speech-Based Classification of Bengali Regional Accents using Machine Learning

Jahan, Naila Nushrat; Shomrat, Salman Mahmud

dc.contributor.author	Jahan, Naila Nushrat
dc.contributor.author	Shomrat, Salman Mahmud
dc.date.accessioned	2026-06-25T03:46:58Z
dc.date.available	2026-06-25T03:46:58Z
dc.date.issued	2025-01-13
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17425
dc.description	Project Report	en_US
dc.description.abstract	This research investigates the classification of Bengali regional accents using speech data and machine learning techniques. Accurate recognition of regional accents plays a pivotal role in improving natural language processing systems in linguistically diverse regions such as Bangladesh. Speech data was collected from various regions, including 580 audio samples from Chandpur, 535 from General Bengali, 484 from Bogura, 456 from Chittagong, 420 from Sylhet, 413 from Barishal, and 28 from other areas. The dataset was preprocessed to extract key speech features, which were then used as inputs for machine learning models. Four machine learning algorithms were applied and evaluated: Random Forest, Decision Tree, K-Nearest Neighbors, and Logistic Regression. Among these, the Random Forest model demonstrated the highest accuracy, achieving 98.12%. The Decision Tree model followed with 87.67%, while K-Nearest Neighbors and Logistic Regression attained 75.17% and 65.92%, respectively. These findings highlight the superiority of ensemble methods such as Random Forest in managing complex and diverse datasets. The study also addresses the challenges in accent classification, particularly the variability in speech patterns and the limited data availability for less-represented regions. The inclusion of the "others" category further underlines the necessity of more comprehensive and balanced datasets to improve model generalizability. This work significantly contributes to the fields of computational linguistics and speech recognition, showcasing the effectiveness of machine learning in accent classification. The exceptional performance of the Random Forest model underscores its potential for real-world applications, such as automated transcription, accent-based recommendations, and language learning systems. Future work may focus on enhancing the dataset and leveraging advanced deep learning techniques to further improve accuracy and performance.	en_US
dc.description.sponsorship	Daffodil International University	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Bengali Regional	en_US
dc.subject	Speech Recognition	en_US
dc.subject	Machine Learning	en_US
dc.subject	Accent Recognition	en_US
dc.subject	Computational Linguistics	en_US
dc.subject	Natural Language Processing (NLP)	en_US
dc.title	Speech-Based Classification of Bengali Regional Accents using Machine Learning	en_US
dc.type	Other	en_US
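
The abstract's point about limited data for less-represented regions can be checked directly from the per-region sample counts it reports. The following is a minimal sketch, not code from the report: the variable names (counts, shares, imbalance_ratio) are illustrative assumptions, and only the seven counts themselves come from the abstract.

```python
# Per-region sample counts, copied from the abstract.
counts = {
    "Chandpur": 580,
    "General Bengali": 535,
    "Bogura": 484,
    "Chittagong": 456,
    "Sylhet": 420,
    "Barishal": 413,
    "Others": 28,
}

# Total dataset size and each class's share of it.
total = sum(counts.values())
shares = {region: n / total for region, n in counts.items()}

# Ratio of the largest class to the smallest: a simple imbalance indicator.
imbalance_ratio = max(counts.values()) / min(counts.values())

for region, n in counts.items():
    print(f"{region:16s} {n:4d}  {shares[region]:6.2%}")
print(f"total samples: {total}")
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```

On these figures the "Others" class makes up less than 1% of the 2916 samples, so overall accuracy alone says little about how well that class is recognized.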