Classification of Bangladeshi Regional Language Using Machine Learning And Deep Learning

Mia, Yeasin; Tanvir, Sakhaoyat Ullah

Project Report, Department of Computer Science & Engineering, Faculty of Science and Information Technology

dc.contributor.author	Mia, Yeasin
dc.contributor.author	Tanvir, Sakhaoyat Ullah
dc.date.accessioned	2025-09-14T07:24:40Z
dc.date.available	2025-09-14T07:24:40Z
dc.date.issued	2024-07-24
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/14490
dc.description	Project Report	en_US
dc.description.abstract	The goal of this project is to reliably detect linguistic variants and dialects by classifying the regional languages spoken in Bangladesh with machine learning (ML) and deep learning (DL) techniques. The dataset contains roughly 3,000 entries, with reasonably balanced representation of the five major regional languages (Chattogram: 655, Dhaka: 608, Rangpur: 621, Sylhet: 553, Noakhali: 562). Data collection involved designing a survey form, gathering and preparing text samples, and cleaning the data with natural language processing methods. The ML and DL models evaluated were Naive Bayes (BNB), Support Vector Machines (SVM), Random Forest, Logistic Regression (LR), Bi-directional Long Short-Term Memory (Bi-LSTM), and Convolutional Neural Networks (CNN). The results show that the DL models (Bi-LSTM: 95.24%, CNN: 98.48%) classify regional languages much better than the classical ML methods (Random Forest: 70.00%, SVM: 67.78%, LR: 66.22%, BNB: 64.44%). Overall, this study shows how well DL methods capture the complex linguistic patterns that regional language classification depends on. It also underscores the cultural importance of Bangladesh's linguistic diversity and advocates ethical research practices that support language preservation and social inclusion. Future work includes increasing model sophistication through syntactic and semantic analysis and examining the broader sociocultural implications of language classification technology.	en_US
dc.description.sponsorship	DIU	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Deep Learning	en_US
dc.subject	Regional Language Classification	en_US
dc.subject	Dialect Identification	en_US
dc.subject	Speech Processing	en_US
dc.title	Classification of Bangladeshi Regional Language Using Machine Learning And Deep Learning	en_US
dc.type	Other	en_US
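The per-dialect counts reported in the abstract can be checked with a few lines of arithmetic. The sketch below uses only those counts: it sums them (they total 2,999), computes each dialect's share, the ratio of the largest to the smallest class, and the accuracy a trivial classifier would get by always predicting the most frequent dialect. That majority-class figure is only a rough reference point and assumes the evaluation split mirrors the overall distribution, which the record does not state.

```python
# Per-dialect entry counts as reported in the abstract.
counts = {
    "Chattogram": 655,
    "Dhaka": 608,
    "Rangpur": 621,
    "Sylhet": 553,
    "Noakhali": 562,
}

total = sum(counts.values())  # 2999
shares = {name: n / total for name, n in counts.items()}

# Imbalance ratio: size of the largest class divided by the smallest.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Accuracy of always predicting the most frequent dialect (majority baseline),
# assuming the evaluation data follows the overall class distribution.
majority_baseline = max(counts.values()) / total

if __name__ == "__main__":
    print(f"total entries: {total}")
    for name, share in shares.items():
        print(f"{name:<11} {share:6.2%}")
    print(f"imbalance ratio: {imbalance_ratio:.3f}")
    print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

An imbalance ratio close to 1 supports describing the classes as reasonably balanced, so plain accuracy is a meaningful comparison metric here.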
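The abstract does not expand the abbreviation BNB; a common reading is Bernoulli Naive Bayes, and the sketch below assumes that. It is a minimal pure-Python illustration of the technique, not the authors' implementation: the whitespace tokenizer, the smoothing parameter `alpha`, the class name `BernoulliNaiveBayes`, and the placeholder tokens and labels in the usage example are all hypothetical. Each text is reduced to the set of words it contains, and every class is scored using both the vocabulary words present in the text and those absent from it.

```python
import math


def tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, keep unique words."""
    return set(text.lower().split())


class BernoulliNaiveBayes:
    """Bernoulli Naive Bayes over binary word-presence features (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing; must be > 0 to avoid log(0)

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.vocab = sorted(set().union(*docs))
        self.classes = sorted(set(labels))
        n_docs = len(labels)
        self.log_prior = {}
        self.log_p = {}      # log P(word present | class)
        self.log_not_p = {}  # log P(word absent | class)
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            n_c = len(class_docs)
            self.log_prior[c] = math.log(n_c / n_docs)
            self.log_p[c], self.log_not_p[c] = {}, {}
            for w in self.vocab:
                # Number of class-c documents that contain word w.
                df = sum(1 for d in class_docs if w in d)
                p = (df + self.alpha) / (n_c + 2 * self.alpha)
                self.log_p[c][w] = math.log(p)
                self.log_not_p[c][w] = math.log(1.0 - p)
        return self

    def predict(self, text):
        present = tokenize(text)  # words outside the training vocabulary are ignored
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for w in self.vocab:
                score += self.log_p[c][w] if w in present else self.log_not_p[c][w]
            if score > best_score:
                best_class, best_score = c, score
        return best_class


if __name__ == "__main__":
    # Placeholder tokens and labels, not real dialect data.
    train_texts = ["tok_a tok_b", "tok_a tok_c", "tok_x tok_y", "tok_x tok_z"]
    train_labels = ["dialect_a", "dialect_a", "dialect_b", "dialect_b"]
    model = BernoulliNaiveBayes(alpha=1.0).fit(train_texts, train_labels)
    print(model.predict("tok_a"))
    print(model.predict("tok_x tok_y"))
```

Scoring absent words is what distinguishes the Bernoulli variant from multinomial Naive Bayes, which only accounts for the words that occur in a text.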