Adapting Whisper AI for Multilingual Speech-To-Text Conversion in Bangladeshi Ethnic Languages

dc.contributor.author Yeiad, Kabid
dc.contributor.author Jim, Jannatul Ferdushi
dc.date.accessioned 2025-08-26T09:56:34Z
dc.date.available 2025-08-26T09:56:34Z
dc.date.issued 2024-07-24
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/13995
dc.description Project report en_US
dc.description.abstract This research presents an end-to-end speech-to-text and translation system that converts spoken audio in five ethnic languages of Bangladesh (Malo, Bawm, Santali, Marma, and Garo) into Bangla. The study evaluates how well several neural architectures, including Bi-LSTM, CNN Seq2Seq, GRU Seq2Seq, a baseline Seq2Seq, and Transformer models, handle the transcription and translation tasks. To build the dataset, 20 native speakers of each language recorded 500 unique words, yielding a corpus suitable for training and testing. Preprocessing steps such as noise reduction, normalization, and segmentation ensured high-quality inputs, and data augmentation further improved model robustness. In the experiments, the Transformer model outperformed the alternatives, reaching 97% training accuracy and 96% validation accuracy. The GRU Seq2Seq model also performed well, striking a good balance between accuracy and speed, whereas the CNN Seq2Seq model struggled. The Whisper model excelled at the transcription task, achieving high accuracy and low word error rates across all five languages. A thorough evaluation based on accuracy, loss, validation accuracy, and validation loss showed that the Transformer was best at capturing long-range dependencies and context, making it the strongest choice for translation, while Whisper's consistent results confirmed its reliability for transcription. Beyond the technical findings, the system has practical significance: it enables speakers of minority languages to communicate and access essential services in their own languages, supporting social inclusion, cultural preservation, and improved quality of life. The study thus provides a foundation for future work on speech-to-text and translation tools that broaden language access and help keep these cultures alive. en_US
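
To make the transcription-and-evaluation step described in the abstract concrete, the sketch below shows one way such a pipeline could look using the openai-whisper and jiwer packages. It is an illustrative assumption rather than the project's released code; the checkpoint size, audio file name, and reference transcript are hypothetical placeholders.

    import whisper
    import jiwer

    # Load a pretrained multilingual Whisper checkpoint (the "small" size is an assumption).
    model = whisper.load_model("small")

    # Transcribe a recorded clip; transcribe() returns a dict whose "text" field
    # holds the decoded transcript. "garo_sample.wav" is a placeholder file name.
    result = model.transcribe("garo_sample.wav")
    hypothesis = result["text"]

    # Compare against a ground-truth transcript with word error rate (WER),
    # the metric the abstract reports for the transcription task.
    reference = "expected transcript of the clip"
    wer = jiwer.wer(reference, hypothesis)
    print(f"WER: {wer:.3f}")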
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Language Technology en_US
dc.subject Natural Language Processing (NLP) en_US
dc.subject Deep Learning en_US
dc.subject Ethnic Communities en_US
dc.title Adapting Whisper AI for Multilingual Speech-To-Text Conversion in Bangladeshi Ethnic Languages en_US
dc.type Other en_US

