Adapting Whisper AI for Multilingual Speech-To-Text Conversion in Bangladeshi Ethnic Languages

dc.contributor.author Yeiad, Kabid
dc.contributor.author Jim, Jannatul Ferdushi
dc.date.accessioned 2025-08-26T09:56:34Z
dc.date.available 2025-08-26T09:56:34Z
dc.date.issued 2024-07-24
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/13995
dc.description Project report en_US
dc.description.abstract This research presents an end-to-end speech-to-text and translation system that converts spoken audio in five ethnic languages of Bangladesh (Malo, Bawm, Santali, Marma, and Garo) into Bangla. The study evaluates how well several neural architectures, including Bi-LSTM, CNN Seq2Seq, GRU Seq2Seq, a baseline Seq2Seq, and Transformer models, handle the transcription and translation tasks. To build the dataset, 20 native speakers of each language recorded 500 unique words, yielding a corpus suitable for training and testing. Preprocessing steps such as noise reduction, normalization, and segmentation ensured high-quality inputs, and data augmentation further improved model robustness. In the experiments, the Transformer model outperformed the alternatives, reaching 97% training accuracy and 96% validation accuracy. The GRU Seq2Seq model also performed well, striking a good balance between accuracy and speed, whereas the CNN Seq2Seq model struggled. The Whisper model excelled at the transcription task, achieving high accuracy and low word error rates across all five languages. A thorough evaluation based on accuracy, loss, validation accuracy, and validation loss showed that the Transformer was best at capturing long-range dependencies and context, making it the strongest choice for translation, while Whisper's consistent results confirmed its reliability for transcription. Beyond the technical findings, the system has practical significance: it enables speakers of minority languages to communicate and access essential services in their own languages, supporting social inclusion, cultural preservation, and improved quality of life. The study thus provides a foundation for future work on speech-to-text and translation tools that broaden language access and help keep these cultures alive. en_US
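
To make the transcription-and-evaluation step described in the abstract concrete, the sketch below shows one way such a pipeline could look using the openai-whisper and jiwer packages. It is an illustrative assumption rather than the project's released code; the checkpoint size, audio file name, and reference transcript are hypothetical placeholders.

    import whisper
    import jiwer

    # Load a pretrained multilingual Whisper checkpoint (the "small" size is an assumption).
    model = whisper.load_model("small")

    # Transcribe a recorded clip; transcribe() returns a dict whose "text" field
    # holds the decoded transcript. "garo_sample.wav" is a placeholder file name.
    result = model.transcribe("garo_sample.wav")
    hypothesis = result["text"]

    # Compare against a ground-truth transcript with word error rate (WER),
    # the metric the abstract reports for the transcription task.
    reference = "expected transcript of the clip"
    wer = jiwer.wer(reference, hypothesis)
    print(f"WER: {wer:.3f}")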
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Language Technology en_US
dc.subject Natural Language Processing (NLP) en_US
dc.subject Deep Learning en_US
dc.subject Ethnic Communities en_US
dc.title Adapting Whisper AI for Multilingual Speech-To-Text Conversion in Bangladeshi Ethnic Languages en_US
dc.type Other en_US

