DSpace Repository

TriVashi: A Multi-Stage System for Dialectal Speech Translation, Integrating Identification, Transcription, and Synthesis for Under-Resourced

dc.contributor.author Majumder, Salauddin
dc.contributor.author Surovi, Meheron Nesa
dc.date.accessioned 2026-04-05T09:25:14Z
dc.date.available 2026-04-05T09:25:14Z
dc.date.issued 2025-09-16
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16603
dc.description Project Report en_US
dc.description.abstract The increasing sophistication of natural language processing has, paradoxically, widened the digital divide, leaving thousands of low-resource languages and dialects underserved by technology. This research addresses that digital linguistic inequality directly: TriVashi is the first end-to-end, multi-stage speech-to-speech translator for three under-resourced dialects of Bangladesh (Noakhali, Sylheti, and Chittagong), for which no integrated solution previously existed. The main outcome of this work is the creation and publication of a new, gender-balanced corpus of 15,006 audio samples with parallel text, intended as a basis for future work. The proposed system uses a four-stage cascaded architecture. It first identifies the dialect through a new visual-analytic method that reframes the problem as image classification: audio is transformed into Mel spectrograms and classified with a pre-trained DenseNet121-SVM classifier, whose best accuracy is 92.7%. After detection, the audio is sent to dialect-specific Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) models. The experimental findings confirm the effectiveness of transfer learning: after fine-tuning large pretrained models (Whisper-Small and BanglaT5) on the curated dialectal data, the ASR Word Error Rate (WER) dropped from more than 289.2 percent to as low as 3.0, and NMT BLEU scores rose to 56.3. The new state-of-the-art standards set in this work are accompanied by a strong and replicable methodological template that confirms the small data, big model paradigm as a viable way en_US
dc.description.sponsorship Daffodil International University en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Neural Machine Translation (NMT) en_US
dc.subject Transfer Learning en_US
dc.subject Automatic Speech Recognition (ASR) en_US
dc.subject Speech-to-Speech Translation en_US
dc.subject Low-Resource Languages en_US
dc.title TriVashi: A Multi-Stage System for Dialectal Speech Translation, Integrating Identification, Transcription, and Synthesis for Under-Resourced en_US
dc.type Other en_US
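
The abstract reports a baseline ASR Word Error Rate above 289.2 percent, which may look impossible at first glance. Under the standard definition, WER = (substitutions + deletions + insertions) / number of reference words, the value can exceed 100 percent when the recognizer outputs many more words than the reference contains. The following is a minimal sketch of that standard definition in plain Python. It is not taken from the report, which does not describe how its WER was computed; the function name `word_error_rate` and the placeholder sentences are assumptions made purely for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumes the standard word-level Levenshtein definition with
    whitespace tokenization; the report may use a different normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word) # substitution or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not samples from the TriVashi corpus.
    print(word_error_rate("a b c d", "a b c d"))  # identical strings
    print(word_error_rate("a b", "x y z w q"))    # long, wrong output: WER above 100
```

In the second call the reference has two words and the hypothesis has five unrelated words, so the minimum edit cost is two substitutions plus three insertions, giving a WER of 250 percent. A model that has not been adapted to a dialect can behave similarly, producing long, incorrect output for short utterances.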

