TriVashi: A Multi-Stage System for Dialectal Speech Translation, Integrating Identification, Transcription, and Synthesis for Under-Resourced

Majumder, Salauddin; Surovi, Meheron Nesa

dc.contributor.author	Majumder, Salauddin
dc.contributor.author	Surovi, Meheron Nesa
dc.date.accessioned	2026-04-05T09:25:14Z
dc.date.available	2026-04-05T09:25:14Z
dc.date.issued	2025-09-16
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16603
dc.description	Project Report	en_US
dc.description.abstract	The increasing sophistication of natural language processing has, paradoxically, widened the digital divide, leaving thousands of low-resource languages and dialects underserved by technology. TriVashi is the first end-to-end, multi-stage speech-to-speech translator for three under-resourced dialects of Bangladesh (Noakhali, Sylheti, and Chittagong), for which no integrated solution previously existed, and this research directly addresses this digital linguistic inequality. The main outcome of this work is the creation and publication of a new, gender-balanced audio and parallel text corpus of 15,006 samples, intended as a foundation for future work. The proposed system uses a four-stage cascaded architecture. It first identifies the dialect through a visual-analytic approach that reframes the problem as image classification: audio is transformed into Mel spectrograms and classified with a pre-trained DenseNet121-SVM classifier, which achieves a best accuracy of 92.7%. After detection, the audio is routed to dialect-specific Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) models. The experimental findings confirm the effectiveness of transfer learning: fine-tuning large pretrained models (Whisper-Small and BanglaT5) on the curated dialectal data reduced the ASR Word Error Rate (WER) from more than 289.2 percent to as low as 3.0, and raised NMT BLEU scores to 56.3. The new state-of-the-art standards set in this work are accompanied by a strong and replicable methodological template that confirms the small data, big model paradigm as a viable way	en_US
dc.description.sponsorship	Daffodil International University	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Neural Machine Translation (NMT)	en_US
dc.subject	Transfer Learning	en_US
dc.subject	Automatic Speech Recognition (ASR)	en_US
dc.subject	Speech-to-Speech Translation	en_US
dc.subject	Low-Resource Languages	en_US
dc.title	TriVashi: A Multi-Stage System for Dialectal Speech Translation, Integrating Identification, Transcription, and Synthesis for Under-Resourced	en_US
dc.type	Other	en_US
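
Note on the WER figures in the abstract: a Word Error Rate above 100 percent is possible because WER divides the word-level edit distance (substitutions, deletions, insertions) by the number of reference words, and a model that emits many spurious words can accumulate more errors than the reference has words. The sketch below is a minimal illustration of that standard definition, not the report's evaluation code; the function name `word_error_rate` and the placeholder strings are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not taken from the report.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real dialect data.
perfect = word_error_rate("a b", "a b")
runaway = word_error_rate("a b", "x y z w q")
print(f"{perfect:.1%} {runaway:.1%}")
```

With a two-word reference and a five-word hypothesis that shares no words with it, the edit distance is five (two substitutions and three insertions), so the rate exceeds 100 percent.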