| dc.description.abstract |
Abstractive text summarization is a critical challenge in natural language processing
(NLP), especially for low-resource languages like Bangla, where data scarcity and
weak multilingual adaptation limit progress. This thesis presents a comparative study
of three approaches: fine-tuned BanglaT5, fine-tuned mT5, and prompt-engineered
GPT. The Bengali Abstractive News Summarization (BANS) dataset was employed,
with preprocessing steps such as normalization, tokenization, padding, and truncation
to ensure consistency. BanglaT5 and mT5 were fine-tuned using AdamW with crossentropy loss, while GPT was evaluated through zero-shot prompts. Performance was
measured with BERTScore and human evaluation by three annotators, who rated
outputs on Relevance, Coherence, and Conciseness (1–10 scale). Automatic results
show that BanglaT5 achieved the highest BERTScore (F1 = 0.817 with Bangla
embeddings; 0.957 with English embeddings), outperforming mT5 (F1 = 0.551 in
Bangla; 0.765 in English). Human evaluation revealed that GPT consistently scored
higher on Relevance (85%) and Coherence (84%), while BanglaT5 was rated better for
Conciseness (88%), reflecting its ability to produce shorter yet meaningful summaries.
These findings highlight the trade-offs between language-specific and general-purpose
LLMs: BanglaT5 excels in conciseness and precision, GPT in fluency and relevance,
and mT5 underperforms across dimensions. The study concludes that a hybrid
approach, combining the precision of BanglaT5 with the fluency of GPT, can
significantly advance Bangla summarization and contribute to more inclusive NLP
tools for low-resource languages. |
en_US |