| dc.description.abstract |
This project develops an abstractive summarization system for Bangla news articles using small-scale transformer models, specifically mT5-small (300M parameters) and BT5 Base (247M parameters), to generate long summaries (100–200 tokens) for in-depth insights and short summaries (30–50 tokens) for quick updates. Addressing the challenge of information overload in Bangla media, the system processes a curated dataset of 10,000 articles from sources such as Prothom Alo and BBC Bangla, covering diverse topics. The methodology comprises web scraping, advanced preprocessing to handle Bangla’s linguistic complexities (e.g., morphology, dialects, Unicode issues), fine-tuning on a P100 GPU, and evaluation using ROUGE, BLEU, CER/WER, and human ratings by native speakers. mT5-small achieved ROUGE-1 F1 scores of 0.410 (long) and 0.380 (short), outperforming BT5 Base (0.230 and 0.210), which suffered from overfitting. The system improves information accessibility for journalists, educators, and the general public, aligning with SDGs 4, 9, and 10. Contributions include an open-source dataset, codebase, and models, paving the way for future Bangla NLP research despite limitations in dialect coverage and computational resources.
en_US |