DSpace Repository

Exploring LLMs for Bangla Text Summarization: A T5-Based Abstractive Approach

dc.contributor.author Ornob, Kawshik Ahmed
dc.contributor.author Shuvo, Bibakananda Roy
dc.date.accessioned 2026-04-28T02:20:35Z
dc.date.available 2026-04-28T02:20:35Z
dc.date.issued 2025-05-14
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17106
dc.description Project Report en_US
dc.description.abstract This project develops an abstractive summarization system for Bangla news articles using small-scale transformer models, specifically small MT5 (300M parameters) and BT5 Base (247M parameters), to generate long summaries (100–200 tokens) for in-depth insights and short summaries (30–50 tokens) for quick updates. To address information overload in Bangla media, the system processes a curated dataset of 10,000 articles from sources such as Prothom Alo and BBC Bangla, covering diverse topics. The methodology includes web scraping, preprocessing to handle Bangla's linguistic complexities (e.g., morphology, dialects, Unicode issues), fine-tuning on a P100 GPU, and evaluation using ROUGE, BLEU, CER/WER, and human ratings by native speakers. Small MT5 achieved ROUGE-1 F1 scores of 0.410 (long) and 0.380 (short), outperforming BT5 Base (0.230 and 0.210), which struggled with overfitting. The system improves information accessibility for journalists, educators, and the public, aligning with SDGs 4, 9, and 10. Contributions include an open-source dataset, codebase, and models, which support future Bangla NLP research despite limitations in dialect coverage and computational resources. en_US
dc.description.sponsorship Daffodil International University en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Bangla Text Summarization en_US
dc.subject Abstractive Summarization en_US
dc.subject Natural Language Processing (NLP) en_US
dc.subject Low-Resource Language Processing en_US
dc.subject Transformer Models en_US
dc.subject Bangla News Dataset en_US
dc.title Exploring LLMs for Bangla Text Summarization: A T5-Based Abstractive Approach en_US
dc.type Other en_US
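
The abstract reports ROUGE-1 F1 as its headline metric. As a rough illustration of what that score measures, the sketch below computes unigram-overlap precision, recall, and F1 between a reference summary and a generated one. It is not the report's evaluation code: the function name `rouge1_f1` is hypothetical, and whitespace tokenization is an assumption made here for simplicity (the record does not say which tokenizer or ROUGE implementation the authors used).

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Assumption: tokens are whitespace-separated, which is only a crude
    approximation for Bangla text.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both texts (multiset intersection).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "আজ ঢাকায় বৃষ্টি হয়েছে"
    candidate = "ঢাকায় বৃষ্টি হয়েছে"
    print(round(rouge1_f1(reference, candidate), 3))
```

Under this definition, a score of 0.410 means the harmonic mean of unigram precision and recall against the reference summaries is 0.410; it says nothing about fluency or factual accuracy, which is presumably why the report also uses human ratings.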

