Multiclass Classification of Bengali Newspaper Article Using Transformer & Deep Learning Approaches

Habib, Md. Ahsan

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16643

Date: 2025-01-14

Abstract:

This work introduces a large-scale, structured dataset of Bangla news articles containing 320k instances across eight predefined classes (Science & Technology, International, National, Sports, Entertainment, Economy, Politics, and Education), with the aim of advancing Bengali Natural Language Processing (NLP). The objective is to address the text classification problem for Bangla content. A range of deep learning models was used to classify the articles; Bangla-BERT, a transformer-based model, attained the highest accuracy at 92%. The other architectures that were implemented and tested (GRU, LSTM, CNN, and a hybrid model) did not match it. The dataset and the resulting insights on model performance add to the resources available for Bangla NLP and provide a benchmark for future work in this area. The implications reach both academia and industry: Bangladeshi national newspaper organizations can use these models for efficient article categorization, and NLP researchers gain an available dataset together with insights on model effectiveness for Bangla text classification. This work represents a small step towards closing the gap in NLP resources for the Bengali language and may pave the way for measurable progress in automated language processing for Bangla.