A Method for Bengali Author Detection Using Supervised Classification Models

Hamid, Md. Abdul; Rahman, Md. Tanjil; Islam, Md. Fahim

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/10152

Date: 23-01-29

Abstract:

Text classification is an important area of study in natural language processing (NLP). Intellectual property, which includes digitally written ideas, blogs, poems, novels, and posts, is widely valued, yet it is often stolen, pirated, or claimed by others as their own. To address this problem, we built several models based on state-of-the-art supervised methods for determining the authorship of a given Bangla text. Because the task is framed as multi-class classification, the approach can be used to determine who wrote an article, news item, or message. Authorship detection can be used to identify anonymous authors as well as to detect plagiarism. This work focuses on classifying Bengali text by five authors, all well-known figures in Bengali literature and poetry: Humayun Ahmed, Rabindranath Tagore, Muhammad Zafar Iqbal, Kazi Nazrul Islam, and Sarat Chandra Chattopadhyay. For the experimental evaluation, we created a dataset of more than 4,500 paragraphs and preprocessed the Bengali text for training. Seven supervised classification methods are used: logistic regression, naive Bayes, decision trees, SVM, random forest, XGBoost, and KNN. Our deep learning Bi-LSTM model outperforms these seven supervised models in terms of accuracy, and of all the models considered, the transformer-based BERT uncased model learns context particularly well. The Bi-LSTM and BERT uncased models give the best classification reports in our experiments: the Bi-LSTM model reaches a loss of 0.3789 with a maximum accuracy of 88%, and the BERT base uncased model achieves an F1-score of 91%.
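
To make the multi-class framing concrete, the following is a minimal sketch of one of the seven supervised methods named above, naive Bayes, applied to author attribution. It is not the report's implementation: the class name `NaiveBayesAuthorClassifier`, the whitespace tokenizer, the add-one smoothing, the labels `author_a` and `author_b`, and the toy training strings are all assumptions made for illustration, and none of the data comes from the report's dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization (an assumption); str.split() also works on Bengali script.
    return text.split()


class NaiveBayesAuthorClassifier:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, authors):
        self.word_counts = defaultdict(Counter)  # author -> token counts
        self.doc_counts = Counter(authors)       # author -> number of paragraphs
        self.vocab = set()
        for text, author in zip(texts, authors):
            tokens = tokenize(text)
            self.word_counts[author].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(texts)
        return self

    def log_scores(self, text):
        scores = {}
        vocab_size = len(self.vocab)
        for author, n_docs in self.doc_counts.items():
            counts = self.word_counts[author]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for token in tokenize(text):
                # Smoothed log likelihood; unseen tokens contribute a count of 0.
                score += math.log((counts[token] + 1) / (total + vocab_size))
            scores[author] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the report's dataset.
train_texts = [
    "rain river boat song",
    "river song evening boat",
    "robot science school lab",
    "science lab robot rocket",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = NaiveBayesAuthorClassifier().fit(train_texts, train_authors)
prediction = model.predict("boat on the river")
print(prediction)
```

With five authors instead of two, the same structure applies unchanged: each paragraph is scored against every author and the highest-scoring label is returned. A realistic setup would also hold out part of the paragraphs for evaluation before reporting accuracy or F1-score.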
