A Method for Bengali Author Detection Using Supervised Classification Models

Hamid, Md. Abdul; Rahman, Md. Tanjil; Islam, Md. Fahim

dc.contributor.author	Hamid, Md. Abdul
dc.contributor.author	Rahman, Md. Tanjil
dc.contributor.author	Islam, Md. Fahim
dc.date.accessioned	2023-04-05T08:24:41Z
dc.date.available	2023-04-05T08:24:41Z
dc.date.issued	23-01-29
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/10152
dc.description.abstract	Text classification is an important area of study in natural language processing (NLP). Intellectual property, which includes digitally written ideas, blogs, poems, novels, and posts, is highly valued today, and it is vulnerable to people who steal or pirate others' work and claim it as their own. To address this problem, we built several models based on state-of-the-art supervised methods for determining authorship from a given Bangla text. Because our task is multi-class classification, the models can be used to determine who wrote articles, news, or messages. Authorship detection can be used to identify anonymous authors as well as to detect plagiarism. This work focuses on classifying five authors of Bengali text, all well-known figures in Bengali literature and poetry: Humayun Ahmed, Rabindranath Tagore, Muhammad Zafar Iqbal, Kazi Nazrul Islam, and Sarat Chandra Chattopadhyay. A dataset of over 4500 paragraphs was collected for the experimental evaluation, and the Bengali text was preprocessed for training. Seven supervised classification methods are used: logistic regression, naive Bayes, decision trees, SVM, random forest, XGBoost, and KNN. Our deep learning Bi-LSTM model outperforms the seven supervised models in terms of accuracy. Among all the models, the transformer-based BERT uncased model learns context particularly well. The Bi-LSTM and BERT uncased models give the best classification reports in our experiment. The Bi-LSTM model reaches a loss of 0.3789 with a maximum accuracy of 88%, and the BERT base uncased model achieves an F1-score of 91%.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Bengali literature	en_US
dc.subject	Classification	en_US
dc.subject	Logistic regression	en_US
dc.title	A Method for Bengali Author Detection Using Supervised Classification Models	en_US
dc.type	Other	en_US
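
The abstract lists naive Bayes among the supervised classifiers applied to author detection. As a rough illustration of how such a multi-class text classifier can work, the following is a minimal sketch of a multinomial naive Bayes model with Laplace smoothing, written in plain Python. It is not the authors' implementation: the class name `NaiveBayesAuthorClassifier`, the whitespace tokenizer, the labels `author_a` and `author_b`, and the romanized toy sentences are all hypothetical placeholders, and the real work would use preprocessed Bengali paragraphs from the five named authors.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Bengali preprocessing
    # (punctuation removal, stop words, etc.) is not shown here.
    return text.split()


class NaiveBayesAuthorClassifier:
    """Hypothetical multinomial naive Bayes classifier for author labels."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)  # author -> token counts
        self.doc_counts = Counter()              # author -> number of texts
        self.vocab = set()

    def fit(self, texts, authors):
        for text, author in zip(texts, authors):
            tokens = tokenize(text)
            self.word_counts[author].update(tokens)
            self.doc_counts[author] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_author, best_score = None, -math.inf
        for author, n_docs in self.doc_counts.items():
            counts = self.word_counts[author]
            total_tokens = sum(counts.values())
            # Log prior plus sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log(
                    (counts[token] + self.alpha)
                    / (total_tokens + self.alpha * vocab_size)
                )
            if score > best_score:
                best_author, best_score = author, score
        return best_author


# Toy, made-up training data (romanized placeholder tokens, not real excerpts).
train_texts = [
    "nodi akash megh nodi",
    "akash megh brishti nodi",
    "shohor rasta gari shohor",
    "gari rasta manush shohor",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = NaiveBayesAuthorClassifier().fit(train_texts, train_authors)
print(model.predict("megh nodi akash"))
```

The same `fit` and `predict` interface extends to five author labels without change, since the classifier simply scores every label it saw during training and returns the highest-scoring one.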