Approach of Different Classification Algorithms to Compare in N-gram Feature between Bangla Good and Bad Text Discourses

Bitto, Abu Kowshir; Bijoy, Md. Hasan Imam; Khan, Saima; Mahmud, Imran; Biplob, Khalid Been Badruzzaman

dc.contributor.author	Bitto, Abu Kowshir
dc.contributor.author	Bijoy, Md. Hasan Imam
dc.contributor.author	Khan, Saima
dc.contributor.author	Mahmud, Imran
dc.contributor.author	Biplob, Khalid Been Badruzzaman
dc.date.accessioned	2024-04-28T09:13:18Z
dc.date.available	2024-04-28T09:13:18Z
dc.date.issued	2023-05-31
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/12172
dc.description.abstract	Bangla Natural Language Processing (BNLP) is a relatively new challenge in Artificial Intelligence. As use of the Bangla language expands rapidly, it is now adopted on a variety of platforms, including social media, communication platforms, and news media. Classifying text documents has therefore become important for information organization and knowledge management. This study uses five supervised classification methods to explore the categorization of Bangla text discourse using N-gram (unigram, bigram, and trigram) features. To accomplish the research goal, Bangla text discourse was collected from different sources such as social media, personal Bangla blogs, and people's utterances. After data collection, the most difficult part of the work, Bangla language preprocessing, was carried out; it includes adding contractions, removing punctuation, encoding, and a variety of other operations. Of the 1499 text documents collected initially, 1459 Bangla text discourses remained after preprocessing. To convert the text into tokens, N-gram features were extracted using the TF-IDF Vectorizer. In the experiment phase, Logistic Regression (LR), Decision Tree Classifier (DTC), Random Forest (RF), Multinomial Naive Bayes (MNB), and K-Nearest Neighbors (KNN) models were applied to the dataset with unigram, bigram, and trigram features. With unigram and bigram features, Multinomial Naive Bayes (MNB) outperformed all other classifiers, with the highest accuracies of 89.31% and 86.94%, respectively. With trigram features, K-Nearest Neighbors (KNN) achieves a maximum accuracy of 84.25%. The proposed model can classify a Bangla text document as Good or Bad Discourse.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Springer	en_US
dc.subject	Natural language	en_US
dc.subject	Bangla languages	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Classification	en_US
dc.subject	Algorithms	en_US
dc.title	Approach of Different Classification Algorithms to Compare in N-gram Feature between Bangla Good and Bad Text Discourses	en_US
dc.type	Article	en_US
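
The N-gram and TF-IDF step named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the record does not say which implementation or settings they used, so the function names `word_ngrams` and `tfidf`, the whitespace tokenization, the toy sentences, and the choice of smoothed IDF (ln((1 + N) / (1 + df)) + 1, with raw counts as term frequency and no length normalization) are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as space-joined strings."""
    tokens = text.split()  # assumption: simple whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents, n):
    """Compute TF-IDF weights per document for word n-grams of order n.

    TF is the raw count of the n-gram in the document. IDF is the smoothed
    form ln((1 + N) / (1 + df)) + 1, where N is the number of documents and
    df is the number of documents containing the n-gram. No normalization.
    """
    counts = [Counter(word_ngrams(doc, n)) for doc in documents]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    total = len(documents)
    idf = {t: math.log((1 + total) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Hypothetical toy corpus, not taken from the study's dataset.
docs = ["আমি ভালো আছি", "আমি ভালো নেই", "তুমি খারাপ"]
unigram_weights = tfidf(docs, 1)
bigram_weights = tfidf(docs, 2)
trigram_weights = tfidf(docs, 3)
```

Each returned dictionary maps an n-gram to its weight in one document. N-grams shared by several documents (such as "আমি ভালো" here) receive a lower IDF than those unique to one document, which is the property a downstream classifier such as MNB or KNN would rely on.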