DSpace Repository

Approach of Different Classification Algorithms to Compare in N-gram Feature between Bangla Good and Bad Text Discourses

dc.contributor.author Bitto, Abu Kowshir
dc.contributor.author Bijoy, Md. Hasan Imam
dc.contributor.author Khan, Saima
dc.contributor.author Mahmud, Imran
dc.contributor.author Biplob, Khalid Been Badruzzaman
dc.date.accessioned 2024-04-28T09:13:18Z
dc.date.available 2024-04-28T09:13:18Z
dc.date.issued 2023-05-31
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/12172
dc.description.abstract Bangla Natural Language Processing (BNLP) is a relatively new challenge in Artificial Intelligence. With the rapid expansion of the Bangla language, it is now used on a variety of platforms, including social media, communication platforms, and news media. Classifying text documents is therefore an important step in addressing the challenges of information organization and knowledge management. This study uses five supervised classification methods to explore the categorization of Bangla text discourse using N-gram (unigram, bigram, and trigram) features. Bangla text discourse was collected from different platforms such as social media, personal Bangla blogs, and people's utterances. After data collection, the most difficult part, Bangla language preprocessing, was carried out; it includes adding contractions, removing punctuation, encoding, and a variety of other operations. The study started with 1499 text documents, of which 1459 Bangla text discourses were used after preprocessing. The text is tokenized and converted into N-gram features using a TF-IDF vectorizer. In the experiment phase, Logistic Regression (LR), Decision Tree Classifier (DTC), Random Forest (RF), Multinomial Naive Bayes (MNB), and K-Nearest Neighbors (KNN) models are applied to the dataset with unigram, bigram, and trigram features. With unigram and bigram features, MNB outperformed all other classifiers, with the highest accuracies of 89.31% and 86.94%, respectively. With trigram features, KNN achieves the highest accuracy of 84.25%. The proposed model can classify a Bangla text document as Good or Bad Discourse. en_US
dc.language.iso en_US en_US
dc.publisher Springer en_US
dc.subject Natural language en_US
dc.subject Bangla languages en_US
dc.subject Artificial intelligence en_US
dc.subject Classification en_US
dc.subject Algorithms en_US
dc.title Approach of Different Classification Algorithms to Compare in N-gram Feature between Bangla Good and Bad Text Discourses en_US
dc.type Article en_US
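
The N-gram TF-IDF feature step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `word_ngrams` and `tfidf`, the whitespace tokenization, the smoothed IDF formula, and the transliterated toy sentences are all assumptions made for illustration, and the classifiers themselves are left out.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Compute one {n-gram: weight} dictionary per document.

    Uses raw term counts and a smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    This weighting is an assumption; the abstract does not specify it.
    """
    grams_per_doc = [Counter(word_ngrams(d, n)) for d in docs]
    df = Counter()  # number of documents containing each n-gram
    for grams in grams_per_doc:
        df.update(grams.keys())
    total = len(docs)
    idf = {g: math.log((1 + total) / (1 + c)) + 1 for g, c in df.items()}
    return [{g: tf * idf[g] for g, tf in grams.items()} for grams in grams_per_doc]


# Hypothetical, transliterated placeholder sentences (not from the dataset).
docs = ["bhalo kotha bolo", "kharap kotha bolo na"]
for n in (1, 2, 3):  # unigram, bigram, trigram
    print(n, tfidf(docs, n)[0])
```

Each resulting dictionary is a sparse feature vector; in a full pipeline, vectors like these would be passed to the LR, DTC, RF, MNB, and KNN classifiers named in the abstract.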