DSpace Repository

Incorporating Supervised Learning Algorithms with NLP Techniques to Classify Bengali Language Forms

dc.contributor.author Parves, Abdul Bari
dc.contributor.author Imran, Abdullah Al
dc.contributor.author Rahman, Md. Riazur
dc.date.accessioned 2021-08-01T10:27:44Z
dc.date.available 2021-08-01T10:27:44Z
dc.date.issued 2020-01-10
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/5943
dc.description.abstract Every language has its own roots, forms, and grammar, and Bengali is no exception. Bengali has two core forms, "Sadhu-bhasha" and "Cholito-bhasha", which have been widely used in everything from everyday communication to literary publications. At present, Sadhu-bhasha is found only in old books and literary publications, whereas Cholito-bhasha is used almost everywhere. Nevertheless, many Bengali linguists still study these two forms to preserve the roots of the language, to understand and develop Bengali, and to extract knowledge from historical publications, which were written mainly in Sadhu-bhasha. Unfortunately, they have so far had no digital tool that can assist their research by automatically identifying these core forms in the large archive of Bengali literature. This study aims to build an automatic, intelligent system that can accurately identify the two language forms using Natural Language Processing (NLP). We applied advanced NLP techniques and six supervised learning algorithms to classify "Sadhu-bhasha" and "Cholito-bhasha" in text corpora. All six models yielded very promising results; Multinomial Naive Bayes outperformed the others, with 99.5% accuracy, 99.0% precision, 100% recall, an AUC score of 0.995, and an F1 score of 0.995. Additionally, the study performs a qualitative analysis using the t-SNE algorithm to visualize the difference between Sadhu-bhasha and Cholito-bhasha. en_US
dc.language.iso en_US en_US
dc.publisher Scopus en_US
dc.subject Supervised learning algorithms en_US
dc.subject "Sadhu-bhasha" and "Cholito-bhasha" en_US
dc.title Incorporating Supervised Learning Algorithms with NLP Techniques to Classify Bengali Language Forms en_US
dc.type Article en_US
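
The abstract names Multinomial Naive Bayes as the best of the six classifiers but gives no implementation details. The following is a minimal, from-scratch sketch of that technique, assuming whitespace tokenization and Laplace smoothing; it is not the authors' pipeline. The function names (`train_multinomial_nb`, `predict`) and the six toy sentences are illustrative assumptions and are not taken from the study's corpus.

```python
import math
from collections import Counter, defaultdict

# Toy training data (illustrative assumption, NOT from the study's corpus).
# Sadhu-bhasha tends to use longer verb and pronoun forms than Cholito-bhasha.
train_docs = [
    "তাহারা ভাত খাইতেছে",   # Sadhu
    "আমি কাজ করিতেছি",      # Sadhu
    "সে স্কুলে যাইতেছে",     # Sadhu
    "তারা ভাত খাচ্ছে",       # Cholito
    "আমি কাজ করছি",         # Cholito
    "সে স্কুলে যাচ্ছে",       # Cholito
]
train_labels = ["sadhu", "sadhu", "sadhu", "cholito", "cholito", "cholito"]


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a word-count Multinomial Naive Bayes model with Laplace smoothing."""
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()  # assumption: plain whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in docs_per_class.items():
        total_tokens = sum(word_counts[label].values())
        denom = total_tokens + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_likelihood": {
                word: math.log((word_counts[label][word] + alpha) / denom)
                for word in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest log-posterior score for `text`."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for token in text.split():
            # Tokens never seen during training are skipped.
            if token in params["log_likelihood"]:
                score += params["log_likelihood"][token]
        scores[label] = score
    return max(scores, key=scores.get)


model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, "তাহারা স্কুলে যাইতেছে"))
print(predict(model, "তারা কাজ করছি"))
```

On a real corpus, the word counts would normally come from a proper Bengali tokenizer, and the model would be evaluated on held-out text rather than on sentences built from the training vocabulary.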