Bangla language mode (Sadhu/Cholito) Classification

Parves, Abdul Bari; Rakib, Emranul Haque

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/3709

Date: 2019-05-05

Abstract:

This project addresses the problem of distinguishing between two forms of the Bangla language, namely Sadhubhasha and Cholitobhasha. Such a classifier would be useful for choosing the right words in Bangla literature. The main aim of the project is to differentiate Sadhubhasha, the early form of Bangla from the modern era, from Cholitobhasha, the current form. As far as we know, no previous work has addressed this particular issue. More broadly, only a few works have been done on the Bangla language, which has made it difficult to carry out advanced linguistic tasks on Bangla such as information extraction or summarization. Collecting Bangla data was difficult because of its limited availability, but we finally collected a dataset of around 100,000 words for this project, of which 80% is used for training and the remaining 20% for testing. The machine learning algorithms Random Forest, Naïve Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree are applied to classify the language mode, and Term Frequency-Inverse Document Frequency and Bag of Words are used for the numerical representation. With these classifiers, accuracy of 91% to 99.5% is observed. The promising outcome of this project is that the "Sadhu and Cholito language classifier" can serve as a first step from which others will be encouraged to do further research on the Bangla language.
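
The pipeline named in the abstract (an 80/20 split, Bag of Words and TF-IDF features, and a classifier such as Naïve Bayes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the ten toy sentences are all assumptions made for illustration, and the toy data says nothing about the accuracy reported above.

```python
import math
from collections import Counter

# Toy sentences written for this sketch (NOT the authors' dataset).
# Odd rows use Sadhu verb/pronoun forms, even rows their Cholito counterparts.
SAMPLES = [
    ("আমি ভাত খাইতেছি", "sadhu"),
    ("আমি ভাত খাচ্ছি", "cholito"),
    ("তাহারা বাড়ি যাইতেছে", "sadhu"),
    ("তারা বাড়ি যাচ্ছে", "cholito"),
    ("সে কাজ করিয়াছে", "sadhu"),
    ("সে কাজ করেছে", "cholito"),
    ("তাহারা বই পড়িতেছে", "sadhu"),
    ("তারা বই পড়ছে", "cholito"),
    ("তাহারা কাজ করিয়াছে", "sadhu"),
    ("তারা কাজ করেছে", "cholito"),
]

def tokenize(text):
    # Assumption: plain whitespace tokenization is enough for the toy data.
    return text.split()

def split_train_test(samples, train_fraction=0.8):
    # 80% for training, the remaining 20% for testing (no shuffling here).
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def bag_of_words(text):
    # Bag of Words: raw token counts for one document.
    return Counter(tokenize(text))

def fit_idf(texts):
    # Smoothed inverse document frequency, one common variant among several.
    n = len(texts)
    df = Counter()
    for text in texts:
        df.update(set(tokenize(text)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(text, idf):
    # TF-IDF: relative term frequency times IDF; unseen tokens are dropped.
    counts = bag_of_words(text)
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def train_naive_bayes(samples):
    # Multinomial Naive Bayes over Bag of Words counts.
    label_counts, word_counts, vocab = Counter(), {}, set()
    for text, label in samples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab

def predict(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore tokens never seen in training
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train, test = split_train_test(SAMPLES)
idf = fit_idf([text for text, _ in train])
model = train_naive_bayes(train)
correct = sum(predict(model, text) == label for text, label in test)
accuracy = correct / len(test)
print(f"toy test accuracy: {accuracy:.2f}")
```

The Naïve Bayes step here works on raw counts, while `tfidf` shows the alternative weighting separately. In practice, a library such as scikit-learn could supply the feature extraction and the other classifiers listed in the abstract.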
