Bangla Guruchandali Dosh Sentence Detection Using Machine Learning Techniques

Das, Rozanee Kanta; Tinni, Alaya Refat; Rinvee, Tanjina Zaman

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/8780

Date: 2022-01-04

Abstract:

Our lives are surrounded by technology, and we cannot live without it. Technology improves day by day. Natural Language Processing (NLP) techniques allow computers to understand human language, and researchers are increasingly interested in using NLP for text document classification. Bangla text document classification and sentiment analysis are active research topics. In this work, we classify Bangla sentences that contain Guruchandali Dosh. Bangla has two familiar forms, Sadhu and Colito: the Colito form is used in daily life, while the Sadhu form is used in written Bangla literature such as novels and poems. When the two forms are mixed in a single sentence, the result is called Guruchandali Dosh. We detect Guruchandali Dosh sentences using supervised learning techniques. In NLP work, text documents are easy to preprocess and translate, so we collected Sadhu and Colito data from various Bangla textbooks, novels, poems, and newspapers. We then built our dataset by modifying the sentences according to Bangla grammatical rules, collecting 1,712 Bangla text samples in total. Before applying machine learning algorithms, we preprocessed the raw text by removing unwanted data, stop words, and so on. We then applied six classification techniques: Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Extreme Gradient Boosting (XGB), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). All of the algorithms performed well on our dataset. Among them, the Multinomial Naive Bayes (MNB) algorithm achieved the highest accuracy, 85%. Given Bangla text as input, the MNB model is able to predict Guruchandali Dosh.
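
The preprocessing and Multinomial Naive Bayes steps described in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the stop-word list, the whitespace tokenizer, the label names ("clean", "dosh"), and the six toy sentences are all assumptions made for the example and are not taken from the report's 1,712-sample dataset. With so little data and plain word-count features, the predictions are not meaningful; the sketch only shows how the pieces fit together.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the report does not say which list was used.
STOP_WORDS = {"এবং", "ও"}


def preprocess(sentence):
    """Strip punctuation (including the Bangla danda) and drop stop words."""
    cleaned = re.sub(r"[।,.!?;:()\-]", " ", sentence)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.token_counts[label].update(preprocess(sentence))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def log_scores(self, sentence):
        """Unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = preprocess(sentence)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict_proba(self, sentence):
        scores = self.log_scores(sentence)
        top = max(scores.values())
        exp = {k: math.exp(s - top) for k, s in scores.items()}
        z = sum(exp.values())
        return {k: e / z for k, e in exp.items()}

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Toy examples written for this sketch (NOT from the report's dataset).
# "clean" = purely Sadhu or purely Colito; "dosh" = the two forms mixed.
TRAIN = [
    ("আমি ভাত খাইয়াছি।", "clean"),            # Sadhu
    ("আমি ভাত খেয়েছি।", "clean"),             # Colito
    ("তাহারা বাড়ি যাইতেছে।", "clean"),         # Sadhu
    ("তারা বাড়ি যাচ্ছে।", "clean"),            # Colito
    ("তাহারা বাড়ি যাচ্ছে।", "dosh"),           # Sadhu pronoun, Colito verb
    ("তাহার কথা শুনে সে চলিয়া গেল।", "dosh"),  # mixed forms
]

sentences, labels = zip(*TRAIN)
model = MultinomialNB(alpha=1.0).fit(sentences, labels)

query = "তাহারা বাড়ি যাচ্ছে।"
print(model.predict(query), model.predict_proba(query))
```

Word-count features alone cannot directly express "both forms occur together", so a practical system would likely need a much larger labeled dataset, as the report uses, or richer features; that choice is outside what the abstract specifies.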
