Abstract:
Technology surrounds our lives, and we cannot live without it. Technology is
advancing day by day. Using Natural Language Processing (NLP) techniques, computers can
understand human language. Nowadays, with the help of NLP, researchers are increasingly working
on text document classification. Bangla text document classification, sentiment analysis, etc. are
topics of active interest for researchers. In this work, we classify Guruchandali Dosh in Bangla
sentences. Bangla speakers are familiar with two forms of the language, Sadhu and Colito. The Colito form
is used in daily life, and the Sadhu form is used to write Bangla literature, novels, poems, etc. When
the two forms are mixed within a single sentence, this is called Guruchandali Dosh. In our
work, we detect Guruchandali Dosh sentences using supervised learning
techniques. In NLP work, text documents are easy to preprocess and translate. So, we collected Sadhu
and Colito text from various Bangla textbooks, novels, poems, and newspapers. We then
built our dataset by modifying the sentences according to Bangla grammatical rules. Finally, we were
able to collect 1712 Bangla text samples. The data must be preprocessed before applying machine
learning algorithms. We preprocessed the raw text by removing unwanted data, stop words,
etc. After that, we used six classification techniques to classify Guruchandali Dosh sentences. In our
work, we used Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Extreme Gradient
Boosting (XGB), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). All
six algorithms performed well on our dataset. Among them, the Multinomial Naive Bayes (MNB)
algorithm achieved the highest accuracy, 85%. Given a Bangla sentence as input,
the MNB model predicts whether it contains Guruchandali Dosh.
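
The pipeline summarized above (preprocessing, feature extraction, MNB classification) can be illustrated with a minimal sketch. The abstract does not state the stop-word list, the feature representation, or the training sentences, so everything below is an assumption made for illustration: the two-entry stop-word list, the word-pair features (chosen so that a Sadhu pronoun co-occurring with a Colito verb is visible to the classifier), the label names, and the four toy sentences. The classifier is a from-scratch Multinomial Naive Bayes with Laplace smoothing, not the implementation used in this work.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stop-word list; the abstract does not give the one actually used.
STOP_WORDS = {"এবং", "ও"}

# Anything outside the Bengali Unicode block (and whitespace) is treated as unwanted.
NON_BANGLA = re.compile(r"[^\u0980-\u09FF\s]")


def preprocess(sentence):
    """Strip non-Bangla characters, split on whitespace, drop stop words."""
    cleaned = NON_BANGLA.sub(" ", sentence)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def features(tokens):
    """Assumed feature set: single words plus in-order word pairs."""
    pairs = [a + "+" + b for a, b in combinations(tokens, 2)]
    return tokens + pairs


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            feats = features(preprocess(sentence))
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, sentence):
        """Unnormalised log posterior for each class."""
        feats = [f for f in features(preprocess(sentence)) if f in self.vocab]
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_label / n_docs)
            for f in feats:
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Toy training set (illustrative only, not taken from the 1712-sample dataset).
# তারা / করছে are Colito forms; তাহারা / করিতেছে are Sadhu forms.
train_sentences = [
    "তারা কাজ করছে।",       # Colito pronoun + Colito verb
    "তাহারা কাজ করিতেছে।",  # Sadhu pronoun + Sadhu verb
    "তাহারা কাজ করছে।",     # Sadhu pronoun + Colito verb (mixed)
    "তারা কাজ করিতেছে।",    # Colito pronoun + Sadhu verb (mixed)
]
train_labels = ["consistent", "consistent", "guruchandali", "guruchandali"]

model = MultinomialNB().fit(train_sentences, train_labels)
```

With this toy data, model.predict("তাহারা কাজ করছে।") returns "guruchandali", because the pair feature তাহারা+করছে occurs only in the mixed class. Single-word counts alone would be identical across the two toy classes, which is why the sketch assumes pair features; a real system would need far more data and a held-out test split to measure accuracy.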