dc.description.abstract |
Online harassment such as trolling, threatening, and bullying has become a serious issue
in our country's present situation. Millions of people in this country are
connected through the internet. On major social platforms such as Facebook,
YouTube, and TikTok, abusive content creates enormous public problems, including fear.
Numerous studies have addressed this issue in different languages;
however, very little research has been published on abusive Bangla text detection. In
our experiment, we tried to identify abusive comments in Bangla text using several
classification algorithms: Logistic Regression (LR), Multinomial Naive
Bayes, Decision Tree, Support Vector Machine (SVM), Random Forest, and KNN. The data used
in this research were collected from online social media, forums, Bangla slang
books, and direct speech. The data set consists of 2999 sentences, of which 1710 are
abusive and the rest are non-abusive. In the preprocessing step, we
categorized the data into two polarities: abusive, labeled 1, and non-abusive, labeled 0. The data
were labeled manually. All special characters and symbols were removed from the raw
data. TF-IDF was used to extract features from the data. After applying all the
algorithms, the experiment showed that SVM and LR both performed well. Both
attained 84% accuracy, with SVM achieving 0.84 precision and 0.91 recall. |
en_US |