Abstract:
Social media use is growing every day. As more people use these platforms, bullying is also becoming more common in the comment sections of posts by well-known users and of viral content. The volume of bullying text keeps increasing, and such content should ideally be detected and removed before it is displayed. In this study, we detect cyberbullying in Bangla text using natural language processing (NLP) techniques and machine learning classifiers. We manually built our dataset by collecting pure Bangla text comments from popular social media platforms such as Facebook and YouTube. The dataset contains 3524 comments, of which 22.1% are bullying and 77.9% are non-bullying. After preprocessing, we split the data into a training set containing 70% of the comments and a test set containing the remaining 30%. Among the algorithms we implemented, Multinomial Naïve Bayes (MNB) achieved the highest accuracy (78.99%), while the Decision Tree Classifier achieved the lowest (69.48%). In terms of running time, the Neighbors Classifier was the fastest (0.0018 seconds) and the Random Forest Classifier was the slowest (1.44 seconds).
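The pipeline summarized above (a 70/30 train/test split followed by a Multinomial Naïve Bayes classifier scored by accuracy) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: it assumes each comment has already been preprocessed into a list of tokens, uses standard multinomial NB with additive (Laplace) smoothing, and the names `train_test_split`, `MultinomialNB`, and `accuracy`, the random seed, and the toy tokens are all hypothetical stand-ins rather than anything taken from the paper's data.

```python
import math
import random
from collections import Counter


def train_test_split(samples, test_fraction=0.3, seed=42):
    """Hypothetical helper: shuffle and split into (train, test), 70/30 by default."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class MultinomialNB:
    """From-scratch multinomial Naive Bayes over token lists (sketch, not a library API)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        n_docs = len(labels)
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(class) estimated from class frequency in the training set
            self.log_prior[c] = math.log(len(class_docs) / n_docs)
            counts = Counter(tok for d in class_docs for tok in d)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(token | class) with additive smoothing
            self.log_likelihood[c] = {
                tok: math.log((counts[tok] + self.alpha) / denom) for tok in self.vocab
            }
        return self

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # tokens unseen during training are ignored
                    score += self.log_likelihood[c][tok]
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy tokens; label 1 = bullying, 0 = non-bullying.
    docs = [["bad", "ugly"], ["nice", "good"], ["bad", "stupid"], ["good", "great"]]
    labels = [1, 0, 1, 0]
    model = MultinomialNB().fit(docs, labels)
    print(model.predict([["bad"], ["good"]]))  # prints [1, 0]
```

Applied to a corpus of 3524 comments, a 30% test fraction in this sketch would hold out round(3524 × 0.3) = 1057 comments and leave 2467 for training. A real experiment would also typically stratify the split so that both sets keep the 22.1%/77.9% class ratio; this sketch does not.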