Abstract:
In the internet age, cyberbullying has become a serious concern, particularly in English-speaking countries. This work focuses on detecting cyberbullying in English text using deep learning techniques. A dedicated English dataset containing both cyberbullying and non-cyberbullying text is used for training. The dataset is tokenized, preprocessed, and transformed into sequences so that it can be fed into Random Forest, Naïve Bayes, LSTM, and BERT classifiers. A novel LSTM-based deep learning model was applied to the dataset, with dropout and word embeddings used to improve its performance, and the best model was further evaluated with a confusion matrix. To address the particular challenges of cyberbullying detection in English, several approaches are explored, including language-specific preprocessing and data augmentation. The results demonstrate that deep learning is effective at identifying cyberbullying in English-language contexts and that the technology is promising for addressing this issue. Classical machine learning models, Naïve Bayes (NB) and Random Forest, were also employed, with hyperparameter tuning used to optimize their performance. The LSTM and BERT models outperformed these models: the LSTM reached an accuracy of 84%, while BERT achieved the highest accuracy, 87%, in cyberbullying detection, as confirmed by the experimental evaluation.
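The sequence transformation and confusion-matrix evaluation mentioned above can be illustrated with a minimal sketch in plain Python. It is not the study's pipeline. It assumes a simple regex tokenizer that drops URLs and @mentions, a frequency-ranked vocabulary with reserved padding (0) and out-of-vocabulary (1) indices, fixed-length post-padding, and binary labels (1 = cyberbullying, 0 = non-cyberbullying). The function names and the `max_words`/`max_len` parameters are hypothetical. BERT normally uses its own subword tokenizer, so this sketch only reflects the kind of input an embedding + LSTM model would consume.

```python
import re
from collections import Counter


def preprocess(text):
    """Hypothetical cleaning step: lowercase, drop URLs and @mentions, keep word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.findall(r"[a-z0-9']+", text)


def build_vocab(token_lists, max_words=10000):
    """Map the most frequent tokens to integer ids (0 = padding, 1 = out-of-vocabulary)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<oov>": 1}
    # Sort by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for tok, _ in ranked[: max_words - 2]:
        vocab[tok] = len(vocab)
    return vocab


def to_sequence(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, then post-pad with 0."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return [[tn, fp], [fn, tp]]


def accuracy(cm):
    """Accuracy = (TP + TN) / total, read off the confusion matrix."""
    (tn, fp), (fn, tp) = cm
    return (tp + tn) / (tn + fp + fn + tp)


if __name__ == "__main__":
    texts = ["You are so stupid, nobody likes you", "Great game last night @sam!"]
    tokenized = [preprocess(t) for t in texts]
    vocab = build_vocab(tokenized)
    print([to_sequence(toks, vocab, max_len=6) for toks in tokenized])

    cm = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(cm, accuracy(cm))
```

Reading accuracy off the confusion matrix, rather than computing it separately, keeps the per-class error counts (false positives versus missed cyberbullying) visible alongside the single headline number.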