Abstract:
Hate speech is becoming increasingly prevalent on social media platforms and harms both individuals and groups, making its identification and mitigation in online environments crucial. To address this issue, this study proposes a method for automatically categorising tweets into three classes: Hate, Offensive, and Neither. Using a publicly available tweet dataset, we perform an extensive process of data collection, preprocessing, and augmentation. The initial dataset, collected from social media, is carefully cleaned to eliminate noise and irrelevant information, and synthetic data generation is used to balance the dataset, overcoming the prevalent problem of class imbalance in hate speech detection. Our goal is to develop a deep learning model that achieves high-accuracy detection and classification of hate speech and abusive language. We conduct experiments with LSTM and Bi-LSTM models, along with a transfer-learning strategy based on the pre-trained language models DistilBERT and BERT. We pay particular attention to the BERT model, as it performs better at capturing contextual information. The performance of the models is greatly enhanced by the addition of synthetic hate speech data. After evaluating the models on test data, we achieve an accuracy of over 91 percent, with notably improved performance in the hate speech class, which we attribute to BERT's bidirectional training approach and its stronger capacity for local contextual understanding and classification. The work also advances the area by investigating advanced techniques for producing synthetic data, resulting in a more evenly balanced training dataset. This method not only increases the generalisability of the model but also paves the way to more effective handling of sensitive and contextually complex hate speech. Social media platforms and brands could more effectively regulate and lessen the negative effects of toxic content by utilising the study's robust and scalable hate speech detection technology. By shielding users from offensive language, this approach fosters a safer and more welcoming online community. Furthermore, the study provides a benchmark for subsequent research, guiding the creation of deep learning models that are more effective at detecting hate speech across a variety of languages and cultural contexts.
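
For readers wanting a concrete picture of the transfer-learning setup summarised above, the following is a minimal sketch of fine-tuning-style three-class inference with a pre-trained BERT encoder, using the Hugging Face transformers library. The checkpoint name, maximum sequence length, and label ordering are illustrative assumptions, not the paper's exact configuration.

    # Minimal sketch, assuming the Hugging Face transformers library.
    # Checkpoint, max_length, and label order are illustrative assumptions.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    LABELS = ["Hate", "Offensive", "Neither"]  # the three target classes

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(LABELS)
    )  # classification head is fine-tuned on the balanced tweet dataset

    def classify(tweet: str) -> str:
        """Tokenise a tweet and return the predicted class label."""
        inputs = tokenizer(
            tweet, truncation=True, max_length=128, return_tensors="pt"
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        return LABELS[int(logits.argmax(dim=-1))]

In this setup, the same loop can be pointed at a DistilBERT checkpoint for a lighter model; the essential design choice is reusing a bidirectionally pre-trained encoder and training only a small classification head on the balanced dataset.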