| dc.description.abstract |
Social media's growth has facilitated the rapid spread of both positive and negative content, and hate speech is one of the most harmful forms of online expression. The goal of this study is to detect hate speech in Banglish, a code-mixed language that blends English and Bangla and is commonly used in social media conversations. The research develops a machine learning and deep learning-based system for identifying hate speech in Banglish, addressing the unique challenges posed by language mixing and everyday idioms. A collection of Banglish text from several social media platforms was preprocessed using techniques such as tokenization, lemmatization, and normalization. Normalization expanded abbreviations such as "pic" to "picture", "img" to "image", and "u" to "you", while lemmatization reduced linguistic variation. We applied six deep learning models: LSTM, GRU, Bi-LSTM, Bi-GRU, GRU+LSTM, and Bi-LSTM+Bi-GRU. Model performance was evaluated using confusion matrices, accuracy, precision, recall, and F1-score. On the multiclass classification task, which distinguishes between seven classes, the hybrid Bi-LSTM+Bi-GRU model achieved an accuracy of 83.93%. This study advances automated hate speech detection for code-mixed languages like Banglish. The findings indicate that while current models show promise, further work is needed to address problems including data imbalance and the identification of more subtle kinds of hate speech. This research lays groundwork for future efforts to improve the precision and robustness of hate speech detection systems in other languages. |
en_US |