Abstract:
Online hate speech often targets individuals on the basis of their identity, including race,
ethnicity, gender, sexual orientation, religion, nationality, disability, and other
characteristics. In Bangladesh, such messages are frequently spread through Facebook and
YouTube, two of the most popular social media platforms in the country. Hate speech in the
comment sections of celebrities' posts is a particularly significant problem. Suicide
attempts and violent incidents motivated by religious beliefs have also increased in
Bangladesh over the past few years. Comments and opinions of this kind therefore need to
be filtered out of social media to
maintain a healthy online environment. My research concentrates on detecting hate speech
in Bangla. A few earlier efforts exist, but they have not met expectations. The dataset
used here contains more than 3,000 comments collected from different social media
platforms. My contribution is a model that classifies Bangla comments as
"hate speech" or "normal speech" using hybrid machine learning approaches that
each combine two traditional models: K-Nearest Neighbours (KNN) with Random Forest (RF),
Naive Bayes with Decision Tree, Random Forest with Logistic Regression, and Random Forest
with Support Vector Machine (SVM). Combining models in this way is referred to as an
ensemble method. Through careful evaluation, the process is intended to give dependable
results for Bangla text. I compare how well each approach performs and select the model
that achieves the highest accuracy on the test data.
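
The pairing and selection step described above can be illustrated with a minimal sketch. The abstract does not say how the two models in each pair are combined, so soft voting (averaging each model's predicted probability of "hate speech") is an assumption here. The function names, the test labels, and every probability value are invented for illustration and are not results from this work.

```python
# Illustrative sketch only: soft voting is an ASSUMED combination rule,
# and all numbers below are made-up toy values, not data from the paper.

def soft_vote(probs_a, probs_b, threshold=0.5):
    """Average two models' P(hate speech) per comment, then threshold."""
    return [1 if (pa + pb) / 2 >= threshold else 0
            for pa, pb in zip(probs_a, probs_b)]

def accuracy(predicted, actual):
    """Fraction of comments whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical test labels: 1 = hate speech, 0 = normal speech.
y_test = [1, 0, 1, 1, 0, 0]

# Hypothetical P(hate speech) outputs of each base model on the test set.
model_probs = {
    "KNN": [0.7, 0.3, 0.4, 0.8, 0.6, 0.2],
    "RF":  [0.9, 0.2, 0.7, 0.6, 0.3, 0.1],
    "NB":  [0.6, 0.7, 0.3, 0.9, 0.2, 0.4],
    "DT":  [1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
    "LR":  [0.8, 0.4, 0.2, 0.7, 0.1, 0.3],
    "SVM": [0.6, 0.1, 0.2, 0.9, 0.2, 0.3],
}

# The four model pairs named in the abstract.
pairs = [("KNN", "RF"), ("NB", "DT"), ("RF", "LR"), ("RF", "SVM")]

# Score every hybrid on the test labels.
pair_scores = {
    (a, b): accuracy(soft_vote(model_probs[a], model_probs[b]), y_test)
    for a, b in pairs
}

# Keep the pair with the highest test accuracy.
best_pair = max(pair_scores, key=pair_scores.get)

for pair, score in pair_scores.items():
    print(pair, round(score, 3))
print("selected:", best_pair)
```

With real data, the probabilities would come from classifiers trained on vectorized Bangla comments; scikit-learn's `VotingClassifier` with `voting="soft"` offers the same averaging behaviour without hand-written code. The ranking of pairs in this toy run depends entirely on the invented numbers and says nothing about which hybrid performs best in practice.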