Abstract:
Banglish, a code-mixed blend of Bengali and English, is gaining popularity online. However,
it poses challenges for Natural Language Processing (NLP) methods because of its linguistic
fusion and the scarcity of resources. This research presents a multifaceted
analysis of Banglish text, covering sentiment analysis, toxicity detection, identity hate prediction,
threat assessment, and insult recognition. Using a dataset of 15,370 Banglish comments
from social media platforms, the study investigates the effectiveness of four machine
learning models: Support Vector Classifiers (SVCs), Random Forest Classifiers (RFCs),
Long Short-Term Memory (LSTM) networks, and bidirectional LSTMs. SVCs outperform
the other models in sentiment analysis, identifying the sentiment of Banglish text
with 87% accuracy. Performance at this level could help businesses and social media
platforms tailor content and services. SVCs also detect toxicity with 85% accuracy
and predict insults and hate speech with 75% and 77% accuracy, respectively, which
supports safer online conversation. They also achieve the highest accuracy among the
compared models, 73%, in identifying potential threats in Banglish
text, contributing to a safer online environment. Overall, the results highlight the
potential of SVCs for Banglish text classification and for handling the intricacies
of code-mixed text. The research also addresses practical concerns such as threat
assessment. However, further research is needed on transfer learning strategies,
domain-specific word embeddings, and ethical issues in code-mixed language processing.