Abstract:
The rise of online communication platforms has significantly increased slang use,
posing a challenge for content moderation and sentiment analysis. This study aims to detect
Bengali slang using supervised machine learning methods, focusing on building reliable
models that can recognize and moderate informal language in digital conversations. The
experimental methodology involves data collection, preprocessing, and feature
engineering, using a diverse corpus of Bengali text gathered from various web
sources. Four well-known machine learning models are selected for evaluation: Linear
Support Vector Classifier, Multinomial Naïve Bayes, Random Forest Classifier, and
Logistic Regression. Each model is trained with cross-validation
and hyperparameter tuning to improve performance. Ethical considerations are addressed,
with privacy and consent treated as paramount, and data collection complies with legal
requirements. Model fairness and potential biases are also examined to ensure equitable
treatment of various user groups. The results show that the Linear Support Vector
Classifier achieves excellent recall, the Multinomial Naïve Bayes model exhibits high
accuracy, and the Random Forest Classifier demonstrates resilience to class
imbalance. The study has applications in sentiment analysis of informal communication,
configurable text filters, and content moderation on online platforms. The models'
versatility and scalability make them useful tools for developing safer and more
specialized digital environments. This study enhances machine learning's capacity to recognize
Bengali slang, advancing our knowledge of the linguistic variety present in online
communication.
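
The evaluation setup described above (four classifiers compared under cross-validation) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the use of scikit-learn, the TF-IDF features, the macro-F1 metric, the fold count, and the tiny transliterated toy corpus with placeholder tokens `slang_a` and `slang_b` are all assumptions made here for demonstration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (NOT the study's data): transliterated phrases,
# with "slang_a" / "slang_b" standing in for slang terms.
texts = [
    "ami aj khub khushi", "tumi kemon acho", "ajker din ta sundor",
    "amra sobai bondhu", "boi ta bhalo chilo", "khabar ta moja chilo",
    "tui ekta slang_a", "slang_b kotha bolish na", "oi slang_a chup kor",
    "slang_b slang_a tui", "ki slang_b kaj", "slang_a der dol",
]
labels = [0] * 6 + [1] * 6  # 0 = non-slang, 1 = slang

# The four model families named in the abstract; settings are assumptions.
models = {
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}


def evaluate(models, texts, labels, n_splits=3):
    """Return the mean cross-validated macro-F1 score for each model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for name, clf in models.items():
        # Fit the vectorizer inside each fold to avoid leaking test vocabulary.
        pipe = make_pipeline(TfidfVectorizer(), clf)
        fold_scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="f1_macro")
        scores[name] = fold_scores.mean()
    return scores


scores = evaluate(models, texts, labels)
for name, score in scores.items():
    print(f"{name}: macro-F1 = {score:.2f}")
```

Hyperparameter tuning could be layered on top by wrapping each pipeline in scikit-learn's `GridSearchCV`; the parameter grids to search would depend on the actual dataset and are not specified in the abstract.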