Bengali slang language detection using machine learning

Rimi, Umme Arifa

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/14757

Date: 2024-07-13

Abstract:

The rise of online communication platforms has significantly increased the use of slang, posing a challenge for content moderation and sentiment analysis. This study aims to detect Bengali slang using machine learning methods, focusing on building reliable models that can recognize and moderate informal language in digital conversations. The experimental methodology involves data collection, preprocessing, and feature engineering, using a diverse collection of Bengali text gathered from various web sources. Four well-known machine learning models are chosen for assessment: Linear Support Vector Classifier, Multinomial Naïve Bayes, Random Forest Classifier, and Logistic Regression. Each model is trained with cross-validation and hyperparameter tuning to improve performance. Ethical issues are considered, with privacy and consent being paramount, and data collection complies with legal requirements. Model fairness and possible biases are also examined to ensure equal treatment of different user groups. The results show that the Linear Support Vector Classifier achieves excellent recall, the Multinomial Naïve Bayes model exhibits high accuracy, and the Random Forest Classifier demonstrates robustness to class imbalance. The study has applications in sentiment analysis of informal communication, configurable text filters, and content moderation on online platforms. Because they are versatile and scalable, the models are useful tools for building safer and more tailored digital environments. This study enhances machine learning's capacity to recognize Bengali slang, advancing our knowledge of the linguistic variety present in online communication.
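The evaluation setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: it assumes scikit-learn, character n-gram TF-IDF features, macro-averaged F1 as the metric, and a tiny set of hypothetical English placeholder sentences standing in for the Bengali corpus. The names `TEXTS`, `LABELS`, `candidate_models`, and `compare_models` are invented for this example.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical placeholder data (NOT from the report): 1 = slang, 0 = not slang.
# A real experiment would use labeled Bengali text instead.
TEXTS = [
    "you are such a slangx fool",
    "shut up you slangy idiot",
    "what a slangz loser he is",
    "that slangx guy is trash",
    "stop it you slangy clown",
    "he is a total slangz jerk",
    "thank you for your kind help",
    "the weather is pleasant today",
    "please send me the report",
    "we will meet at the library",
    "she enjoys reading old novels",
    "the train arrives at noon",
]
LABELS = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def candidate_models():
    """Return the four classifier families named in the abstract."""
    return {
        "LinearSVC": LinearSVC(),
        "MultinomialNB": MultinomialNB(),
        "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }


def compare_models(texts, labels, n_splits=3):
    """Score each model with stratified k-fold cross-validation (macro-F1)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for name, model in candidate_models().items():
        # Assumed feature choice: character n-grams within word boundaries.
        pipeline = make_pipeline(
            TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
            model,
        )
        fold_scores = cross_val_score(
            pipeline, texts, labels, cv=cv, scoring="f1_macro"
        )
        scores[name] = float(fold_scores.mean())
    return scores


if __name__ == "__main__":
    for name, score in compare_models(TEXTS, LABELS).items():
        print(f"{name}: macro-F1 = {score:.2f}")
```

Hyperparameter tuning is left out for brevity; one way to add it would be to wrap each pipeline in scikit-learn's `GridSearchCV` with a small parameter grid. Scores on placeholder data like this say nothing about the results reported in the study.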