| dc.description.abstract |
Banglish, the informal hybrid of Bengali and English typed in the Latin alphabet, presents
distinctive challenges: code-switching, nonstandard spelling, and transliteration
drift. This study builds a full pipeline for Banglish, from data acquisition
and annotation to modeling and analysis, across seven categories
(Appearance, Not Hate, Others, Racial, Religious, Sexual, and Slang). In the study, we
create and clean a social media corpus, design a preprocessing suite (custom stop-word
filtering, regex tokenization, and rule-based normalization of spelling variants)
tailored to Banglish, and address class imbalance via staged over- and under-
sampling to a balanced set of 2,000 instances per class. To compare model
performance, we test recurrent architectures (LSTM, GRU, BiLSTM, BiGRU) and
their hybrids (LSTM+GRU, BiLSTM+BiGRU) against transformer models (mBERT,
XLM-RoBERTa) under equal training conditions. mBERT performs best
(accuracy 0.88, macro-F1 0.87), followed by BiLSTM+BiGRU as the strongest recurrent
model (accuracy 0.84, macro-F1 0.84), whereas XLM-RoBERTa performs worst (accuracy
0.75, macro-F1 0.74); the leading transformer thus outperforms every recurrent
model on this task, though not all transformers do. A confusion-matrix analysis reveals that RNNs consistently fail
by collapsing ambiguous classes (Not Hate, Others, Sexual) into Appearance. This
failure is substantially reduced by mBERT. We conclude that, with Banglish-specific
preprocessing and balanced evaluation, multilingual transformers provide the most
reliable basis for moderating Banglish content. |
en_US |