Advancing sentiment analysis in Bengali: bridging linguistic gaps in NLP with machine and deep learning models

Al Masud, Abdullah

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/12989

Date: 2024-01-01

Abstract:

This paper presents a study of sentiment analysis in the Bengali language, focusing on user-generated comments from online shopping websites. Despite the large number of Bengali speakers worldwide, the language remains underrepresented in natural language processing (NLP) research. This study aims to narrow that gap by applying sentiment analysis techniques to better understand customer opinions and preferences in the e-commerce domain. For data collection, 1,995 comments were extracted using web scraping. The dataset was then prepared for analysis through text cleaning, normalization, tokenization, and Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. Several machine learning (ML) models (Logistic Regression, Decision Trees, Random Forest, Multinomial Naive Bayes, k-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Stochastic Gradient Descent (SGD)) and deep learning (DL) models (LSTM, BiLSTM, and CNN) were trained and evaluated on this dataset. The results showed that while traditional ML models such as SVM and SGD performed strongly, the deep learning models, particularly BiLSTM, classified sentiment more accurately. This was attributed to their effectiveness in capturing the contextual nuances and complex linguistic patterns inherent in Bengali. The study underscored the challenges of processing a morphologically rich language like Bengali and the importance of choosing the right model for effective sentiment analysis. The research also addressed the societal, ethical, and environmental implications of deploying sentiment analysis tools, highlighting the need for responsible data usage, bias mitigation, transparency in model application, and sustainable computing practices.
In conclusion, the research contributes to the field of sentiment analysis in less-studied languages, providing insights for businesses, policymakers, and researchers. It paves the way for more inclusive technological advancements in AI and NLP, ensuring that linguistic diversity is embraced and respected in the digital age.
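
As an illustration of the TF-IDF vectorization step named in the abstract, the following is a minimal sketch only. The abstract does not state the tokenizer or the exact TF-IDF variant used, so this assumes whitespace tokenization, term frequency normalized by comment length, and an unsmoothed idf of log(N / df). The function name `tfidf` and the sample comments are hypothetical and are not taken from the report's dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not from the report): whitespace tokenization,
    tf = count / document length, idf = log(N / df), no smoothing.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample comments, not taken from the report's dataset.
comments = ["পণ্য খুব ভালো", "পণ্য খুব খারাপ", "ডেলিভারি ভালো"]
weights = tfidf(comments)
```

Under these assumptions, a term that occurs in every comment receives a weight of zero, while a term unique to one comment (here "খারাপ" in the second comment) receives the highest weight in that comment. Vectors of this kind are what the classical models listed in the abstract, such as SVM or SGD, would take as input.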
