Abstract:
This paper presents a comprehensive study on sentiment analysis in the Bengali language,
focusing on user-generated comments from online shopping websites. Despite the
significant number of Bengali speakers worldwide, the language remains underrepresented
in natural language processing (NLP) research. This study aims to bridge this gap by
applying advanced sentiment analysis techniques to better understand customer opinions
and preferences in the e-commerce domain. For data collection, 1,995 comments were
extracted from these websites using web scraping techniques.
Rigorous preprocessing methods, including text cleaning, normalization, tokenization, and
Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, were employed to
prepare the dataset for analysis. A variety of machine learning (ML) models, such as
Logistic Regression, Decision Trees, Random Forest, Multinomial Naive Bayes, KNN, SVM,
and SGD, along with deep learning (DL) models like LSTM, Bi-LSTM, and CNN, were
trained and evaluated on this dataset. The results revealed that while traditional ML models
like SVM and SGD showed strong performance, deep learning models, particularly
Bi-LSTM, demonstrated superior ability in sentiment classification. This was attributed to
their effectiveness in capturing contextual nuances and complex linguistic patterns inherent
in Bengali. The study underscored the challenges of processing a morphologically rich
language like Bengali and the importance of choosing the right model for effective
sentiment analysis. Furthermore, the research addressed the societal, ethical, and
environmental implications of implementing sentiment analysis tools. It highlighted the
need for responsible data usage, bias mitigation, transparency in model application, and
sustainable computing practices. In conclusion, the research contributes to the
field of sentiment analysis in less-studied languages, providing insights for
businesses, policymakers, and researchers, and it supports more inclusive AI and NLP
development in which linguistic diversity is represented.
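
To illustrate the TF-IDF vectorization step named above, the following is a minimal sketch in plain Python. It is not the study's implementation: the whitespace tokenizer, the smoothed IDF formula, the function names `tokenize` and `tfidf_vectors`, and the three example comments are all assumptions introduced here for illustration.

```python
import math
from collections import Counter

# Punctuation stripped from token edges; includes the Bengali danda (assumed set).
PUNCT = ".,!?;:\"'()[]{}।"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (hypothetical tokenizer)."""
    tokens = (tok.strip(PUNCT) for tok in text.split())
    return [tok for tok in tokens if tok]


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    TF is the term's relative frequency within the document; IDF uses the
    smoothed form ln((1 + N) / (1 + df)) + 1, which is an assumption here.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty comments
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    # Made-up example comments, not taken from the study's dataset.
    comments = ["পণ্যটি খুব ভালো", "পণ্যটি খুব খারাপ", "ডেলিভারি খুব ভালো"]
    vocab, vectors = tfidf_vectors(comments)
    for term, weight in zip(vocab, vectors[1]):
        print(term, round(weight, 3))
```

In this sketch, a term that occurs in every comment receives the minimum IDF of 1, while a term confined to a single comment is weighted more heavily, which is the property that lets a downstream classifier focus on sentiment-bearing words.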