Sentiment Analysis on Bengali Comments of YouTube’s Bangla Drama to Predict Emotions: A TF-IDF Approach

Sarkar, A. B. M. Kibria

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/14145

Date: 2024-05-14

Abstract:

Sentiment analysis (SA) is a rapidly growing area of natural language processing (NLP) concerned with automatically determining the sentiment or emotion expressed in a text. It is important in applications such as social media monitoring, customer feedback analysis, and market research. This study focuses on sentiment analysis for the Bengali language, where research remains relatively limited, and examines emotions such as happiness and anger in Bengali comments on YouTube Bangla dramas. I trained six machine learning classifiers on the comment dataset, using the TF-IDF approach named in the title, and compared their accuracies:

Logistic Regression (LR): a statistical method for modeling binary outcomes, which suits tasks that classify text as positive or negative. My LR model achieved an accuracy of 79.80%.

Decision Tree (DT): a classification algorithm that partitions the feature space into smaller regions according to split criteria, which makes the model interpretable and easy to visualize. My DT model achieved an accuracy of 78.44%.

Random Forest (RF): an ensemble technique that combines multiple decision trees to improve predictive performance and reduce overfitting. My RF model achieved an accuracy similar to that of LR.

Bernoulli Naive Bayes (BNB): Naive Bayes classifiers apply Bayes' theorem under the assumption that features are independent. The Bernoulli variant works with binary features, so it fits sentiment analysis when the presence or absence of particular words indicates sentiment. My BNB model achieved an accuracy of 79.73%, competitive with the other techniques.

K-Nearest Neighbors (KNN): a simple technique that assigns each instance the majority class among its K nearest neighbors. My KNN model achieved the lowest accuracy, 68.02%.

Support Vector Classifier (SVC): a classification algorithm that finds the hyperplane that best separates the classes in the feature space. My SVC model outperformed the other techniques with an accuracy of 81.48%.
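The KNN idea described above, combined with the TF-IDF weighting named in the title, can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the toy comments and their labels, whitespace tokenization, the smoothed IDF formula, cosine similarity as the distance measure, k = 3, and all function names are hypothetical choices made for the example.

```python
import math
from collections import Counter

# Toy labelled comments, invented for illustration (NOT the thesis dataset).
TRAIN = [
    ("নাটকটা খুব ভালো লেগেছে", "happy"),
    ("অসাধারণ অভিনয় খুব সুন্দর", "happy"),
    ("খুব সুন্দর নাটক ভালো গল্প", "happy"),
    ("বাজে নাটক সময় নষ্ট", "angry"),
    ("খুব বাজে অভিনয় বিরক্তিকর", "angry"),
    ("বিরক্তিকর গল্প সময় নষ্ট", "angry"),
]


def weigh(tokens, idf):
    """TF-IDF weights for one tokenized comment; terms missing from idf are dropped."""
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items() if t in idf}


def tfidf_vectors(docs):
    """Return (vectors, idf) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant) so that every term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [weigh(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(comment, train, k=3):
    """Label a comment by majority vote among its k most similar training comments."""
    docs = [text.split() for text, _ in train]  # naive whitespace tokenization
    labels = [label for _, label in train]
    vectors, idf = tfidf_vectors(docs)
    query = weigh(comment.split(), idf)
    # Rank training comments from most to least similar to the query.
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


print(classify("সুন্দর গল্প ভালো লেগেছে", TRAIN))
```

A real pipeline would need Bengali-aware preprocessing (punctuation handling, stop-word removal, normalization) and a held-out test set to measure accuracy; the sketch only shows how TF-IDF vectors feed a nearest-neighbor vote.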