Abstract:
Text vectorization, feature extraction, and machine learning algorithms play a vital role in sentiment classification. The accuracy of sentiment classification varies with the machine learning approach, vectorization model, and feature extraction method used. This paper presents multiple evaluations, together with the steps needed to achieve the highest accuracy in classifying the sentiment of reviews. We first apply two n-gram vectorization models, Unigram and Bigram, individually. We then apply the TF-IDF feature extraction method with Unigram and Bigram respectively. Five ensemble machine learning algorithms are used: Random Forest (RF), Extra Tree (ET), Bagging Classifier (BC), Ada Boost (ADA), and Gradient Boost (GB). The key objective of this study is to determine which combination of vectorization model (Unigram, Bigram), feature extraction method (TF-IDF), and ensemble classifier gives the best sentiment classification performance.
Full Text Link: https://doi.org/10.1088/1742-6596/1060/1/012036
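
To make the vectorization steps named in the abstract concrete, the following is a minimal sketch of Unigram and Bigram extraction with an optional TF-IDF weighting. It is not the authors' implementation: the whitespace tokenizer, the TF-IDF variant (relative term frequency times log(N / document frequency)), the function names, and the sample reviews are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectors(docs, n):
    """Raw n-gram counts per document (n=1: Unigram, n=2: Bigram)."""
    return [Counter(ngrams(doc, n)) for doc in docs]


def tfidf_vectors(docs, n):
    """TF-IDF weights per document, using an assumed textbook variant:
    (count / total n-grams in document) * log(num_docs / document frequency).
    """
    counts = count_vectors(docs, n)
    num_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())

    vectors = []
    for doc_counts in counts:
        total = sum(doc_counts.values())
        vectors.append({
            term: (count / total) * math.log(num_docs / doc_freq[term])
            for term, count in doc_counts.items()
        })
    return vectors


# Hypothetical reviews, used only to exercise the functions.
reviews = ["good movie", "bad movie", "not good movie"]

unigram_counts = count_vectors(reviews, 1)
bigram_counts = count_vectors(reviews, 2)
unigram_tfidf = tfidf_vectors(reviews, 1)
bigram_tfidf = tfidf_vectors(reviews, 2)
```

In this sketch a term that occurs in every review, such as "movie", receives a TF-IDF weight of zero, while the Bigram model keeps "not good" as a single feature that the Unigram model splits apart. Each of the four resulting representations could then be passed to an ensemble classifier for comparison.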