Abstract:
The steady growth of social networking sites has produced large amounts of user-generated text, which makes sentiment analysis a practical way to gain insight into public opinion. This project builds a sentiment classifier for social media text using machine learning and natural language processing (NLP). The pipeline begins with extensive pre-processing, in which the text is normalized, tokenized, stripped of stop words, and lemmatized. TextBlob supplies subjectivity and polarity scores, which are used to sort informative comments into positive, negative, or neutral sentiment. Features are then extracted with a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, which converts the text into numerical vectors. Several machine learning classifiers are evaluated, including Naive Bayes, Support Vector Machine (SVM), and Decision Tree. The model with the highest accuracy is exposed through an API that performs sentiment analysis on new text submitted by users. The resulting model predicts sentiment well, so future work should concentrate on integrating conventional NLP with machine learning algorithms. The project offers businesses and researchers a way to manage the large volume of data collected from social networking sites and to analyze user sentiment. Further work could apply deep learning models and extend the approach to multilingual data.
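
As a rough illustration of the pre-processing and TF-IDF steps named above, the following minimal sketch uses only the Python standard library. It is not the project's implementation: the stop-word list, the sample posts, the function names `preprocess` and `tfidf`, and the particular TF-IDF variant (term count divided by document length, times ln(N / df)) are all assumptions made for illustration, and lemmatization is omitted because it needs an external lexicon.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "i"}


def preprocess(text):
    """Lowercase, strip URLs and non-letters, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Library implementations often use smoothed formulas instead.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example posts, not data from the project.
posts = [
    "I love this phone, the camera is great!",
    "This phone is terrible and the battery was awful",
    "Great battery, great camera http://example.com",
]
tokens = [preprocess(p) for p in posts]
vectors = tfidf(tokens)
```

In this sketch, terms that occur in only one post (such as "love") receive a higher weight than terms shared across posts (such as "phone"), which is the property that makes TF-IDF vectors useful as classifier input.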