Abstract:
Social spam has increased rapidly in recent years. Facebook and YouTube contain the most spam content compared with other social media networks. Spam content of this kind, such as text messages or comments, has a strongly negative effect on ordinary users' experience of social media. In this project, I used the Naïve Bayes classifier, a supervised machine learning algorithm, to detect Bangla spam text content. Much spam detection work has been done on English, but I have worked on Bangla, the language used by most Bangladeshi users. My analysis first collects Bangla text data from YouTube, Facebook, and other social media. I then applied several classifiers, including Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes. Finally, I verified and compared how well each detects Bangla spam text content through different experiments and evaluations. The experiments showed that the Multinomial Naïve Bayes (MNB) algorithm had the best accuracy among the algorithms compared, and my approach achieved 81.44% accuracy in detecting spam text content in Bangla.
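As a minimal illustration of the classification step, the following Python sketch implements Multinomial Naïve Bayes with Laplace smoothing from scratch. It is only a sketch under stated assumptions: the toy comments, the "spam"/"ham" labels, the whitespace tokenizer, and the function names `train_mnb` and `predict_mnb` are hypothetical and are not the dataset or code used in this project.

```python
import math
from collections import Counter, defaultdict


def train_mnb(texts, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized texts."""
    class_doc_counts = Counter(labels)      # number of documents per class
    word_counts = defaultdict(Counter)      # per-class token frequencies
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.split()               # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "alpha": alpha,                     # Laplace smoothing strength
        "vocab": vocab,
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_doc_counts},
        "log_prior": {c: math.log(n / len(texts))
                      for c, n in class_doc_counts.items()},
    }


def predict_mnb(model, text):
    """Return the class with the highest posterior log-probability."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for token in text.split():
            if token not in model["vocab"]:
                continue                    # ignore tokens unseen in training
            count = model["word_counts"][label][token]
            score += math.log((count + alpha) /
                              (model["totals"][label] + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only (not the project's corpus).
train_texts = [
    "ফ্রি টাকা জিতুন",
    "ফ্রি অফার জিতুন এখনই",
    "ভিডিও খুব ভালো",
    "ধন্যবাদ ভালো ভিডিও",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_mnb(train_texts, train_labels)
print(predict_mnb(model, "ফ্রি অফার"))
print(predict_mnb(model, "ভালো ভিডিও"))
```

Working in log-probabilities avoids numerical underflow on longer comments, and the smoothing term `alpha` keeps a token that never appeared in one class from driving that class's probability to zero.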