Abstract:
This study aims to enhance the accuracy of SMS spam detection, addressing the rising
threat of unsolicited and potentially harmful messages in mobile communication. By
leveraging both machine learning and deep learning techniques, we seek to identify the
most effective model for distinguishing between spam and legitimate SMS messages. The
dataset consists of 5572 labeled SMS messages, of which 4825 are non-spam and 747 are
spam. The messages underwent
preprocessing steps including cleaning, tokenization, and lowercasing. We implemented
and evaluated several models: Logistic Regression, Multinomial Naive Bayes, Simple
RNN, LSTM, and GRU. Among these, Logistic Regression outperformed all other
models, achieving the highest accuracy. The machine learning models were Multinomial
Naive Bayes and Logistic Regression. Multinomial Naive Bayes was trained with two
feature representations: TF-IDF and Count Vectorization (CV). The CV-based
MultinomialNB performed better than the TF-IDF-based MultinomialNB, with an accuracy
of 97.68% compared to 95.85%. In deep
learning, we chose RNN-based models, namely LSTM (Long Short-Term Memory),
GRU (Gated Recurrent Unit), and SimpleRNN, in order to achieve better accuracy. These
deep learning models achieved accuracies of 97.78%, 97.78%, and 97.97%, respectively.
Logistic Regression, the other machine learning model in this study, outperformed
all the models with an accuracy of 99.84% and a precision of 99.69%. The findings
highlight the robustness and efficiency of traditional ML models in handling the SMS
spam detection task, while also providing insights into the performance of advanced DL
models. This research contributes to the development of reliable and scalable spam
detection solutions, enhancing user safety and the overall security of mobile
communication systems.
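
As an illustration of the count-based Multinomial Naive Bayes approach mentioned above,
the following is a minimal standard-library sketch, not the implementation used in this
study. The function names (tokenize, train_multinomial_nb, predict, accuracy_and_precision),
the Laplace smoothing value, and the four toy messages are all assumptions made for the
example; the reported results come from the full 5572-message dataset, not from this toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_multinomial_nb(messages, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on raw word counts (count vectorization).

    alpha is the Laplace smoothing constant (an assumed value, not from the study).
    Returns a dict: label -> (log prior, dict of per-word log likelihoods).
    """
    word_counts = {}  # label -> Counter of word frequencies in that class
    doc_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}

    model = {}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_prior = math.log(doc_counts[label] / len(labels))
        log_likelihood = {w: math.log((counts[w] + alpha) / total) for w in vocab}
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, text):
    """Return the label with the highest posterior log score; unseen words are skipped."""
    tokens = tokenize(text)
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        scores[label] = log_prior + sum(
            log_likelihood[t] for t in tokens if t in log_likelihood
        )
    return max(scores, key=scores.get)


def accuracy_and_precision(y_true, y_pred, positive="spam"):
    """Compute accuracy and precision, treating `positive` as the positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    pred_pos = sum(p == positive for p in y_pred)
    precision = true_pos / pred_pos if pred_pos else 0.0
    return correct / len(y_true), precision


# Toy training data, invented purely for illustration.
train_messages = [
    "Win a free prize now",
    "Free entry claim your prize",
    "Are we meeting for lunch today",
    "Call me when you get home",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_multinomial_nb(train_messages, train_labels)

test_messages = ["claim your free prize", "lunch at home today"]
test_labels = ["spam", "ham"]
predictions = [predict(model, m) for m in test_messages]
acc, prec = accuracy_and_precision(test_labels, predictions)
print(predictions, acc, prec)
```

A TF-IDF variant would replace the raw counts with weighted term frequencies before
estimating the per-class likelihoods; the rest of the sketch would stay the same.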