| dc.description.abstract |
The rapid expansion of mobile communication has fundamentally changed digital connectivity, making SMS a critical channel of interaction for individuals, businesses, and institutions. This growth has also driven an unprecedented rise in spam messages: fraudulent, phishing, or promotional messages that threaten user security and privacy. This research therefore proposes an intelligent, machine-learning-based system that automatically detects and filters SMS spam. Two baseline models, Logistic Regression and Multinomial Naive Bayes, were developed using TF-IDF vectorization to extract textual features and SMOTE to balance the class distribution of the dataset. These models performed consistently, achieving accuracies of 96.7% and 94.6%, respectively, in classifying spam and ham messages. To further improve detection performance, a novel hybrid stacking ensemble, SmartSMSGuard, was developed. It combines the complementary strengths of a discriminative classifier (Logistic Regression) and a generative classifier (Multinomial Naive Bayes) by merging their predictions through a meta-classifier. SmartSMSGuard achieved the highest accuracy, 97.99%, and its precision and recall also exceeded those of each base classifier, indicating greater overall robustness. The experimental results show that SmartSMSGuard missed fewer spam messages and produced fewer false positives than either base classifier. SmartSMSGuard is therefore a reliable and scalable approach to intelligent SMS spam filtering. The model also adapts well across different datasets, making it suitable for real-time spam detection in mobile networks. Its lightweight design incurs minimal computational overhead, so it can run efficiently in real time, even on large-scale data systems. 
Combining feature engineering with ensemble learning further improves the model’s performance while also enhancing its interpretability and scalability. |
en_US |
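As a rough illustration of the pipeline the abstract describes (TF-IDF features, Logistic Regression and Multinomial Naive Bayes base learners, and a meta-classifier over their outputs), the following is a minimal scikit-learn sketch. It is not the authors' SmartSMSGuard implementation. The use of `StackingClassifier`, a Logistic Regression meta-classifier, `cv=3`, the hypothetical helper `build_stacking_model`, and the toy messages are all assumptions made for illustration.

```python
# A minimal sketch of a TF-IDF + stacking pipeline with scikit-learn.
# ASSUMPTIONS: the toy messages, hyperparameters, meta-classifier choice, and the
# hypothetical helper build_stacking_model are illustrative, not the authors' setup.
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_stacking_model():
    """TF-IDF features feeding LR and MNB base learners, combined by an LR meta-classifier."""
    base_learners = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("mnb", MultinomialNB()),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,  # out-of-fold probabilities are used to train the meta-classifier
        stack_method="predict_proba",
    )
    return make_pipeline(TfidfVectorizer(lowercase=True), stack)


# Hypothetical toy corpus; a real study would use a labelled benchmark dataset.
messages = [
    "WIN a FREE prize now, reply to claim",
    "Congratulations you won cash, call now",
    "Urgent: your account is locked, click link",
    "Free entry to win a holiday, text WIN",
    "Claim your reward today, limited offer",
    "You have been selected for a cash prize",
    "Are we still meeting for lunch today?",
    "Can you send me the report tonight?",
    "Happy birthday, see you at dinner",
    "I will call you when I get home",
    "Don't forget to buy milk on the way",
    "The meeting moved to three o'clock",
]
labels = ["spam"] * 6 + ["ham"] * 6

model = build_stacking_model()
model.fit(messages, labels)

new_messages = ["Claim your free cash prize now", "See you at lunch tomorrow"]
predictions = model.predict(new_messages)
probabilities = model.predict_proba(new_messages)  # columns follow model.classes_
print(list(predictions))
```

This sketch leaves out the SMOTE step the abstract mentions. With the imbalanced-learn package, one common option is to place `imblearn.over_sampling.SMOTE` between the vectorizer and the classifier inside an `imblearn.pipeline.Pipeline`, so that oversampling is applied only when fitting and never to evaluation data.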