Optimizing SMS Spam Detection Using Machine Learning: A Comparative Analysis of Ensemble and Traditional Classifiers
DOI: 10.47709/cnahpc.v6i4.4822
Keywords: SMS spam detection, Machine learning, Ensemble classifiers, Support Vector Machine, Spam classification
Abstract
With the rapid growth of mobile communication, Short Message Service (SMS) has become an essential channel for transmitting information. However, the growing volume of unsolicited and harmful spam messages poses significant challenges for both users and mobile network operators. This study evaluates several machine learning models for SMS spam detection: Random Forest, Gradient Boosting, AdaBoost, Support Vector Machine (SVM), Logistic Regression, and an Ensemble Voting Classifier. The models were evaluated on a dataset of 5,572 SMS messages, each labeled as spam or ham (legitimate). The hyperparameters of each model were tuned to maximize accuracy, and the models were assessed using precision, recall, F1-score, and accuracy. The SVM and the Ensemble Voting Classifier achieved the highest performance, with accuracies of 0.9857 and 0.9848, respectively. Both models also achieved higher recall on the spam class than the other classifiers, which makes them strong candidates for real-world spam detection systems. Random Forest, Gradient Boosting, and AdaBoost also performed well, but their slightly lower spam recall means that they pass more spam messages through as legitimate. The study highlights the effectiveness of machine learning models in addressing the SMS spam problem, particularly when ensemble methods are used. Future research should address class imbalance and explore deep learning approaches to further improve performance. These findings offer practical guidance for developing more accurate and scalable SMS spam detection systems.
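The evaluation described above can be made concrete with a minimal, dependency-free Python sketch. It assumes hard (majority) voting, which the abstract does not specify; a soft-voting ensemble would average predicted probabilities instead. The names `majority_vote` and `spam_metrics` and the toy label lists are hypothetical illustrations, not the paper's code or data. The sketch computes precision, recall, and F1 for the spam class plus overall accuracy, and shows how a model with lower spam recall lets spam through as ham while a majority vote can recover it.

```python
from collections import Counter

SPAM, HAM = "spam", "ham"


def majority_vote(predictions_per_model):
    """Hard voting (an assumption; the paper does not state hard vs. soft voting).

    predictions_per_model: one list of predicted labels per base model,
    all of the same length. Returns, for each message, the label chosen by
    the most models; a tie is broken in favour of SPAM (also an assumption).
    """
    combined = []
    for votes_for_message in zip(*predictions_per_model):
        counts = Counter(votes_for_message)
        top = max(counts.values())
        winners = [label for label, c in counts.items() if c == top]
        combined.append(SPAM if SPAM in winners else winners[0])
    return combined


def spam_metrics(y_true, y_pred, positive=SPAM):
    """Precision, recall and F1 for the spam class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam passed as ham
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels for six messages (not the paper's dataset).
y_true = [SPAM, HAM, SPAM, HAM, HAM, SPAM]
model_a = [SPAM, HAM, HAM, HAM, HAM, SPAM]   # misses one spam message
model_b = [SPAM, SPAM, SPAM, HAM, HAM, HAM]
model_c = [SPAM, HAM, SPAM, HAM, SPAM, SPAM]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)  # → ['spam', 'ham', 'spam', 'ham', 'ham', 'spam']
for name, preds in [("model_a", model_a), ("ensemble", ensemble)]:
    print(name, {k: round(v, 3) for k, v in spam_metrics(y_true, preds).items()})
```

Here `model_a` keeps perfect spam precision but drops to a spam recall of 2/3 because one spam message is labeled ham, while the three-model vote classifies all six toy messages correctly. In a full pipeline the base predictions would come from trained classifiers (for example scikit-learn's `VotingClassifier` with `voting="hard"`), but the combination rule is the same majority vote.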
Copyright (c) 2024 Gregorius Airlangga
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.