International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Issue 10
Published: May 2023
Authors: Dipalee B. Borse, Swati K. Borse, Vijaya Ahire
Dipalee B. Borse, Swati K. Borse, Vijaya Ahire. Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam. International Journal of Computer Applications. 185, 10 (May 2023), 12-17. DOI=10.5120/ijca2023922766
@article{ 10.5120/ijca2023922766, author = { Dipalee B. Borse and Swati K. Borse and Vijaya Ahire }, title = { Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam }, journal = { International Journal of Computer Applications }, year = { 2023 }, volume = { 185 }, number = { 10 }, pages = { 12-17 }, doi = { 10.5120/ijca2023922766 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2023 %A Dipalee B. Borse %A Swati K. Borse %A Vijaya Ahire %T Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam %J International Journal of Computer Applications %V 185 %N 10 %P 12-17 %R 10.5120/ijca2023922766 %I Foundation of Computer Science (FCS), NY, USA
The usage of social networking sites is rising rapidly every day. Twitter's popularity as a microblogging site is huge among legitimate as well as illegitimate users. People with malicious intent use Twitter to spread spam posts, resulting in phishing, monetary loss, useless or noisy data on social media, theft of personal information, etc. It is therefore extremely important to stop spamming activities. In this paper, six machine learning classifiers, namely Logistic Regression and Support Vector Machine (linear models) and Random Forest, K-Nearest Neighbor, Decision Tree, and Naive Bayes (nonlinear models), have been implemented on existing data and their performance compared using parameters such as accuracy, precision, recall, and F1-score. Among the six classifiers, Random Forest showed the best accuracy, followed by K-Nearest Neighbor, and both performed better on the large continuous dataset than on small or random datasets: accuracy increased by 3% to 13% on the large continuous data. Moreover, the false positive rates of Random Forest and K-Nearest Neighbor are 0.001 and 0.005 respectively, much lower than those of the other algorithms. With the lowest accuracy and the highest false positive rate, the Naive Bayes algorithm performed worst on large datasets.
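The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: their Twitter spam dataset and feature set are not available here, so a synthetic imbalanced dataset from scikit-learn stands in, and default hyperparameters are assumed throughout.

```python
# Sketch of the six-classifier comparison: fit each model on the same
# train/test split and report accuracy, precision, recall, F1-score,
# and false positive rate (FPR). Synthetic data replaces the real
# Twitter spam features, so the numbers are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Imbalanced binary data: ~10% "spam" positives, as spam is the minority class.
X, y = make_classification(n_samples=2000, n_features=12,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),   # linear
    "Support Vector Machine": LinearSVC(),                      # linear
    "Random Forest": RandomForestClassifier(random_state=0),    # nonlinear
    "K-Nearest Neighbor": KNeighborsClassifier(),               # nonlinear
    "Decision Tree": DecisionTreeClassifier(random_state=0),    # nonlinear
    "Naive Bayes": GaussianNB(),                                # nonlinear
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred, zero_division=0),
        "f1": f1_score(y_te, pred, zero_division=0),
        "fpr": fp / (fp + tn),  # false positives / all true negatives
    }

for name, m in results.items():
    print(f"{name}: acc={m['accuracy']:.3f} f1={m['f1']:.3f} fpr={m['fpr']:.3f}")
```

Reporting FPR alongside accuracy matters for spam detection: with a heavy class imbalance, a classifier can score high accuracy while still flagging many legitimate tweets, which is exactly what the low FPRs of Random Forest and K-Nearest Neighbor in the paper rule out.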