International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Issue 10
Published: May 2023
Authors: Dipalee B. Borse, Swati K. Borse, Vijaya Ahire
Dipalee B. Borse, Swati K. Borse, Vijaya Ahire. Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam. International Journal of Computer Applications. 185, 10 (May 2023), 12-17. DOI=10.5120/ijca2023922766
@article{ 10.5120/ijca2023922766, author = { Dipalee B. Borse and Swati K. Borse and Vijaya Ahire }, title = { Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam }, journal = { International Journal of Computer Applications }, year = { 2023 }, volume = { 185 }, number = { 10 }, pages = { 12-17 }, doi = { 10.5120/ijca2023922766 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2023 %A Dipalee B. Borse %A Swati K. Borse %A Vijaya Ahire %T Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam %J International Journal of Computer Applications %V 185 %N 10 %P 12-17 %R 10.5120/ijca2023922766 %I Foundation of Computer Science (FCS), NY, USA
The usage of social networking sites is rising rapidly every day. Twitter's popularity as a microblogging site is huge among legitimate as well as illegitimate users. People with malicious intent use Twitter to spread spam posts, resulting in phishing, monetary loss, useless or noisy data on social media, theft of personal information, etc. It is therefore extremely important to stop spamming activities. In this paper, six machine learning classifiers, namely Logistic Regression and Support Vector Machine (linear models) and Random Forest, K-Nearest Neighbor, Decision Tree, and Naive Bayes (nonlinear models), have been implemented on existing data and their performance compared using parameters such as accuracy, precision, recall, and F1-score. Among the six classifiers, Random Forest showed the best accuracy, followed by K-Nearest Neighbor, and both performed better on the large continuous dataset than on small or random datasets: accuracy increased by 3% to 13% on the large continuous data. Moreover, the false positive rates of Random Forest and K-Nearest Neighbor are 0.001 and 0.005 respectively, much lower than those of the other algorithms. With the lowest accuracy and the highest false positive rate, the Naive Bayes algorithm performed worst on large datasets.
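The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: their Twitter spam dataset and feature set are not available here, so a synthetic imbalanced dataset from scikit-learn stands in, and default hyperparameters are assumed throughout.

```python
# Sketch of the six-classifier comparison: fit each model on the same
# train/test split and report accuracy, precision, recall, F1-score,
# and false positive rate (FPR). Synthetic data replaces the real
# Twitter spam features, so the numbers are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Imbalanced binary data: ~10% "spam" positives, as spam is the minority class.
X, y = make_classification(n_samples=2000, n_features=12,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),   # linear
    "Support Vector Machine": LinearSVC(),                      # linear
    "Random Forest": RandomForestClassifier(random_state=0),    # nonlinear
    "K-Nearest Neighbor": KNeighborsClassifier(),               # nonlinear
    "Decision Tree": DecisionTreeClassifier(random_state=0),    # nonlinear
    "Naive Bayes": GaussianNB(),                                # nonlinear
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred, zero_division=0),
        "f1": f1_score(y_te, pred, zero_division=0),
        "fpr": fp / (fp + tn),  # false positives / all true negatives
    }

for name, m in results.items():
    print(f"{name}: acc={m['accuracy']:.3f} f1={m['f1']:.3f} fpr={m['fpr']:.3f}")
```

Reporting FPR alongside accuracy matters for spam detection: with a heavy class imbalance, a classifier can score high accuracy while still flagging many legitimate tweets, which is exactly what the low FPRs of Random Forest and K-Nearest Neighbor in the paper rule out.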