Fake News Classification in Machine Learning with Different Word Representations

Elaf Alhazmi

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 183 - Issue 37

Published: Nov 2021

DOI: 10.5120/ijca2021921765

Elaf Alhazmi. Fake News Classification in Machine Learning with Different Word Representations. International Journal of Computer Applications. 183, 37 (Nov 2021), 1-7. DOI=10.5120/ijca2021921765

@article{10.5120/ijca2021921765,
  author    = {Elaf Alhazmi},
  title     = {Fake News Classification in Machine Learning with Different Word Representations},
  journal   = {International Journal of Computer Applications},
  year      = {2021},
  volume    = {183},
  number    = {37},
  pages     = {1--7},
  doi       = {10.5120/ijca2021921765},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2021
%A Elaf Alhazmi
%T Fake News Classification in Machine Learning with Different Word Representations
%J International Journal of Computer Applications
%V 183
%N 37
%P 1-7
%R 10.5120/ijca2021921765
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text classification has been applied effectively in many domains, one of which is fake news detection, where framing the task as a classification problem is an important approach. Feature extraction, which converts text into numerical vectors, is one of the most significant steps in a classification framework. In this paper, we compare the effectiveness of three feature extraction approaches: bag of words, TF-IDF, and one-hot encoding. In our experiments, we measured classification accuracy on three fake news detection data sets using six machine learning classifiers, and identified the best- and worst-performing classifier for each of the three techniques. Our results show that, on the selected data, bag of words (implemented with CountVectorizer) and TF-IDF outperform one-hot encoding. Although logistic regression and support vector machine both perform well with bag of words and TF-IDF, random forest is the only classifier that performs consistently well across all three feature extraction methods. With one-hot encoding, the support vector machine had the lowest accuracy, even though it produced strong results with the other two extraction methods.
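
To make the difference between the three representations compared above concrete, here is a minimal, dependency-free sketch on a made-up three-document corpus. It is an illustration under stated assumptions, not the paper's preprocessing: tokenization is plain lowercase whitespace splitting, "one-hot encoding" of a document is taken to mean binary term presence (the paper may use a different variant), and TF-IDF uses raw counts times a smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, with each row scaled to unit L2 norm (the same weighting as scikit-learn's TfidfVectorizer defaults, though its tokenizer differs). The corpus and all function names (build_vocabulary, bag_of_words, one_hot, tf_idf) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy headlines; NOT drawn from the paper's data sets.
DOCS = [
    "breaking news aliens land in city",
    "city council approves new budget",
    "aliens aliens everywhere says breaking report",
]


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of distinct tokens; column j of every vector is vocab[j].
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(docs, vocab):
    # Raw term counts per document (the idea behind CountVectorizer).
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def one_hot(docs, vocab):
    # Assumed variant: 1 if the term occurs at least once, else 0.
    # Term frequency is discarded.
    return [[1 if c > 0 else 0 for c in row] for row in bag_of_words(docs, vocab)]


def tf_idf(docs, vocab):
    # tf = raw count; idf = ln((1 + n) / (1 + df)) + 1 (smoothed),
    # then each document row is scaled to unit L2 norm.
    counts = bag_of_words(docs, vocab)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return rows


if __name__ == "__main__":
    vocab = build_vocabulary(DOCS)
    i = vocab.index("aliens")
    print("vocabulary:", vocab)
    print("bag of words, doc 3, 'aliens':", bag_of_words(DOCS, vocab)[2][i])
    print("one-hot,      doc 3, 'aliens':", one_hot(DOCS, vocab)[2][i])
    print("tf-idf,       doc 3, 'aliens':", round(tf_idf(DOCS, vocab)[2][i], 3))
```

In the third document "aliens" appears twice: bag of words records 2, the binary one-hot vector records only 1, and TF-IDF additionally down-weights terms that occur in many documents. In scikit-learn, roughly comparable matrices come from CountVectorizer, CountVectorizer(binary=True), and TfidfVectorizer, although their default tokenization differs from this sketch.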


Index Terms

Computer Science

Information Sciences

Keywords

Fake News Classification, Word Representation, Machine Learning