International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Issue 37
Published: Nov 2021
Authors: Elaf Alhazmi
Elaf Alhazmi. Fake News Classification in Machine Learning with Different Word Representations. International Journal of Computer Applications. 183, 37 (Nov 2021), 1-7. DOI=10.5120/ijca2021921765
@article{10.5120/ijca2021921765,
  author = {Elaf Alhazmi},
  title = {Fake News Classification in Machine Learning with Different Word Representations},
  journal = {International Journal of Computer Applications},
  year = {2021},
  volume = {183},
  number = {37},
  pages = {1-7},
  doi = {10.5120/ijca2021921765},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article %D 2021 %A Elaf Alhazmi %T Fake News Classification in Machine Learning with Different Word Representations %J International Journal of Computer Applications %V 183 %N 37 %P 1-7 %R 10.5120/ijca2021921765 %I Foundation of Computer Science (FCS), NY, USA
Text classification has been applied effectively in a variety of domains, one of which is the detection of fake news, where working with a classification framework is an important approach. One of the most significant steps in such a framework is feature extraction, which converts text into numbers. In this paper, we compare the effectiveness of three feature extraction approaches: bag of words, TF-IDF, and one-hot encoding. In the experiments, we measured classification accuracy and identified the best and worst classifier for each of the three techniques, using three fake news detection data sets and six machine learning classifiers. Our tests show that, for the selected data, bag of words (also known as CountVectorizer) and TF-IDF outperform one-hot encoding in text classification. Although logistic regression and support vector machine both produce valid results with bag of words and TF-IDF, the random forest classifier is the only algorithm that consistently produces accurate results with all three feature extraction methods. Support vector machine had the lowest accuracy with one-hot encoding, even though it produced strong results with the other two extraction methods.
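The three word representations compared above can be illustrated with a minimal, dependency-free Python sketch. It is not the paper's implementation: the toy corpus, the function names, the interpretation of one-hot encoding as a binary presence vector per document, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy corpus; the texts are invented for illustration only.
docs = [
    "breaking news shocking claim",
    "official report confirms claim",
    "shocking shocking rumor",
]

tokenized = [d.split() for d in docs]
# Shared, alphabetically sorted vocabulary across all documents.
vocab = sorted({tok for doc in tokenized for tok in doc})

def bag_of_words(tokens):
    """Raw term counts, one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def one_hot(tokens):
    """Binary presence (assumed reading of one-hot): 1 if the term occurs, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]

def tf_idf(tokens):
    """Term count weighted by a smoothed inverse document frequency (assumed variant)."""
    n_docs = len(tokenized)
    counts = Counter(tokens)
    vector = []
    for term in vocab:
        # Number of documents that contain the term at least once.
        df = sum(1 for doc in tokenized if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        vector.append(counts[term] * idf)
    return vector

bow = [bag_of_words(doc) for doc in tokenized]
onehot = [one_hot(doc) for doc in tokenized]
tfidf = [tf_idf(doc) for doc in tokenized]
```

In this sketch, bag of words keeps how often a term repeats, the binary vector discards that information, and TF-IDF additionally down-weights terms that appear in many documents. Any of the three matrices could then be passed to a classifier as its feature input.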