International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 101 - Issue 7
Published: September 2014
Authors: Diab Abuaiadah, Jihad El Sana, Walid Abusalah
Diab Abuaiadah, Jihad El Sana, Walid Abusalah. On the Impact of Dataset Characteristics on Arabic Document Classification. International Journal of Computer Applications. 101, 7 (September 2014), 31-38. DOI=10.5120/17701-8680
@article{10.5120/17701-8680,
  author = {Diab Abuaiadah and Jihad El Sana and Walid Abusalah},
  title = {On the Impact of Dataset Characteristics on Arabic Document Classification},
  journal = {International Journal of Computer Applications},
  year = {2014},
  volume = {101},
  number = {7},
  pages = {31-38},
  doi = {10.5120/17701-8680},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2014
%A Diab Abuaiadah
%A Jihad El Sana
%A Walid Abusalah
%T On the Impact of Dataset Characteristics on Arabic Document Classification
%J International Journal of Computer Applications
%V 101
%N 7
%P 31-38
%R 10.5120/17701-8680
%I Foundation of Computer Science (FCS), NY, USA
This paper describes the impact of dataset characteristics on the results of Arabic document classification algorithms that use TF-IDF representations. The experiments varied the stemmer, the set of categories, and the training set size, and found that different dataset characteristics produced widely differing results, in one case attaining a remarkable 99% recall (accuracy). Adopting a standard dataset would eliminate this variability and allow results published by different researchers to be compared directly.
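For readers unfamiliar with the TF-IDF representation mentioned in the abstract, the following is a minimal sketch of how documents can be weighted and then classified by cosine similarity. It is not the authors' pipeline: the toy English tokens, the raw-count term frequency with idf = log(N / df), the nearest-neighbour rule, and the names `tfidf_vectors`, `cosine`, and `classify` are all assumptions made for illustration, and no stemmer is applied.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return ({term: weight} per document, idf table) for tokenized docs.

    Assumed weighting: raw term count times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, train_vectors, train_labels, idf):
    """Label a tokenized document with the label of its most similar training doc."""
    # Terms unseen during training carry no idf weight and are ignored.
    query = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    scores = [cosine(query, vec) for vec in train_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]


# Hypothetical toy training set (already tokenized; a stemmer would run before this step).
train_docs = [
    ["match", "goal", "team"],
    ["team", "player", "goal"],
    ["market", "bank", "price"],
    ["bank", "stock", "price"],
]
train_labels = ["sports", "sports", "economy", "economy"]

vectors, idf = tfidf_vectors(train_docs)
print(classify(["goal", "player"], vectors, train_labels, idf))  # sports
```

Because every weight depends on the document frequencies of the training collection, changing the categories, the training set size, or the stemmer that produces the tokens changes the vectors themselves, which is one way to see why results on different datasets are hard to compare.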