International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 101 - Issue 7
Published: September 2014
Authors: Diab Abuaiadah, Jihad El Sana, Walid Abusalah
DOI: 10.5120/17701-8680
Diab Abuaiadah, Jihad El Sana, Walid Abusalah. On the Impact of Dataset Characteristics on Arabic Document Classification. International Journal of Computer Applications. 101, 7 (September 2014), 31-38. DOI=10.5120/17701-8680
@article{ 10.5120/17701-8680,
author = { Diab Abuaiadah and Jihad El Sana and Walid Abusalah },
title = { On the Impact of Dataset Characteristics on Arabic Document Classification },
journal = { International Journal of Computer Applications },
year = { 2014 },
volume = { 101 },
number = { 7 },
pages = { 31-38 },
doi = { 10.5120/17701-8680 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2014
%A Diab Abuaiadah
%A Jihad El Sana
%A Walid Abusalah
%T On the Impact of Dataset Characteristics on Arabic Document Classification
%J International Journal of Computer Applications
%V 101
%N 7
%P 31-38
%R 10.5120/17701-8680
%I Foundation of Computer Science (FCS), NY, USA
This paper examines how dataset characteristics affect the results of Arabic document classification algorithms that use TF-IDF representations. The experiments compared different stemmers, different categories, and different training set sizes, and found that different dataset characteristics produced widely differing results, in one case reaching 99% recall (accuracy). Adopting a standard dataset would eliminate this variability and make published results comparable across studies.
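
For readers unfamiliar with the TF-IDF representation mentioned in the abstract, the following is a minimal sketch of one common weighting variant (term frequency times log inverse document frequency), compared with cosine similarity. It is not the authors' implementation: the function names `tf_idf` and `cosine`, the exact weighting formula, and the toy English tokens are all assumptions made for illustration. In the setting the abstract describes, documents would be Arabic text passed through a stemmer before this step.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up toy corpus: two "sports" documents, two "finance" documents.
docs = [
    "match goal team goal".split(),
    "team player match".split(),
    "market bank price".split(),
    "bank price market price".split(),
]
vectors = tf_idf(docs)

same_topic = cosine(vectors[0], vectors[1])
other_topic = cosine(vectors[0], vectors[2])
print(same_topic > other_topic)
```

Because the weights depend on which documents and terms are in the collection, changing the stemmer, the categories, or the amount of training data changes every vector, which is one way to see why results on different datasets are hard to compare.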