International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 106, Issue 3
Published: November 2014
Authors: Saeed Raheel
10.5120/18503-9572
Saeed Raheel. Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining. International Journal of Computer Applications. 106, 3 (November 2014), 31-36. DOI=10.5120/18503-9572
@article{ 10.5120/18503-9572,
author = { Saeed Raheel },
title = { Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining },
journal = { International Journal of Computer Applications },
year = { 2014 },
volume = { 106 },
number = { 3 },
pages = { 31-36 },
doi = { 10.5120/18503-9572 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2014
%A Saeed Raheel
%T Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining
%J International Journal of Computer Applications
%V 106
%N 3
%P 31-36
%R 10.5120/18503-9572
%I Foundation of Computer Science (FCS), NY, USA
Effective feature selection is a key component in building an efficient automatic document classifier. In Arabic literature, especially scientific literature, we regularly encounter infrequent non-Arabic words that are routinely eliminated during the pre-processing phase. Although infrequent, these words are highly pertinent to their documents and can therefore help build a more efficient classification model and reinforce the subjectivity of the decisions taken by the classifier. We therefore propose four feature selection solutions that preserve as many of these words as possible while still achieving satisfactory classification accuracy.
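The abstract contrasts conventional pre-processing, which discards infrequent terms, with feature selection that keeps infrequent non-Arabic words. The following is a minimal sketch of that contrast under stated assumptions. It is not one of the four solutions proposed in the paper, whose details are not given here. It assumes documents are already tokenized, uses plain document-frequency thresholding (`min_df`) as the conventional pruning step, and treats a token as non-Arabic when it contains no character from the Unicode Arabic block (U+0600 to U+06FF). The names `select_features` and `is_non_arabic` are hypothetical. The script heuristic is crude: tokens made only of digits or punctuation would also be exempted.

```python
import re
from collections import Counter

# Assumption: a token is "non-Arabic" if it contains no character from the
# basic Arabic Unicode block (U+0600 to U+06FF). Illustrative heuristic only.
ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")


def is_non_arabic(token: str) -> bool:
    """Return True if the token contains no Arabic-block character."""
    return ARABIC_CHARS.search(token) is None


def select_features(docs, min_df=2, keep_non_arabic=True):
    """Return the vocabulary kept after document-frequency pruning.

    docs: list of token lists (documents already tokenized).
    min_df: terms occurring in fewer documents than this are dropped,
            mimicking conventional pre-processing.
    keep_non_arabic: if True, non-Arabic terms are exempt from the
            threshold (a hypothetical rule, not the paper's method).
    """
    # Document frequency: count each term at most once per document.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vocab = set()
    for term, count in df.items():
        frequent_enough = count >= min_df
        exempt = keep_non_arabic and is_non_arabic(term)
        if frequent_enough or exempt:
            vocab.add(term)
    return vocab


if __name__ == "__main__":
    # Toy corpus: Arabic terms mixed with infrequent Latin-script terms.
    docs = [
        ["شبكة", "عصبية", "backpropagation"],
        ["شبكة", "بيانات"],
        ["تصنيف", "بيانات", "SVM"],
    ]
    print(sorted(select_features(docs, min_df=2, keep_non_arabic=False)))
    print(sorted(select_features(docs, min_df=2)))
```

With `keep_non_arabic=False`, only terms that meet the document-frequency threshold survive, so the single-occurrence Latin-script terms are lost. With the exemption enabled, terms such as `backpropagation` and `SVM` are retained alongside the frequent Arabic terms, which is the kind of preservation the abstract argues for.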