International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 106 - Issue 3
Published: November 2014
Authors: Saeed Raheel
Saeed Raheel. Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining. International Journal of Computer Applications. 106, 3 (November 2014), 31-36. DOI=10.5120/18503-9572
@article{ 10.5120/18503-9572, author = { Saeed Raheel }, title = { Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining }, journal = { International Journal of Computer Applications }, year = { 2014 }, volume = { 106 }, number = { 3 }, pages = { 31-36 }, doi = { 10.5120/18503-9572 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2014 %A Saeed Raheel %T Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining %J International Journal of Computer Applications %V 106 %N 3 %P 31-36 %R 10.5120/18503-9572 %I Foundation of Computer Science (FCS), NY, USA
Effective feature selection is a key component in building an efficient automatic document classifier. In Arabic literature, especially scientific literature, we regularly encounter infrequent non-Arabic words that are routinely eliminated during the pre-processing phase. Although infrequent, these words are highly pertinent to the documents in which they appear and can therefore help build a more efficient classification model and reinforce the subjectivity of the decision taken by the classifier. In this paper, we propose four feature selection solutions that preserve as many of these words as possible while still achieving satisfactory classification accuracy.