Research Article

A Novel Feature Selection Method for Classification of Medical Documents from Pubmed

by  S.Sagar Imambi, T.Sudha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 26 - Issue 9
Published: July 2011
Authors: S.Sagar Imambi, T.Sudha
10.5120/3131-4315

S.Sagar Imambi, T.Sudha. A Novel Feature Selection Method for Classification of Medical Documents from Pubmed. International Journal of Computer Applications. 26, 9 (July 2011), 29-33. DOI=10.5120/3131-4315

@article{ 10.5120/3131-4315,
  author    = { S.Sagar Imambi and T.Sudha },
  title     = { A Novel Feature Selection Method for Classification of Medical Documents from Pubmed },
  journal   = { International Journal of Computer Applications },
  year      = { 2011 },
  volume    = { 26 },
  number    = { 9 },
  pages     = { 29-33 },
  doi       = { 10.5120/3131-4315 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2011
%A S.Sagar Imambi
%A T.Sudha
%T A Novel Feature Selection Method for Classification of Medical Documents from Pubmed
%J International Journal of Computer Applications
%V 26
%N 9
%P 29-33
%R 10.5120/3131-4315
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The exponential growth of online repositories in medical science has led to the development of various text mining tools. These tools assist users in analyzing text data stored in online repositories such as PubMed and MEDLINE. The PubMed repositories are growing at a rate of 500,000 articles per year. Classification of MEDLINE documents is very complex because of the high dimensionality of the feature space. In this study we discuss how dimensionality can be reduced, and we study and compare various dimensionality reduction techniques at the preprocessing stage. We introduce a novel feature weighting scheme, 'GRW', and show that this scheme improves classification accuracy. Our experimental results on a medical data set indicate that existing feature weighting methods achieve lower accuracy than the GRW scheme.
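
The abstract does not define the GRW weighting, so it is not reproduced here. As a rough illustration of what feature weighting followed by dimensionality reduction can look like at the preprocessing stage, the following is a minimal sketch of one widely used existing scheme, TF-IDF, with a simple top-k term selection. The function names, the toy corpus, and the ranking rule (summing TF-IDF weights over the corpus) are assumptions made for illustration only; they are not the authors' method and may not match the baselines used in the paper.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency log(N / df) for every term in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (term -> weight) for one tokenized document."""
    term_freq = Counter(tokens)
    return {t: count * idf[t] for t, count in term_freq.items() if t in idf}


def select_top_k(docs, idf, k):
    """Keep the k terms with the largest summed TF-IDF weight (illustrative rule)."""
    totals = Counter()
    for tokens in docs:
        for term, weight in tfidf_vector(tokens, idf).items():
            totals[term] += weight
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus of already tokenized abstracts.
toy_docs = [
    ["diabetic", "retinopathy", "patient", "study"],
    ["heart", "disease", "patient", "study"],
    ["retinopathy", "risk", "patient", "study"],
]
toy_idf = idf_weights(toy_docs)
selected_terms = select_top_k(toy_docs, toy_idf, k=4)
```

In this toy corpus, terms that occur in every document ("patient", "study") receive an IDF of zero and drop out of the selected set, which is the basic effect any such weighting relies on to shrink the feature space before classification.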

Index Terms
Computer Science
Information Sciences
Keywords

Document Classification, Feature Selection, Pubmed, Text mining
