Research Article

Some Investigations on Machine Learning Techniques for Automated Text Categorization

by Bhagirath Prajapati, Sanjay Garg, N C Chauhan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 71 - Issue 3
Published: June 2013
Authors: Bhagirath Prajapati, Sanjay Garg, N C Chauhan
DOI: 10.5120/12340-8617

Bhagirath Prajapati, Sanjay Garg, N C Chauhan. Some Investigations on Machine Learning Techniques for Automated Text Categorization. International Journal of Computer Applications. 71, 3 (June 2013), 32-36. DOI=10.5120/12340-8617

@article{ 10.5120/12340-8617,
  author    = { Bhagirath Prajapati and Sanjay Garg and N C Chauhan },
  title     = { Some Investigations on Machine Learning Techniques for Automated Text Categorization },
  journal   = { International Journal of Computer Applications },
  year      = { 2013 },
  volume    = { 71 },
  number    = { 3 },
  pages     = { 32-36 },
  doi       = { 10.5120/12340-8617 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2013
%A Bhagirath Prajapati
%A Sanjay Garg
%A N C Chauhan
%T Some Investigations on Machine Learning Techniques for Automated Text Categorization
%J International Journal of Computer Applications
%V 71
%N 3
%P 32-36
%R 10.5120/12340-8617
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The automated categorization (classification) of texts into predefined categories is one of the most widely explored research areas in text mining. The volume of digital data available today is very large, and organizing it into predefined categories has become a challenging task. Machine learning offers an approach in which an automated classifier is trained to classify documents with minimal human assistance. This paper discusses the Naïve Bayes, Rocchio, k-Nearest Neighbor, and Support Vector Machine methods within the machine learning paradigm for the automated categorization of documents into predefined categories.

References
  • Manning, C. D., Raghavan, P., Schütze, H. 2009. An Introduction to Information Retrieval, Chapter 1: Boolean retrieval, p. 1, Cambridge University Press.
  • Rijsbergen, C. J. V. 1979. Information Retrieval, Chapter 2: Automatic Text Analysis, Butterworth-Heinemann, 2nd edition.
  • Sebastiani, F. 2002. "Machine learning in automated text categorization", ACM Computing Surveys, Vol. 34, No. 1, pp. 1–47.
  • Nilsson, N. J. 1996. Introduction to Machine Learning, Chapter 1: Preliminaries, Draft of Incomplete.
  • Salton, G., Buckley, C. 1988. "Term-weighting approaches in automatic text retrieval", Information Processing and Management, Vol. 24, No. 5, pp. 513–523.
  • Guo, G., Wang, H., Bell, D., Bi, Y., Greer, K. 2006. "Using k-NN model-based approach for automatic text categorization", Soft Computing - A Fusion of Foundations, Methodologies and Applications.
  • Manning, C., Raghavan, P., Schütze, H. 2008. "Text classification and Naïve Bayes", chapter in Introduction to Information Retrieval, Cambridge University Press.
  • Yang, Y. 1994. "Expert network: effective and efficient learning from human decisions in text categorization and retrieval", in Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, Ireland, pp. 13–22.
  • Joachims, T. 1999. "Transductive inference for text classification using support vector machines", ICML-99, pp. 200–209.
  • Yang, Y., Liu, X. 1999. "A re-examination of text categorization methods", SIGIR-99, pp. 42–49.
  • Lang, K. 20 Newsgroups dataset, http://people.csail.mit.edu/jrennie/20Newsgroups/
Index Terms
Computer Science
Information Sciences
Keywords

Machine learning, Text categorization
