Research Article

Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey

by Dinesh Kumar, Gurpreet Singh Josan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 6 - Issue 5
Published: September 2010
Authors: Dinesh Kumar, Gurpreet Singh Josan
DOI: 10.5120/1078-1409

Dinesh Kumar, Gurpreet Singh Josan. Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey. International Journal of Computer Applications. 6, 5 (September 2010), 1-9. DOI=10.5120/1078-1409

                        @article{ 10.5120/1078-1409,
                        author  = { Dinesh Kumar and Gurpreet Singh Josan },
                        title   = { Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey },
                        journal = { International Journal of Computer Applications },
                        year    = { 2010 },
                        volume  = { 6 },
                        number  = { 5 },
                        pages   = { 1-9 },
                        doi     = { 10.5120/1078-1409 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2010
                        %A Dinesh Kumar
                        %A Gurpreet Singh Josan
                        %T Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey
                        %J International Journal of Computer Applications
                        %V 6
                        %N 5
                        %P 1-9
                        %R 10.5120/1078-1409
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

The problem of tagging in natural language processing is to assign every word in a text a part-of-speech label, e.g., proper noun or pronoun. POS tagging is an important preprocessing step for many language processing tasks. This paper surveys the Part of Speech (POS) taggers proposed for various Indian languages, including Hindi, Punjabi, Malayalam, Bengali and Telugu. Approaches such as Hidden Markov Models (HMM), Support Vector Machines (SVM), rule-based methods, Maximum Entropy (ME) and Conditional Random Fields (CRF) have been used for POS tagging. Because accuracy is the prime factor in evaluating any POS tagger, the reported accuracy of each proposed tagger is also discussed.
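To make the tagging task and the accuracy measure concrete, the following is a minimal sketch of a bigram HMM tagger decoded with the Viterbi algorithm. It is not the method of any tagger covered by the survey: the tiny English corpus, the tag names, the add-one smoothing, and all function names (`train_hmm`, `viterbi`, `accuracy`) are assumptions chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (English for readability); real taggers train on
# thousands of annotated sentences in the target language.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    tags = sorted(emit)
    n_words = len(vocab) + 1  # one extra slot for unknown words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, len(tags))
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, len(tags)) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

trans, emit, vocab = train_hmm(TRAIN)
predicted = viterbi(["the", "cat", "barks"], trans, emit, vocab)
print(predicted, accuracy(predicted, ["DET", "NOUN", "VERB"]))
```

Per-token accuracy as computed here is the usual headline figure for a tagger, although reported numbers are only comparable when the tagset and test data are the same.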

Index Terms
Computer Science
Information Sciences
Keywords

HMM, Tagging, Stochastic, Tagset, Finite State Automata, Suffix, Prefix, Support Vector Machines, Stemming, Maximum Entropy, Corpora, Tags, Morphology
