Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey

Dinesh Kumar; Gurpreet Singh Josan

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 6 - Issue 5

Published: September 2010

10.5120/1078-1409

Dinesh Kumar, Gurpreet Singh Josan. Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey. International Journal of Computer Applications. 6, 5 (September 2010), 1-9. DOI=10.5120/1078-1409

@article{ 10.5120/1078-1409,
  author    = { Dinesh Kumar and Gurpreet Singh Josan },
  title     = { Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey },
  journal   = { International Journal of Computer Applications },
  year      = { 2010 },
  volume    = { 6 },
  number    = { 5 },
  pages     = { 1--9 },
  doi       = { 10.5120/1078-1409 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2010
%A Dinesh Kumar
%A Gurpreet Singh Josan
%T Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey
%J International Journal of Computer Applications
%V 6
%N 5
%P 1-9
%R 10.5120/1078-1409
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Part-of-speech (POS) tagging is the natural language processing task of assigning each word in a text its part of speech, e.g., proper noun. It is an important preprocessing step for many language processing activities. This paper surveys the POS taggers proposed for several Indian languages, including Hindi, Punjabi, Malayalam, Bengali and Telugu. These taggers use approaches such as the Hidden Markov Model (HMM), Support Vector Machine (SVM), rule-based methods, Maximum Entropy (ME) and Conditional Random Field (CRF). Because accuracy is the prime factor in evaluating a POS tagger, the reported accuracy of each proposed tagger is also discussed.

Index Terms

Computer Science

Information Sciences

No index terms available.

Keywords

HMM, Tagging, Stochastic, Tagset, Finite State Automata, Suffix, Prefix, Support Vector Machines, Stemming, Maximum Entropy, Corpora, Tags, Morphology