Research Article

Sentence Boundary Detection in Kannada Language

by Deepamala. N, Ramakanth Kumar. P
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 39 - Issue 9
Published: February 2012
DOI: 10.5120/4852-7124

Deepamala. N, Ramakanth Kumar. P. Sentence Boundary Detection in Kannada Language. International Journal of Computer Applications. 39, 9 (February 2012), 38-41. DOI=10.5120/4852-7124

@article{10.5120/4852-7124,
  author    = {Deepamala. N and Ramakanth Kumar. P},
  title     = {Sentence Boundary Detection in Kannada Language},
  journal   = {International Journal of Computer Applications},
  year      = {2012},
  volume    = {39},
  number    = {9},
  pages     = {38--41},
  doi       = {10.5120/4852-7124},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2012
%A Deepamala. N
%A Ramakanth Kumar. P
%T Sentence Boundary Detection in Kannada Language
%J International Journal of Computer Applications
%V 39
%N 9
%P 38-41
%R 10.5120/4852-7124
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Sentence Boundary Detection (SBD) is a preprocessing step for many Natural Language Processing applications. Various algorithms have been used for sentence boundary detection, or disambiguation, in different languages. In this paper, a rule-based method for sentence boundary detection in Kannada is proposed and tested. Kannada, a grammatically rich Indian language, is analyzed on the basis of its semantics, and the method is tested on a 227 KB corpus. The code is written in C using wide characters and supports Unicode. The results show a 99.2% success rate in detecting sentence boundaries.
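The abstract says only that the method is rule-based and written in C with wide characters, and the keywords point to verb suffixes and abbreviations; the rules themselves are not given here. The sketch below is a hypothetical illustration of how such rules could be combined, not the authors' algorithm. It assumes that `.`, `?` and `!` are the only terminators; that a period followed directly by a non-space character (as in `3.5`) is internal; that a period at the end of the input always closes a sentence; that a token ending in a finite-verb suffix marks a boundary, since Kannada is verb-final; and that tokens in a small abbreviation list, or unknown tokens of at most three code points (likely initials such as ಕೆ.), do not. The suffix list, the abbreviation list, the three-code-point threshold and the names `is_boundary` and `find_boundaries` are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, deliberately tiny rule tables: illustrative only, not from the paper. */
static const wchar_t *const VERB_SUFFIXES[] = {
    L"ದೆ", L"ದನು", L"ದಳು", L"ದರು",      /* e.g. ಇದೆ, ಹೋದನು, ಹೋದಳು, ಬಂದರು */
    L"ತ್ತಾನೆ", L"ತ್ತಾಳೆ", L"ತ್ತಾರೆ"       /* e.g. ಮಾಡುತ್ತಾನೆ */
};
static const wchar_t *const ABBREVIATIONS[] = { L"ಡಾ", L"ಪ್ರೊ" };  /* ಡಾ. (Dr.), ಪ್ರೊ. (Prof.) */

/* Does tok[0..len) end with the suffix sfx? */
static int ends_with(const wchar_t *tok, size_t len, const wchar_t *sfx)
{
    size_t n = wcslen(sfx);
    return n <= len && wmemcmp(tok + len - n, sfx, n) == 0;
}

/* Is tok[0..len) exactly one of the entries in list? */
static int in_list(const wchar_t *tok, size_t len,
                   const wchar_t *const *list, size_t count)
{
    for (size_t k = 0; k < count; k++)
        if (wcslen(list[k]) == len && wmemcmp(tok, list[k], len) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the character at text[i] ends a sentence (hypothetical rules). */
static int is_boundary(const wchar_t *text, size_t i)
{
    wchar_t c = text[i];
    if (c == L'?' || c == L'!')
        return 1;                              /* unambiguous terminators */
    if (c != L'.')
        return 0;

    /* Rule 1: a period followed directly by a non-space character (3.5) is internal. */
    if (text[i + 1] != L'\0' && !iswspace((wint_t)text[i + 1]))
        return 0;

    /* Rule 2: a period followed only by whitespace up to the end of input ends a sentence. */
    size_t j = i + 1;
    while (text[j] != L'\0' && iswspace((wint_t)text[j]))
        j++;
    if (text[j] == L'\0')
        return 1;

    /* Isolate the whitespace-delimited token that precedes the period. */
    size_t start = i;
    while (start > 0 && !iswspace((wint_t)text[start - 1]))
        start--;
    const wchar_t *tok = text + start;
    size_t len = i - start;

    /* Rule 3: a finite-verb suffix signals a sentence end (Kannada is verb-final). */
    for (size_t k = 0; k < ARRAY_LEN(VERB_SUFFIXES); k++)
        if (ends_with(tok, len, VERB_SUFFIXES[k]))
            return 1;

    /* Rule 4: listed abbreviations and tokens with an internal period do not end sentences. */
    if (in_list(tok, len, ABBREVIATIONS, ARRAY_LEN(ABBREVIATIONS)) ||
        wmemchr(tok, L'.', len) != NULL)
        return 0;

    /* Rule 5: very short unknown tokens (<= 3 code points) are treated as initials. */
    return len > 3;
}

/* Stores the indices of sentence-final terminators in out[]; returns how many were found. */
size_t find_boundaries(const wchar_t *text, size_t *out, size_t max)
{
    assert(text != NULL);
    size_t n = 0;
    for (size_t i = 0; text[i] != L'\0' && n < max; i++)
        if (is_boundary(text, i))
            out[n++] = i;
    return n;
}
```

For example, on the string `ಡಾ. ರಾವ್ ಬಂದರು. ಪುಸ್ತಕ ಇದೆ. ಬೆಲೆ 3.5 ರೂಪಾಯಿ.`, `find_boundaries` reports three boundaries (the periods after ಬಂದರು, ಇದೆ and ರೂಪಾಯಿ) and skips the periods in ಡಾ. and 3.5. Checking the verb suffix before the short-token rule is what lets a short sentence-final verb such as ಇದೆ still close a sentence. The Kannada wide-string literals assume the source file is saved and compiled as UTF-8.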

Keywords

Sentence Boundary Detection, Verb Suffix, Abbreviation
