Development of Nepali Character Database for Character Recognition based on Clustering

Aadesh Neupane

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 107 - Issue 11

Published: December 2014

10.5120/18799-0315

Aadesh Neupane. Development of Nepali Character Database for Character Recognition based on Clustering. International Journal of Computer Applications. 107, 11 (December 2014), 42-46. DOI=10.5120/18799-0315

                        @article{ 10.5120/18799-0315,
                        author  = { Aadesh Neupane },
                        title   = { Development of Nepali Character Database for Character Recognition based on Clustering },
                        journal = { International Journal of Computer Applications },
                        year    = { 2014 },
                        volume  = { 107 },
                        number  = { 11 },
                        pages   = { 42-46 },
                        doi     = { 10.5120/18799-0315 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2014
                        %A Aadesh Neupane
                        %T Development of Nepali Character Database for Character Recognition based on Clustering
                        %J International Journal of Computer Applications
                        %V 107
                        %N 11
                        %P 42-46
                        %R 10.5120/18799-0315
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

Character recognition requires a large, reliable dataset on which to apply recognition algorithms and build efficient models. For the Nepali language, no such character dataset exists for character recognition research, at least in the public domain. Nepali has 36 consonant characters and 12 vowel characters, and each vowel can modify each consonant. In total, there can be 446 characters, including the Nepali numerals. Creating a dataset of Nepali characters manually therefore demands considerable effort, cost, and time. This paper describes a semi-supervised clustering approach to creating a Nepali character dataset that reduces this effort and time. In addition, an existing segmentation algorithm [1] is optimized to segment Nepali characters in both handwritten and scanned Nepali text. Complex features are extracted from the segmented characters by applying the Discrete Cosine Transform and the wavelet transform. These extracted features are then used to create a database of Nepali characters using pHash and k-means clustering. At present, the database contains 38,493 characters distributed among 52 different clusters.
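
The abstract names the building blocks (DCT features, pHash, k-means) without giving details, so the following is only a minimal sketch of how such a pipeline might fit together, not the paper's implementation. It assumes each segmented character is already a small square grayscale block given as a list of rows. The function names (`dct_2d`, `phash_bits`, `hamming`, `kmeans`), the `keep` parameter, and the farthest-first seeding of k-means are illustrative assumptions, and the wavelet step is omitted.

```python
import math


def dct_2d(block):
    """Unnormalized 2-D DCT-II of a square grayscale block (list of rows)."""
    n = len(block)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n)), shared by both passes.
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(n)]
    # The 2-D DCT is separable: transform each row, then each column.
    rows = [[sum(row[x] * basis[u][x] for x in range(n)) for u in range(n)]
            for row in block]
    return [[sum(rows[y][u] * basis[v][y] for y in range(n)) for u in range(n)]
            for v in range(n)]


def phash_bits(block, keep=4):
    """DCT-based perceptual hash: threshold low-frequency terms at their median."""
    coeffs = dct_2d(block)
    low = [coeffs[v][u] for v in range(keep) for u in range(keep)]
    ac = low[1:]  # drop the DC term, which only encodes overall brightness
    median = sorted(ac)[len(ac) // 2]
    return [1 if c > median else 0 for c in ac]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on lists of numbers.

    Seeding is farthest-first starting from points[0]; this is an assumption
    made to keep the sketch deterministic, not something the abstract states.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each non-empty center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Under these assumptions, a list of character blocks `images` could be grouped with `kmeans([phash_bits(img) for img in images], k)`, after which a person would only need to label each cluster rather than each character. `hamming` gives a quick similarity check between two hashes.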

References

Bal Krishna Bal and Prajwal Rupakheti, Research Report on the Nepali OCR, PANL10n Admin Reports, September 2009.
Eugene Borovikov, A Survey of Modern Optical Character Recognition Techniques (DRAFT), February 2004.
Vijay Kumar and Pankaj K Sengar, Segmentation of Printed Text in Devanagari Script and Gurmukhi Script, International Journal of Computer Applications, Vol. 3, No. 8, pp. 30–33, June 2010.
Mitrakshi B. Patil and Vaibhav Narawade, Recognition of Handwritten Devnagari Characters through Segmentation and Artificial Neural Networks, International Journal of Engineering Research & Technology (IJERT), Vol. 1, No. 6, August 2012.
Veena Bansal and R. M. K. Sinha, Segmentation of Touching and Fused Devanagari Characters, Indian Institute of Technology, Kanpur.
Ratnashil N Khobragade, Nitin A. Koli, and Mahendra S Makesar, A Survey of Recognition of Devnagari Script, International Journal of Computer Applications and Information Technology (IJCAIT), Vol. 2, No. 1, January 2013.
Richard G. Casey and Eric Lecolinet, A Survey of Methods and Strategies in Character Segmentation, IEEE Transactions on PAMI, pp. 690-706, 1996.
Mudit Agrawal, Huanfeng Ma, and David Doermann, Generalization of Hindi OCR Using Adaptive Segmentation and Font Files, 2009.
Sanjeev Maharjan, MPP Nepali OCR Report, PANL10n Admin Reports, July 2010.
Anilkumar N Holambe and Ravindra C Thool, Combining Multiple Feature Extraction Technique and Classifiers for Increasing Accuracy for Devanagari OCR, IJSCE, Vol. 3, No. 4, September 2013.
Sheetal Dabra, Sunil Agrawal, and Rama Krishna Challa, A Novel Feature Set for Recognition of Similar Shaped Handwritten Hindi Characters Using Machine Learning, CCSEA 2011, Vol. 02, pp. 25-35, 2011.
Andrew B. Watson, Image Compression Using the Discrete Cosine Transform, Mathematica Journal, Vol. 4, No. 1, pp. 81-88, 1994.
Bian Yang, Fan Gu, and XiaMu Niu, Image Perceptual Hashing, IIH-MSP, pp. 167-172, December 2006.
Christoph Zauner, Implementation and Benchmarking of Perceptual Image Hash Functions, Ph.D. Thesis, University of Sichere Informationssysteme, Hagenberg, July 2010.
A. S. Lewis and G. Knowles, Image Compression Using the 2-D Wavelet Transform, IEEE Transactions on Image Processing, Vol. 1, No. 2, pp. 244-250.
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten, The WEKA Data Mining Software: An Update, SIGKDD Explorations, Vol. 11, No. 1, pp. 10-18.
Jacob Goldberger, Shiri Gordon, and Hayit Greenspan, Unsupervised Image-Set Clustering Using an Information Theoretic Framework, IEEE Transactions on Image Processing, Vol. 15, No. 2, pp. 449-458, February 2006.
Venkat Rasagna, Anand Kumar, C. V. Jawahar, and R. Manmatha, Robust Recognition of Documents by Fusing Results of Word Clusters.
John W. Eaton, David Bateman, and Soren Hauberg, GNU Octave Version 3.0.1 Manual: A High-Level Interactive Language for Numerical Computations, CreateSpace Independent Publishing Platform, 2009.
A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1, pp. 1-38, 1977.

Index Terms

Computer Science

Information Sciences

Keywords

Nepali Character Segmentation, Nepali Character Database, Nepali Character Recognition, Nepali Character Clustering