International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 107 - Issue 11
Published: December 2014
Authors: Aadesh Neupane
Aadesh Neupane. Development of Nepali Character Database for Character Recognition based on Clustering. International Journal of Computer Applications. 107, 11 (December 2014), 42-46. DOI=10.5120/18799-0315
Character recognition requires a large, reliable dataset on which to apply recognition algorithms and build efficient models. For the Nepali language, no such character dataset exists for character recognition research, at least in the public domain. Nepali has 36 consonant characters and 12 vowel characters, and each vowel can modify each consonant. Counting the Nepali numerals as well, there can be a total of 446 characters. Manually creating a dataset of Nepali characters therefore requires considerable effort, cost, and time. This paper describes a way of creating a Nepali character dataset using a semi-supervised clustering approach that minimizes effort and time. An existing segmentation algorithm [1] is also optimized to segment Nepali characters from both handwritten and scanned Nepali text. Complex features are extracted from the segmented characters by applying the Discrete Cosine Transform and the wavelet transform. These extracted features are then used to create a database of Nepali characters using pHash and k-means clustering. At present, the database contains 38,493 characters distributed among 52 clusters.
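To make the DCT-based hashing step more concrete, the following is a minimal pure-Python sketch of a generic perceptual hash, not the paper's implementation. It assumes each segmented character has already been converted to a square grayscale block (for example 32 x 32 pixels); the function names `dct_2d`, `phash`, and `hamming` and the 8 x 8 hash size are illustrative assumptions, and the wavelet features and the k-means step are not shown.

```python
import math
import statistics


def dct_2d(block):
    """Naive 2D DCT-II (unnormalized) of a square grayscale block."""
    n = len(block)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n))
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(n)]
    # The 2D DCT is separable: transform each row, then each column.
    rows = [[sum(row[x] * basis[u][x] for x in range(n)) for u in range(n)]
            for row in block]
    # Result is indexed as [vertical frequency][horizontal frequency].
    return [[sum(rows[y][u] * basis[v][y] for y in range(n)) for u in range(n)]
            for v in range(n)]


def phash(block, hash_size=8):
    """Hash of the low-frequency DCT coefficients (assumed 8 x 8 = 64 bits)."""
    coeffs = dct_2d(block)
    # Keep the top-left hash_size x hash_size corner in row-major order.
    low = [coeffs[v][u] for v in range(hash_size) for u in range(hash_size)]
    # The DC term (index 0) only reflects overall brightness, so it is
    # left out when computing the threshold.
    median = statistics.median(low[1:])
    bits = 0
    for c in low:
        # The first coefficient ends up in the most significant bit.
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical input: a 32 x 32 block, dark on the left, bright on the right.
stripes = [[0] * 16 + [255] * 16 for _ in range(32)]
stripes_hash = phash(stripes)
```

Hashes with a small Hamming distance indicate visually similar glyphs, which is what allows similar characters to be grouped before any manual labeling. In a pipeline like the one described, such hashes or the underlying coefficient vectors would then be passed to a clustering step.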