Research Article

Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents

by Jan-Hendrik Worch, Bjoern Gottfried
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 102 - Issue 3
Published: September 2014
DOI: 10.5120/17792-8585

Jan-Hendrik Worch, Bjoern Gottfried. Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents. International Journal of Computer Applications. 102, 3 (September 2014), 1-6. DOI: 10.5120/17792-8585

@article{ 10.5120/17792-8585,
  author    = { Jan-Hendrik Worch and Bjoern Gottfried },
  title     = { Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents },
  journal   = { International Journal of Computer Applications },
  year      = { 2014 },
  volume    = { 102 },
  number    = { 3 },
  pages     = { 1--6 },
  doi       = { 10.5120/17792-8585 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2014
%A Jan-Hendrik Worch
%A Bjoern Gottfried
%T Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents
%J International Journal of Computer Applications
%V 102
%N 3
%P 1-6
%R 10.5120/17792-8585
%I Foundation of Computer Science (FCS), NY, USA

Abstract

A solution to a feature selection problem in document image processing is presented. Choosing shape features to describe the glyphs of historical documents is a non-trivial task, since glyphs vary innumerably across documents; selecting the features manually would therefore be cumbersome. To select a subset of features from a given set, a genetic algorithm is used that optimises the result of a clustering process carried out with x-means. The x-means result is evaluated using different quality measures. The optimisation methodology is illustrated in a case study in which selecting an appropriate set of features is a crucial part of the system. The intended application supports users who are transcribing historical documents by showing them similar occurrences of a given glyph.
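
The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: plain k-means with a fixed k stands in for x-means (which additionally estimates the number of clusters), cluster purity against known class labels stands in for the unspecified quality measures, and the four-feature toy data, the function names, and all parameter values are hypothetical.

```python
import random
from collections import Counter

def kmeans(points, k, rng, iters=15):
    """Plain k-means; a simple stand-in for x-means, which also estimates k."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster runs empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def purity(labels, truth):
    """Assumed quality measure: share of points carrying their cluster's majority class."""
    hits = 0
    for c in set(labels):
        classes = Counter(t for lab, t in zip(labels, truth) if lab == c)
        hits += classes.most_common(1)[0][1]
    return hits / len(truth)

def fitness(mask, data, truth, k):
    """Cluster using only the features switched on in `mask`, then score the result."""
    if not any(mask):
        return 0.0  # an empty feature set cannot be clustered
    projected = [tuple(v for v, keep in zip(row, mask) if keep) for row in data]
    # A fixed seed makes the fitness of a given mask deterministic.
    return purity(kmeans(projected, k, random.Random(0)), truth)

def select_features(data, truth, k, pop_size=12, generations=10, p_mut=0.1, seed=1):
    """Genetic algorithm over bit masks: 1 keeps a feature, 0 drops it."""
    rng = random.Random(seed)
    n = len(data[0])
    score = lambda m: fitness(m, data, truth, k)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism: the best mask survives
        while len(children) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)

# Hypothetical glyph descriptors: features 0 and 1 separate the two classes,
# features 2 and 3 are pure noise.
data_rng = random.Random(42)
truth = [i % 2 for i in range(40)]
data = [(t * 4 + data_rng.gauss(0, 0.5), t * 4 + data_rng.gauss(0, 0.5),
         data_rng.gauss(0, 3), data_rng.gauss(0, 3)) for t in truth]

mask, quality = select_features(data, truth, k=2)
print("selected feature mask:", mask, "purity:", quality)
```

Encoding a candidate feature subset as a bit mask keeps crossover and mutation trivial, and because the clustering is rerun for every mask, any other clustering method or quality measure can be swapped into `fitness` without touching the search itself.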

Index Terms
Computer Science
Information Sciences
Keywords

Document Image Processing, Genetic Algorithms, Feature Selection, Shape Descriptions, Glyph Clustering, X-Means
