Research Article

Bangla Character Recognition for Android Devices

by Aparajita Chowdhury, Abu Foysal, Shafiqul Islam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 136 - Issue 11
Published: February 2016
DOI: 10.5120/ijca2016908566

Aparajita Chowdhury, Abu Foysal, Shafiqul Islam. Bangla Character Recognition for Android Devices. International Journal of Computer Applications. 136, 11 (February 2016), 13-19. DOI=10.5120/ijca2016908566

@article{10.5120/ijca2016908566,
  author    = { Aparajita Chowdhury and Abu Foysal and Shafiqul Islam },
  title     = { Bangla Character Recognition for Android Devices },
  journal   = { International Journal of Computer Applications },
  year      = { 2016 },
  volume    = { 136 },
  number    = { 11 },
  pages     = { 13-19 },
  doi       = { 10.5120/ijca2016908566 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2016
%A Aparajita Chowdhury
%A Abu Foysal
%A Shafiqul Islam
%T Bangla Character Recognition for Android Devices
%J International Journal of Computer Applications
%V 136
%N 11
%P 13-19
%R 10.5120/ijca2016908566
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The goal of this project was to build an Android application that extracts text from any image containing Bengali characters and converts it into an editable document. Existing systems had several limitations that left room for improvement. To recognize more characters and joint letters, the work focused on reducing the error rate so that more of the text is preserved. Characters were recognized with Tesseract (v3.03), which uses the Leptonica image processing library to process the image and extract data from it. Joint letters, dangerous ambiguities, and contrast issues were handled to increase efficiency. The analyzed data and overall progress were recorded to guide future improvements.
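
The abstract describes the work in terms of lowering the recognition error rate, but it does not say how that rate was measured. As a minimal sketch, assuming a character-level error rate based on Levenshtein edit distance (a common OCR metric, not necessarily the one the authors used), a comparison between a ground-truth string and OCR output might look like this. The function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `character_error_rate("বাংলা", "বাংল")` is 1/5, because one of the five code points in the reference is missing from the output. Python strings are compared here by code point, and a Bengali joint letter spans several code points, so a count based on grapheme clusters would score such errors differently.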

Index Terms
Computer Science
Information Sciences
Keywords

Optical Character Recognition (OCR), Bangla language, Android, Tesseract, Leptonica
