Bangla Character Recognition for Android Devices

Aparajita Chowdhury; Abu Foysal; Shafiqul Islam

Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 136 - Issue 11

Published: February 2016


DOI: 10.5120/ijca2016908566


Aparajita Chowdhury, Abu Foysal, Shafiqul Islam. Bangla Character Recognition for Android Devices. International Journal of Computer Applications. 136, 11 (February 2016), 13-19. DOI=10.5120/ijca2016908566

@article{10.5120/ijca2016908566,
  author    = {Aparajita Chowdhury and Abu Foysal and Shafiqul Islam},
  title     = {Bangla Character Recognition for Android Devices},
  journal   = {International Journal of Computer Applications},
  year      = {2016},
  volume    = {136},
  number    = {11},
  pages     = {13-19},
  doi       = {10.5120/ijca2016908566},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}


Abstract

The main goal of the project was to build an Android application that extracts text from any image containing Bengali characters and converts it into an editable document. Existing systems have several limitations that leave room for improvement. To recognize more characters, including joint letters (conjuncts), the work focused on reducing the error rate so that more of the text is preserved. Character recognition was performed with Tesseract (v3.03), which uses the Leptonica image processing library to process the image and extract data from it. Joint letters, dangerous ambiguities, and contrast issues were handled to improve recognition performance. A record of the analyzed data and of the overall progress was kept to guide future improvements.
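
The abstract does not describe how contrast issues were handled. Purely as an illustration, and not as the authors' method, one common preprocessing step before passing an image to Tesseract is global binarization with Otsu's threshold (Leptonica also provides Otsu-based thresholding). The sketch below assumes the image has already been converted to a flat list of 8-bit grayscale values; the function names `otsu_threshold` and `binarize` are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the Otsu threshold for 8-bit grayscale values (0-255).

    Hypothetical helper for illustration: it picks the threshold t that
    maximizes the between-class variance of the two groups
    {p <= t} (ink) and {p > t} (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0       # sum of intensities at or below t
    weight_bg = 0    # number of pixels at or below t
    best_t = 0
    best_var = -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var = between_var
            best_t = t
    return best_t


def binarize(pixels: List[int], threshold: int) -> List[int]:
    """Map pixels at or below the threshold to ink (0), the rest to paper (255)."""
    return [0 if p <= threshold else 255 for p in pixels]
```

For a mostly two-tone image such as dark text on a light page, the threshold separates the two intensity clusters, so text pixels map to 0 and the background to 255. A single global threshold can fail on unevenly lit camera images, which is why adaptive (tiled) variants are often preferred for phone photos.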


Index Terms

Computer Science

Information Sciences


Keywords

Optical Character Recognition (OCR), Bangla language, Android, Tesseract, Leptonica