Research Article

A Complete Workflow for Development of Bangla OCR

by  Farjana Yeasmin Omee, Shiam Shabbir Himel, Md. Abu Naser Bikas
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 21 - Issue 9
Published: May 2011
DOI: 10.5120/2543-3483

Farjana Yeasmin Omee, Shiam Shabbir Himel, Md. Abu Naser Bikas. A Complete Workflow for Development of Bangla OCR. International Journal of Computer Applications. 21, 9 (May 2011), 1-6. DOI=10.5120/2543-3483

@article{ 10.5120/2543-3483,
  author    = { Farjana Yeasmin Omee and Shiam Shabbir Himel and Md. Abu Naser Bikas },
  title     = { A Complete Workflow for Development of Bangla OCR },
  journal   = { International Journal of Computer Applications },
  year      = { 2011 },
  volume    = { 21 },
  number    = { 9 },
  pages     = { 1-6 },
  doi       = { 10.5120/2543-3483 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2011
%A Farjana Yeasmin Omee
%A Shiam Shabbir Himel
%A Md. Abu Naser Bikas
%T A Complete Workflow for Development of Bangla OCR
%J International Journal of Computer Applications
%V 21
%N 9
%P 1-6
%R 10.5120/2543-3483
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Developing a Bangla OCR requires a number of algorithms and methods. Many efforts have been made to develop a Bangla OCR, but none of them has delivered an error-free system; each has its own shortcomings. We discuss the problem scope of the currently existing Bangla OCRs. In this paper, we present the basic steps required to develop a Bangla OCR, together with a complete workflow for its development that names all the possible algorithms required.

Index Terms
Computer Science
Information Sciences
Keywords

OCR, Bangla OCR, Bangla Font, Matra, Preprocessing, Binarization, Classification, Segmentation, Page Layout Analysis, Tesseract
