Research Article

An Assistive Reading System for Visually Impaired using OCR and TTS

by Akshay Sharma, Abhishek Srivastava, Adhar Vashishth
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Issue 2
Published: June 2014
DOI: 10.5120/16566-6231

Akshay Sharma, Abhishek Srivastava, Adhar Vashishth. An Assistive Reading System for Visually Impaired using OCR and TTS. International Journal of Computer Applications. 95, 2 (June 2014), 13-18. DOI=10.5120/16566-6231

@article{10.5120/16566-6231,
  author    = {Akshay Sharma and Abhishek Srivastava and Adhar Vashishth},
  title     = {An Assistive Reading System for Visually Impaired using {OCR} and {TTS}},
  journal   = {International Journal of Computer Applications},
  year      = {2014},
  volume    = {95},
  number    = {2},
  pages     = {13--18},
  doi       = {10.5120/16566-6231},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2014
%A Akshay Sharma
%A Abhishek Srivastava
%A Adhar Vashishth
%T An Assistive Reading System for Visually Impaired using OCR and TTS
%J International Journal of Computer Applications
%V 95
%N 2
%P 13-18
%R 10.5120/16566-6231
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Reading machines are mechatronic devices that use optical character recognition and text-to-speech technology to convert printed text into synthetic speech. In this paper, an assistive system is proposed for visually impaired and blind persons. It reads textual information printed on paper and produces the corresponding speech using OCR (Optical Character Recognition) and TTS (Text-to-Speech). To localize text regions in an image, a connected component labeling approach based on histogram analysis is applied to the binarized image. Speech is generated by a TTS system that uses concatenative synthesis and is built on an SDK (Software Development Kit) platform. The system is operated through a voice-based user interface and also provides a user-friendly GUI (graphical user interface) for scanning text and controlling various speech parameters. The speech signal produced can be saved and replayed later.
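The abstract states that text regions are localized by connected component labeling on a binarized image, but it does not specify the binarization method. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: a grayscale page stored as a list of rows with values 0 to 255, a fixed global threshold of 128 (a hypothetical choice; adaptive methods such as Otsu's are common alternatives), and 4-connectivity. Each labeled component yields a bounding box that a later stage could hand to the OCR engine.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Mark dark (ink) pixels as 1 and light (paper) pixels as 0.

    The fixed threshold of 128 is an assumption for illustration only.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def label_components(binary: List[List[int]]) -> Tuple[List[List[int]], List[Box]]:
    """4-connected component labeling by breadth-first flood fill.

    Returns a label map (0 = background, k = k-th component found in
    raster order) and the bounding box of each component.
    """
    height = len(binary)
    width = len(binary[0]) if binary else 0
    labels = [[0] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or labels[y][x] != 0:
                continue  # background, or already part of a component
            label = len(boxes) + 1
            labels[y][x] = label
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                # Grow the bounding box to include this pixel.
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # Visit the four direct neighbours (4-connectivity).
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = label
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return labels, boxes
```

On a tiny 3x5 page with two separate ink blobs, `label_components(binarize(gray))` returns one bounding box per blob. Breadth-first flood fill is used here instead of the classic two-pass union-find labeling only because it is shorter to read; both find the same components.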
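One common way to perform the "histogram analysis" step is with a horizontal projection profile: count the ink pixels in every row of the binarized image, treat runs of non-empty rows as text-line bands, and group component bounding boxes into lines by the band that contains their vertical centre. This is a hedged sketch of that technique, not necessarily the procedure used in the paper; `min_ink` and `min_height` are hypothetical noise thresholds, and boxes are assumed to be `(top, left, bottom, right)` tuples with inclusive coordinates.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive
Band = Tuple[int, int]           # (first_row, last_row), inclusive


def row_histogram(binary: List[List[int]]) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in binary]


def text_line_bands(binary: List[List[int]], min_ink: int = 1,
                    min_height: int = 1) -> List[Band]:
    """Find runs of consecutive rows whose ink count is at least `min_ink`."""
    bands: List[Band] = []
    start: Optional[int] = None
    for y, count in enumerate(row_histogram(binary)):
        if count >= min_ink and start is None:
            start = y  # a text band begins
        elif count < min_ink and start is not None:
            if y - start >= min_height:  # drop bands too thin to be text
                bands.append((start, y - 1))
            start = None
    if start is not None and len(binary) - start >= min_height:
        bands.append((start, len(binary) - 1))  # band touching the bottom edge
    return bands


def assign_boxes_to_bands(boxes: List[Box], bands: List[Band]) -> List[List[Box]]:
    """Group boxes by the band containing their vertical centre, left to right."""
    lines: List[List[Box]] = [[] for _ in bands]
    for box in boxes:
        centre = (box[0] + box[2]) / 2
        for i, (first, last) in enumerate(bands):
            if first <= centre <= last:
                lines[i].append(box)
                break
    return [sorted(line, key=lambda b: b[1]) for line in lines]
```

Sorting each line's boxes by their left edge gives reading order for left-to-right scripts; a vertical projection inside each band could be used the same way to split a line into words.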
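The paper's TTS engine uses concatenative synthesis on an SDK platform. The sketch below does not use any SDK; it only illustrates the underlying idea in plain Python under stated assumptions. Stored waveform units (a hypothetical `inventory` dict mapping unit names to lists of float samples in [-1, 1]) are joined with a short linear cross-fade to smooth each joint. A volume gain stands in for one user-adjustable speech parameter, and the result is written as a 16-bit mono WAV file with the standard `wave` module so it can be replayed later.

```python
import struct
import wave
from typing import Dict, List, Sequence


def crossfade_concat(units: Sequence[List[float]], overlap: int) -> List[float]:
    """Join waveform units, linearly cross-fading up to `overlap` samples at each joint."""
    out: List[float] = []
    for unit in units:
        if not out:
            out = list(unit)
            continue
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the joint
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


def apply_volume(samples: List[float], gain: float) -> List[float]:
    """Scale amplitude by `gain`, clipping to the valid range [-1, 1]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def synthesize(unit_names: Sequence[str], inventory: Dict[str, List[float]],
               overlap: int = 32, gain: float = 1.0) -> List[float]:
    """Look up each unit in a hypothetical recorded inventory and concatenate them."""
    missing = [name for name in unit_names if name not in inventory]
    if missing:
        raise KeyError(f"no recorded unit for: {missing}")
    waveform = crossfade_concat([inventory[name] for name in unit_names], overlap)
    return apply_volume(waveform, gain)


def save_wav(path: str, samples: List[float], sample_rate: int = 16000) -> None:
    """Write samples as 16-bit signed little-endian mono PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
```

For example, `save_wav("speech.wav", synthesize(["h-e", "e-l"], inventory, overlap=2, gain=1.2))` would write a replayable file, given an inventory containing those (hypothetical) unit names. Changing speaking rate or pitch without audible artifacts needs more than a gain control, for instance pitch-synchronous overlap-add (PSOLA).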

Index Terms
Computer Science
Information Sciences
Keywords

Text Information Extraction (TIE), Optical Character Recognition (OCR), Connected Component Labeling, Text-to-speech (TTS), Concatenative synthesis, Graphical User Interface (GUI)
