An Assistive Reading System for Visually Impaired using OCR and TTS

Akshay Sharma; Abhishek Srivastava; Adhar Vashishth

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 95 - Issue 2

Published: June 2014

10.5120/16566-6231

Akshay Sharma, Abhishek Srivastava, Adhar Vashishth. An Assistive Reading System for Visually Impaired using OCR and TTS. International Journal of Computer Applications. 95, 2 (June 2014), 13-18. DOI=10.5120/16566-6231

@article{10.5120/16566-6231,
  author    = {Akshay Sharma and Abhishek Srivastava and Adhar Vashishth},
  title     = {An Assistive Reading System for Visually Impaired using OCR and TTS},
  journal   = {International Journal of Computer Applications},
  year      = {2014},
  volume    = {95},
  number    = {2},
  pages     = {13-18},
  doi       = {10.5120/16566-6231},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2014
%A Akshay Sharma
%A Abhishek Srivastava
%A Adhar Vashishth
%T An Assistive Reading System for Visually Impaired using OCR and TTS
%J International Journal of Computer Applications
%V 95
%N 2
%P 13-18
%R 10.5120/16566-6231
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Reading machines are mechatronic devices that use optical character recognition and text-to-speech technology to produce synthetic speech from printed text. This paper proposes an assistive system for visually impaired or blind persons. It reads textual information on paper and produces the corresponding speech using OCR (Optical Character Recognition) and TTS (Text-to-Speech). Text regions are localized by applying connected component labeling with histogram analysis to the binarized image. The TTS system uses concatenative synthesis and is built on an SDK (Software Development Kit) platform. The system is operated through a voice-based user interface and also provides a user-friendly GUI (graphical user interface) for scanning text and controlling various speech parameters. The speech signal produced can be saved and replayed later.
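
The text localization step named in the abstract can be illustrated with a minimal sketch of connected component labeling on a binarized image. This is not the authors' implementation: the abstract does not describe their histogram analysis, so that part is omitted, and the function name `label_components`, the choice of 4-connectivity, and the nested-list image format are assumptions made only for illustration.

```python
from collections import deque


def label_components(binary):
    """Find 4-connected foreground regions (pixels equal to 1).

    Hypothetical helper for illustration. Returns one bounding box
    (top, left, bottom, right) per region, in row-major discovery order.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Tiny binarized image: two separate blobs of foreground pixels.
image = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
boxes = label_components(image)
```

In a pipeline of the kind the abstract outlines, bounding boxes like these would be candidate character or word regions to pass on to the OCR stage; how the candidates are filtered or merged is left open here.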

Index Terms

Computer Science

Information Sciences

Keywords

Text Information Extraction (TIE); Optical Character Recognition (OCR); Connected Component Labeling; Text-to-speech (TTS); Concatenative synthesis; Graphical User Interface (GUI)