Head Mounted Device for Real World Text to Speech Conversion

Nikhil Varghese; Gaurav Tripathi

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 155 - Issue 5

Published: Dec 2016

10.5120/ijca2016912309

Nikhil Varghese, Gaurav Tripathi. Head Mounted Device for Real World Text to Speech Conversion. International Journal of Computer Applications. 155, 5 (Dec 2016), 16-20. DOI=10.5120/ijca2016912309

@article{ 10.5120/ijca2016912309,
  author    = { Nikhil Varghese and Gaurav Tripathi },
  title     = { Head Mounted Device for Real World Text to Speech Conversion },
  journal   = { International Journal of Computer Applications },
  year      = { 2016 },
  volume    = { 155 },
  number    = { 5 },
  pages     = { 16-20 },
  doi       = { 10.5120/ijca2016912309 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2016
%A Nikhil Varghese
%A Gaurav Tripathi
%T Head Mounted Device for Real World Text to Speech Conversion
%J International Journal of Computer Applications
%V 155
%N 5
%P 16-20
%R 10.5120/ijca2016912309
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Despite several advances in technology, there is still no low-cost aid for visually impaired people. This paper presents a mobile head-mounted device that detects text in natural scenes and converts it to speech. The major components of the device are a Raspberry Pi, a high-definition webcam, earphones and a portable power bank. The webcam, connected to the Raspberry Pi, captures the image. A text detection algorithm based on Class-Specific Extremal Regions (CSERs) locates the text in complex natural scenes. The segmented image is then passed to the Tesseract OCR engine for text recognition. The recognized text is converted to audio using the espeak Python module on the Raspberry Pi. A visually impaired person can thus use this device to hear the text in their surroundings, such as shop names, public notices, billboards and road directions.
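The pipeline described in the abstract (capture, text detection, recognition, speech) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `read_scene`, `clean_text` and `speak_with_espeak` are hypothetical, the detection and recognition stages are passed in as plain callables standing in for a CSER detector and the Tesseract engine, and speech goes through the `espeak` command-line tool rather than the Python module mentioned in the paper.

```python
import shutil
import subprocess
from typing import Any, Callable, Iterable, List


def clean_text(raw: str) -> str:
    """Collapse OCR output (newlines, repeated spaces) into one speakable line."""
    return " ".join(raw.split())


def speak_with_espeak(text: str) -> None:
    """Speak text with the espeak command-line tool, if it is installed.

    Assumption: espeak is on the PATH. The text is sent on stdin so that
    a leading '-' in the OCR output is not parsed as a command-line option.
    """
    if text and shutil.which("espeak"):
        subprocess.run(["espeak", "--stdin"], input=text, text=True, check=False)


def read_scene(
    image: Any,
    detect_regions: Callable[[Any], Iterable[Any]],
    recognize: Callable[[Any], str],
    speak: Callable[[str], None] = speak_with_espeak,
) -> List[str]:
    """Run detection, recognition and speech on one captured frame.

    detect_regions: hypothetical stand-in for the CSER-based text detector;
                    returns the cropped text regions found in the image.
    recognize:      hypothetical stand-in for the OCR step (for example a
                    wrapper around Tesseract); returns the raw text of a region.
    speak:          text-to-speech callback.
    Returns the list of strings that were spoken, in detection order.
    """
    spoken: List[str] = []
    for region in detect_regions(image):
        text = clean_text(recognize(region))
        if text:  # skip regions where OCR produced only whitespace
            speak(text)
            spoken.append(text)
    return spoken
```

Keeping each stage behind a callable lets the control flow be exercised on a desktop machine with stub functions before a camera, an OCR engine or a speech synthesizer is attached on the Raspberry Pi.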

Index Terms

Computer Science

Information Sciences

Keywords

Class-Specific Extremal Region; Head-mounted device; MSER (Maximally Stable Extremal Regions); Raspberry Pi; Tesseract OCR; Probabilistic Hough Lines Transformation