Research Article

Head Mounted Device for Real World Text to Speech Conversion

by Nikhil Varghese, Gaurav Tripathi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 155 - Issue 5
Published: Dec 2016
DOI: 10.5120/ijca2016912309

Nikhil Varghese, Gaurav Tripathi. Head Mounted Device for Real World Text to Speech Conversion. International Journal of Computer Applications. 155, 5 (Dec 2016), 16-20. DOI=10.5120/ijca2016912309

Abstract

Despite several advances in technology, low-cost aids for visually impaired people are still lacking. This paper presents a mobile head-mounted device that detects text in natural scenes and converts it to speech. The major components of the device are a Raspberry Pi, a high-definition webcam, earphones, and a portable power bank. The webcam, connected to the Raspberry Pi, captures the image. A text detection algorithm based on Class-Specific Extremal Regions (CSERs) locates the text in complex natural scenes. The segmented image is passed to the Tesseract OCR engine for text recognition. The recognized text is converted to audio using the espeak Python module on the Raspberry Pi. A visually impaired person can thus use the device to hear the text in their surroundings, such as shop names, public notices, billboards, and road directions.
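
The capture, detection, recognition, and speech stages described in the abstract can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `pytesseract` is assumed as the Python wrapper for Tesseract, speech uses the `espeak` command-line program rather than the espeak Python module named in the abstract, and the CSER detector is left as a pluggable stage with a whole-frame placeholder standing in for it.

```python
import shutil
import subprocess
from typing import Any, Callable, Iterable, List


def capture_frame(camera_index: int = 0) -> Any:
    """Grab a single frame from the webcam (assumes OpenCV's Python bindings)."""
    import cv2  # imported lazily so the skeleton loads without OpenCV

    camera = cv2.VideoCapture(camera_index)
    try:
        ok, frame = camera.read()
    finally:
        camera.release()
    if not ok:
        raise RuntimeError("could not read a frame from the webcam")
    return frame


def whole_frame_as_region(frame: Any) -> List[Any]:
    """Hypothetical placeholder: treats the whole frame as one text region.

    The device described in the abstract uses a CSER-based detector here.
    """
    return [frame]


def recognize_text(region: Any) -> str:
    """Run Tesseract OCR on one region (assumes the pytesseract wrapper)."""
    import pytesseract

    return pytesseract.image_to_string(region)


def speak(text: str) -> None:
    """Speak text through the espeak command-line program, if it is installed."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak was not found on PATH")
    # Feed the text on stdin so it is never parsed as a command-line option.
    subprocess.run(["espeak", "--stdin"], input=text, text=True, check=True)


def read_scene_aloud(
    capture: Callable[[], Any],
    detect_regions: Callable[[Any], Iterable[Any]],
    recognize: Callable[[Any], str],
    speak_fn: Callable[[str], None],
) -> List[str]:
    """Capture one frame, recognize each detected region, and speak the result."""
    frame = capture()
    spoken: List[str] = []
    for region in detect_regions(frame):
        # Collapse the line breaks and padding that OCR output usually carries.
        text = " ".join(recognize(region).split())
        if text:  # skip regions where nothing was recognized
            speak_fn(text)
            spoken.append(text)
    return spoken
```

On a device with a webcam, OpenCV, pytesseract, and espeak installed, the stages could be wired together as `read_scene_aloud(capture_frame, whole_frame_as_region, recognize_text, speak)`. Passing the stages in as arguments keeps the detector swappable, so a CSER-based detector can replace the placeholder without changing the rest of the pipeline.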

Index Terms
Computer Science
Information Sciences
Keywords

Class-Specific Extremal Region, Head-mounted device, MSER (Maximally Stable Extremal Regions), Raspberry Pi, Tesseract OCR, Probabilistic Hough Lines Transformation
