Read - O - Vision

A portable device that helps visually impaired people read newspapers and books.

Image Processing



The aim of this project was to create a portable reading device for the visually impaired. This was achieved with a simple yet effective algorithm that extracts paragraphs from images of newspapers, books, and research articles in real time. The paragraph extraction technique takes advantage of the discontinuities between paragraphs and columns that arise from the printing norms of indentation and spacing. The project was done in a group of two under the mentorship of our professor. I was involved in both the hardware and software development of the product.


The flow of the device is shown in the image. An image is first captured with the device. The captured image is then sent to the Text Block Extractor (TBE), where it is processed and the paragraphs are extracted. These paragraphs are then sent to an OCR engine (Tesseract in our case) to extract the text. Once the text has been extracted, a text-to-speech converter (flite in our case) converts it into speech, which the user hears through an earpiece.
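The flow described above can be sketched as a small driver function. This is only an illustration under stated assumptions: the function name read_aloud and its three stage parameters are hypothetical, and each stage is passed in as a plain callable so the sketch does not depend on how the TBE, Tesseract or flite are actually invoked on the device.

```python
from typing import Any, Callable, Iterable, List


def read_aloud(
    image: Any,
    extract_blocks: Callable[[Any], Iterable[Any]],  # hypothetical stand-in for the TBE
    recognise_text: Callable[[Any], str],            # hypothetical wrapper around the OCR engine
    speak: Callable[[str], None],                    # hypothetical wrapper around the TTS engine
) -> List[str]:
    """Run one captured image through TBE -> OCR -> text to speech.

    Returns the texts that were spoken, in reading order.
    """
    spoken = []
    for block in extract_blocks(image):
        text = recognise_text(block).strip()
        if not text:
            # Skip blocks for which the OCR stage returned nothing usable.
            continue
        speak(text)
        spoken.append(text)
    return spoken
```

On the device, recognise_text would wrap Tesseract and speak would wrap flite. Keeping them as parameters makes each stage easy to replace or to test with simple stubs.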

Overview of the device
Window one
Window two

Text Block Extraction is a two-fold method in which the image is analysed in two orientations, horizontal and vertical. The analysis begins by thresholding the image and then converting it into blocks with a white or black background. Two kinds of windows are used for this technique, as shown in the image. The windows are used to check for neighbouring white boxes around the centre box under consideration. For each box with intensity Ib, its 7 by 7 neighbourhood window is then analysed, where Ib signifies black intensity and Iw signifies white intensity. If the neighbourhood window contains 50 percent or more white pixels, the black box is converted into white.
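The 50 percent rule can be illustrated with a minimal sketch. Several details are assumptions rather than the project's actual implementation: the image is taken to be already thresholded and reduced to a grid of boxes (True for white, False for black), the window is a single square clipped at the image border rather than the two window shapes shown in the image, the centre box is included in the count, and decisions are made from the original grid so that earlier conversions do not influence later ones. The name whiten_sparse_blocks is hypothetical.

```python
from typing import List

Grid = List[List[bool]]  # True = white box, False = black box


def whiten_sparse_blocks(white: Grid, size: int = 7, min_white: float = 0.5) -> Grid:
    """Turn a black box white when at least min_white of its
    size x size neighbourhood (clipped at the border) is white."""
    if not white or not white[0]:
        return []
    half = size // 2
    rows, cols = len(white), len(white[0])
    out = [row[:] for row in white]  # copy, so decisions use the original grid
    for r in range(rows):
        for c in range(cols):
            if white[r][c]:
                continue  # only black boxes are candidates for conversion
            window = [
                white[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            if sum(window) / len(window) >= min_white:
                out[r][c] = True
    return out
```

An isolated black box surrounded by white is removed, while a box inside a dense black region is kept, which is the behaviour needed to clean up noise before the paragraph breaks are located.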

After the window analysis described above, the image is traversed vertically and BlackCount is computed, where BlackCount is the number of black windows in a column. The local maxima of the resulting BlackCount data are then calculated. These local maxima signify the vertical paragraph breaks in the image.
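A minimal sketch of this step, assuming the same grid representation as before (True for white, False for black). The function names are hypothetical, and the peak test is the simplest possible one: a strict comparison with both neighbours, so plateaus and the first and last columns are never reported. The project may have used a more robust peak detector.

```python
from typing import List

Grid = List[List[bool]]  # True = white window, False = black window


def black_count_per_column(white: Grid) -> List[int]:
    """BlackCount: number of black windows in each column."""
    if not white:
        return []
    return [
        sum(1 for row in white if not row[c])
        for c in range(len(white[0]))
    ]


def local_maxima(values: List[int]) -> List[int]:
    """Indices whose value is strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


grid = [
    [True, False, True, False, True],
    [True, False, True, False, False],
    [True, True, True, False, True],
]
counts = black_count_per_column(grid)   # [0, 2, 0, 3, 1]
breaks = local_maxima(counts)           # [1, 3]
```

The same two functions can be applied to the transposed grid to analyse the other orientation.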

Plot of vertical separation

Sample Image

The images illustrate an example run, from the image first captured by the user to the final paragraph extracted by the algorithm.

The algorithm is able to extract text even when the input contains some non-textual content, provided that the major part of the image is text. To satisfy this condition, the optimal distance of the camera from the page was determined experimentally to be around 15-20 cm. The source and camera need to be stable so that the image is captured with maximum clarity. The background of the text is required to be uniformly white. To determine the efficiency of the TBE algorithm, two parameters were calculated, Precision and Recall, where Precision is TrueParagraphs divided by TotalRetrievedParagraphs and Recall is TrueParagraphs divided by TotalTrueParagraphs.
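The two measures can be written as a short helper. The function name and the numbers in the example are made up for illustration and are not results from the project.

```python
from typing import Tuple


def precision_recall(
    true_paragraphs: int,
    total_retrieved_paragraphs: int,
    total_true_paragraphs: int,
) -> Tuple[float, float]:
    """Precision = TrueParagraphs / TotalRetrievedParagraphs,
    Recall = TrueParagraphs / TotalTrueParagraphs.

    A zero denominator yields 0.0 instead of raising an error.
    """
    precision = (
        true_paragraphs / total_retrieved_paragraphs
        if total_retrieved_paragraphs
        else 0.0
    )
    recall = (
        true_paragraphs / total_true_paragraphs
        if total_true_paragraphs
        else 0.0
    )
    return precision, recall


# Illustrative numbers only: 8 correct paragraphs out of 10 retrieved,
# with 16 paragraphs actually present in the image.
precision, recall = precision_recall(8, 10, 16)
```

Multiplying each value by 100 gives the percentage form used in the results table.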

Prototype Image




Table 3: Efficiency for the three categories of images
Image Source | No. of Images | Percentage Precision | Percentage Recall
Newspaper Image
Book Image
Article Image