Monday, 25 March 2013

           Android - Image 2 Text Conversion

Image to Text :

         Extracts the characters from a selected Image.

Android App for Image to Text Conversion:

       Tesseract is probably the most accurate open source OCR (Optical Character Recognition) engine available. It can read a wide variety of image formats and convert them to text in over 60 languages.
       TessBase is the library for the Android platform. The steps below explain how to download, build, and use the TessBase library in your Android app for image to text conversion.

1. Download the Tesseract library for Android (tess-two) as a .zip archive.

2. Software requirements
    - Eclipse
    - Java JDK
    - Android SDK
    - Android NDK
    - Cygwin (for Windows users)
    - Apache Ant
             Install all of the software listed above.

3. Download Apache Ant; choose the .zip package for Windows.

4. For Windows users: make sure Cygwin is already installed. During the Cygwin installation, also select these packages: gcc-core, gcc-g++, make, swig.

5. Unzip Apache Ant and add its bin folder to the PATH environment variable (mine is C:\apache-ant-1.8.3\bin).

6. Run Cygwin (Windows users only; Linux users, open a terminal) and execute the following:
     a. cd <project-directory>/tess-two
     b. export TESSERACT_PATH=${PWD}/external/tesseract-3.01
     c. export LEPTONICA_PATH=${PWD}/external/leptonica-1.68
     d. export LIBJPEG_PATH=${PWD}/external/libjpeg
     e. ndk-build (for Windows users: /cygdrive/<ndk-directory>/ndk-build)
     f. android update project --path . (for Windows users: Cygwin sometimes cannot execute this command, so
        use the command prompt to execute it instead).
        Note: The “.” after --path must be included in the command.
     g. ant release (if you get an error such as "tools.jar not found", set the JAVA_HOME environment variable
         to the JDK folder; mine is C:\Program Files\Java\jdk1.7.0)
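
The tools.jar error in step g usually means JAVA_HOME points at a JRE, or is not set at all. A minimal sketch of a check you could run before calling ant, assuming a JDK 8 or earlier layout where tools.jar sits in the lib folder; the class name JdkCheck is hypothetical and not part of tess-two or Ant:

```java
import java.io.File;

final class JdkCheck {
    /** Returns true if jdkHome contains lib/tools.jar (JDK 8 or earlier layout). */
    static boolean hasToolsJar(File jdkHome) {
        if (jdkHome == null) {
            return false;
        }
        File toolsJar = new File(new File(jdkHome, "lib"), "tools.jar");
        return toolsJar.isFile();
    }

    public static void main(String[] args) {
        // Read the same environment variable that Ant consults
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null) {
            System.out.println("JAVA_HOME is not set");
        } else {
            System.out.println("tools.jar found: " + hasToolsJar(new File(javaHome)));
        }
    }
}
```

If this reports that tools.jar is missing, point JAVA_HOME at the JDK folder rather than the JRE folder and run ant release again.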

7. Run Eclipse. Right-click in the Package Explorer, then Import -> General -> Existing Projects into Workspace
    -> Next -> Select Root Directory -> Browse to the tess-two folder location -> Finish.

8. Now you need to compile and build this library so that it can be included in your Android app:
     cd <project-directory>/tess-two
     android update project --path .
     ant release

Make sure there are no errors during compilation.

9. Once the library is built, import the project as a library in Eclipse: File -> Import -> Existing Projects into Workspace -> tess-two directory. Right-click the project, Android Tools -> Fix Project Properties. Then right-click -> Properties -> Android -> check "Is Library".

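10. The function below takes an image path as input and returns the extracted text:

    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;
    import android.os.Environment;
    import com.googlecode.tesseract.android.TessBaseAPI;
    import java.io.File;

    String image2Text(String imagePath) {
        // dataPath is the parent folder of "tessdata"
        String dataPath = Environment.getExternalStorageDirectory().toString() + "/Android/data/" + appContext.getPackageName() + "/";
        File tessdata = new File(dataPath, "tessdata");
        if (!tessdata.exists() || !tessdata.isDirectory()) {
            throw new IllegalArgumentException("Data path must contain subfolder tessdata!");
        }
        Bitmap image = BitmapFactory.decodeFile(imagePath);
        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.init(dataPath, "eng");
        // Hand the bitmap to the engine before asking for the text
        baseApi.setImage(image);
        String recognizedText = baseApi.getUTF8Text();
        // Release the native resources held by the engine
        baseApi.end();
        return recognizedText;
    }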

11. What is dataPath?
       Copy the training data files for the language you need into a folder named "tessdata". dataPath should point to the parent folder of tessdata, i.e. dataPath must contain a subfolder named "tessdata".
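
The layout rule above can be checked in plain Java before the engine is initialised. This is only a sketch: the class name TessDataCheck and the method isValidDataPath are hypothetical helpers, not part of the tess-two API. It relies on Tesseract training files being named <language>.traineddata, for example eng.traineddata for "eng".

```java
import java.io.File;

final class TessDataCheck {
    /**
     * Returns true if dataPath contains a "tessdata" subfolder that holds
     * the training file for the given language (e.g. eng.traineddata).
     */
    static boolean isValidDataPath(File dataPath, String language) {
        File tessdata = new File(dataPath, "tessdata");
        File trainedData = new File(tessdata, language + ".traineddata");
        return tessdata.isDirectory() && trainedData.isFile();
    }

    public static void main(String[] args) {
        // Pass the candidate data path as the first argument (defaults to the current folder)
        File dataPath = new File(args.length > 0 ? args[0] : ".");
        System.out.println("valid for eng: " + isValidDataPath(dataPath, "eng"));
    }
}
```

Calling a check like this before baseApi.init(dataPath, "eng") gives a clearer failure than an engine that silently fails to load its language data.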

12. How to test the above code snippet in the emulator?

     In Eclipse, open the DDMS perspective and create a folder under "/mnt/sdcard/Android/data/" named after your app's package name, since the code in step 10 builds dataPath from appContext.getPackageName(). Create a folder named "tessdata" inside it, then copy all the training data into "tessdata" using the "Push file to device" button.
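
To see exactly which folder the emulator needs, the path construction from step 10 can be reproduced in plain Java. A small sketch: the class TessDataPaths and the method dataPathFor are hypothetical, and "/mnt/sdcard" and "com.example.ocr" are placeholder values standing in for Environment.getExternalStorageDirectory() and appContext.getPackageName().

```java
final class TessDataPaths {
    /** Builds dataPath the same way step 10 does: <storage>/Android/data/<package>/ */
    static String dataPathFor(String externalStorageDir, String packageName) {
        return externalStorageDir + "/Android/data/" + packageName + "/";
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your emulator's storage root and your app's package name
        String dataPath = dataPathFor("/mnt/sdcard", "com.example.ocr");
        System.out.println("dataPath:            " + dataPath);
        System.out.println("push training files: " + dataPath + "tessdata/");
    }
}
```

The training files pushed through DDMS must land in the second path printed, otherwise the tessdata check in step 10 throws IllegalArgumentException.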


