Android - Image 2 Text Conversion
Image to Text: extracts the characters from a selected image.
Android App for Image to Text Conversion: Tesseract is probably the most accurate open-source OCR (Optical Character Recognition) engine available. It can read a wide variety of image formats and convert them to text in over 60 languages.
tess-two (a fork of Tesseract Tools for Android) brings Tesseract to the Android platform; its main class is TessBaseAPI. The steps below explain how to download, build and use the library in your Android app for image to text conversion.
1. Download the tess-two library for Android from https://github.com/rmtheis/tess-two (download the repository as a .zip file).
2. Software requirements
- Java JDK
- Android SDK
- Android NDK
- Cygwin (for Windows users)
Install all of the software listed above.
3. Download Apache Ant from http://ant.apache.org/bindownload.cgi (Windows users should choose the .zip archive).
4. Windows users must also install Cygwin (download it from http://www.cygwin.com/). During the Cygwin installation, make sure to also select these packages: gcc-core, gcc-g++, make and swig.
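Cygwin exposes each Windows drive under /cygdrive, so a folder such as C:\android-ndk-r8 is reached as /cygdrive/c/android-ndk-r8; this is the form used for ndk-build in step 6. The minimal Java sketch below shows that mapping, assuming Cygwin's default /cygdrive prefix and an absolute drive-letter path. The NDK folder name is hypothetical, and in practice Cygwin's own cygpath -u command does this conversion for you.

```java
class CygwinPath {
    // Converts an absolute Windows path such as C:\android-ndk-r8 into the
    // Cygwin form /cygdrive/c/android-ndk-r8 (assumes the default /cygdrive prefix).
    static String toCygwin(String windowsPath) {
        if (windowsPath.length() < 2 || windowsPath.charAt(1) != ':') {
            throw new IllegalArgumentException("Expected a drive-letter path: " + windowsPath);
        }
        char drive = Character.toLowerCase(windowsPath.charAt(0));
        String rest = windowsPath.substring(2).replace('\\', '/');
        if (!rest.isEmpty() && !rest.startsWith("/")) {
            throw new IllegalArgumentException("Expected an absolute path: " + windowsPath);
        }
        // Drop trailing slashes so "/ndk-build" can be appended cleanly.
        while (rest.endsWith("/")) {
            rest = rest.substring(0, rest.length() - 1);
        }
        return "/cygdrive/" + drive + rest;
    }

    public static void main(String[] args) {
        String ndkDir = "C:\\android-ndk-r8"; // hypothetical NDK install folder
        System.out.println(toCygwin(ndkDir) + "/ndk-build");
        // prints /cygdrive/c/android-ndk-r8/ndk-build
    }
}
```

If the NDK folder contains spaces, quote the resulting path when typing it into the Cygwin shell.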
5. Unzip Apache Ant and add its bin folder to the PATH environment variable (mine is C:\apache-ant-1.8.3\bin).
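Before building, it can help to confirm that the tools installed in steps 2 to 5 are actually reachable. The sketch below scans a PATH-style string for ant, gcc, make and swig. The .exe and .bat suffixes tried for Windows are an assumption, and the result only reflects the PATH of the shell the program is started from, so on Windows run it from the same Cygwin session you will build in.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ToolCheck {
    // Suffixes tried for each tool name; ".exe" and ".bat" cover typical
    // Windows launchers (an assumption, not an exhaustive list).
    private static final String[] SUFFIXES = {"", ".exe", ".bat"};

    // True if `tool` (with one of the suffixes) is a file in any directory
    // of the PATH-style string `pathVar`.
    static boolean isOnPath(String tool, String pathVar) {
        if (pathVar == null) {
            return false;
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String suffix : SUFFIXES) {
                if (new File(dir, tool + suffix).isFile()) {
                    return true;
                }
            }
        }
        return false;
    }

    // Returns the tools that could not be found, in the order given.
    static List<String> missingTools(String pathVar, String... tools) {
        List<String> missing = new ArrayList<>();
        for (String tool : tools) {
            if (!isOnPath(tool, pathVar)) {
                missing.add(tool);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingTools(System.getenv("PATH"), "ant", "gcc", "make", "swig");
        System.out.println(missing.isEmpty() ? "All tools found" : "Missing: " + missing);
    }
}
```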
6. Open Cygwin (Windows users) or a terminal (Linux users) and execute the following commands:
e. ndk-build (Windows users: /cygdrive/<ndk-directory>/ndk-build)
f. android update project --path . (on Windows, Cygwin sometimes cannot execute this command; if so, run it from the Command Prompt instead).
Note: The “.” after --path must be included in the command.
g. ant release (if you get an error such as "tools.jar not found", set the JAVA_HOME environment variable to the JDK folder; mine is C:\Program Files\Java\jdk1.7.0).
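The tools.jar error usually means JAVA_HOME points to a JRE rather than a JDK. The sketch below checks whether a folder contains lib/tools.jar. JDKs up to Java 8 (including the jdk1.7.0 used here) ship that file, while JDK 9 removed it, so this check is only meaningful for the older JDKs this guide targets.

```java
import java.io.File;

class JavaHomeCheck {
    // Returns true if javaHome contains lib/tools.jar, which JDKs up to
    // Java 8 ship and a plain JRE does not.
    static boolean looksLikeJdk(String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return false;
        }
        File toolsJar = new File(new File(javaHome, "lib"), "tools.jar");
        return toolsJar.isFile();
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (looksLikeJdk(javaHome)) {
            System.out.println("JAVA_HOME looks like a JDK: " + javaHome);
        } else {
            System.out.println("JAVA_HOME is unset or has no lib/tools.jar: " + javaHome);
        }
    }
}
```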
7. Run Eclipse. Right-click in the Package Explorer and choose Import -> General -> Existing Projects into Workspace.
android update project --path .
Make sure there are no errors while compiling.
9. Once the library is built, import the project as a library in Eclipse: File -> Import -> Existing Projects into Workspace -> tess-two directory. Right-click the project and choose Android Tools -> Fix Project Properties. Then right-click -> Properties -> Android and check Is Library.
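Checking Is Library makes Eclipse (ADT) record android.library=true in the tess-two project's project.properties file. The sketch below reads such content with java.util.Properties to confirm the flag; the sample contents, including the target line, are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class LibraryFlagCheck {
    // Returns true if the project.properties content sets android.library=true.
    static boolean isLibrary(Reader projectProperties) throws IOException {
        Properties props = new Properties();
        props.load(projectProperties);
        return Boolean.parseBoolean(props.getProperty("android.library", "false").trim());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file contents; for the real file, pass
        // new java.io.FileReader("tess-two/project.properties") instead.
        String sample = "target=android-8\nandroid.library=true\n";
        System.out.println(isLibrary(new StringReader(sample))); // prints true
    }
}
```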
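10. The function below takes an image path as input and returns the extracted text. It uses android.os.Environment, android.graphics.Bitmap, android.graphics.BitmapFactory and com.googlecode.tesseract.android.TessBaseAPI, and expects appContext to be an available Context: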
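String image2Text(String imagePath) {
    String dataPath = Environment.getExternalStorageDirectory().toString() + "/Android/data/" + appContext.getPackageName() + "/"; // parent folder of "tessdata"
    Bitmap image = BitmapFactory.decodeFile(imagePath);
    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.init(dataPath, "eng"); // "eng" is an assumed language code; use the one whose training data you copied
    baseApi.setImage(image);
    String recognizedText = baseApi.getUTF8Text();
    baseApi.end(); // release the native Tesseract resources
    return recognizedText;
}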
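The dataPath expression above always resolves to <external storage>/Android/data/<package name>/, and Tesseract then looks for the tessdata folder inside it. The plain-Java sketch below reproduces that string building outside Android so the layout can be checked; the storage root /mnt/sdcard and the package name com.example.ocr are hypothetical values standing in for Environment.getExternalStorageDirectory() and getPackageName().

```java
class TessDataPath {
    // Same concatenation as in image2Text(): <storage>/Android/data/<package>/
    // (storageDir and packageName stand in for the Android calls).
    static String dataPath(String storageDir, String packageName) {
        return storageDir + "/Android/data/" + packageName + "/";
    }

    // The tessdata folder that must exist directly inside dataPath.
    static String tessdataPath(String dataPath) {
        return dataPath + "tessdata/";
    }

    public static void main(String[] args) {
        String path = dataPath("/mnt/sdcard", "com.example.ocr"); // hypothetical values
        System.out.println(path);               // prints /mnt/sdcard/Android/data/com.example.ocr/
        System.out.println(tessdataPath(path)); // prints /mnt/sdcard/Android/data/com.example.ocr/tessdata/
    }
}
```

Because dataPath already ends with "/", the folder name is appended directly without doubling the separator.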
11. What is dataPath?
Copy the training data file(s) for the language you need (for example eng.traineddata for English) into a folder named "tessdata". dataPath must point to the parent folder of tessdata, i.e. dataPath must contain a subfolder named "tessdata".
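A quick way to catch a wrong dataPath before calling init() is to check for the language file itself. The sketch below assumes Tesseract's <lang>.traineddata naming convention and uses "eng" purely as an example language code.

```java
import java.io.File;

class TrainedDataCheck {
    // True if dataPath contains tessdata/<lang>.traineddata, i.e. dataPath is
    // the parent folder of "tessdata" and the language file was copied into it.
    static boolean hasTrainedData(String dataPath, String lang) {
        File tessdata = new File(dataPath, "tessdata");
        return new File(tessdata, lang + ".traineddata").isFile();
    }

    public static void main(String[] args) {
        String dataPath = args.length > 0 ? args[0] : "."; // folder to check
        String lang = "eng"; // example language code (assumption)
        if (hasTrainedData(dataPath, lang)) {
            System.out.println("Found tessdata/" + lang + ".traineddata in " + dataPath);
        } else {
            System.out.println("Missing tessdata/" + lang + ".traineddata in " + dataPath);
        }
    }
}
```

On the device, the same check could run on the dataPath string just before baseApi.init(...), turning a confusing native failure into a clear message.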
12. How do I test the above code snippet in the emulator?
Create a folder named after your app's package name in /mnt/sdcard/Android/data/ (the code above builds dataPath from getPackageName(), so the folder name must match the package name), then create a folder named "tessdata" inside it and copy all the training data into it.
In Eclipse, you can do this from the DDMS perspective: in the File Explorer, add the package-name folder under /mnt/sdcard/Android/data/, create "tessdata" inside it, and copy the training data files into "tessdata" with the 'Push a file onto the device' button.
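When training data for several languages is pushed, it can help to list exactly where each file should land on the device. The sketch below walks a local tessdata folder and builds the matching destination under /mnt/sdcard/Android/data/<package>/tessdata/; the local folder and the package name com.example.ocr are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PushPlan {
    // For every *.traineddata file in the local tessdata folder, returns the
    // device path it should be pushed to: <deviceDataPath>tessdata/<file name>.
    static List<String> devicePaths(File localTessdata, String deviceDataPath) {
        List<String> targets = new ArrayList<>();
        String[] names = localTessdata.list();
        if (names == null) {
            return targets; // folder missing or unreadable
        }
        Arrays.sort(names); // predictable order
        for (String name : names) {
            if (name.endsWith(".traineddata")) {
                targets.add(deviceDataPath + "tessdata/" + name);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Hypothetical local folder and package name; adjust to your setup.
        File local = new File(args.length > 0 ? args[0] : "tessdata");
        String device = "/mnt/sdcard/Android/data/com.example.ocr/";
        for (String target : devicePaths(local, device)) {
            System.out.println(target);
        }
    }
}
```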