Recognizing text and digits in an image and extracting their values has always been a tough task. Doing it from scratch means building your own machine learning model, which is time-consuming, especially if you don't know where to start.

In this tutorial, I will show you how to extract text from an image using a pre-trained OCR engine, Tesseract OCR.

Example of Text Extraction.
Source Image.

Text extracted from the sample image.

Please note that the extracted text contains some errors.

Tesseract OCR

https://github.com/tesseract-ocr/tesseract
Tesseract OCR is an open-source OCR engine that comes with pre-trained models. It recognizes and "reads" the text embedded in images.

Tesseract has Unicode (UTF-8) support, and can recognize more than 100 languages "out of the box".

Tesseract supports various output formats: plain text, hOCR (HTML), PDF, TSV, and invisible-text-only PDF.

Here I am going to explain how to use Tesseract from Python.

Required Libraries.

Pillow
pytesseract
numpy
OpenCV

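Install the above libraries with pip from your terminal. Note that pytesseract is only a Python wrapper around the Tesseract engine, so you also need to install Tesseract itself, as described below.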

pip install pillow
pip install pytesseract
pip install numpy
pip install opencv-python

Install Tesseract on Mac

brew install tesseract

Run the above command in your Mac terminal (it requires Homebrew).

Windows Installation

https://github.com/tesseract-ocr/tesseract/wiki#windows

See the link above for Windows installation instructions.

How to use Tesseract

Step 1:

Import the required libraries.

import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string

Step 2:

Declare the name of the image folder. Keep the trailing slash, because the image file name is appended to this string directly.

src_path = "tes-img/"

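Step 3:

Write a function that returns the text extracted from an image. Step 4 calls this function get_string; its complete version is in the full source code linked at the end of this post.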

Step 4:

Call the function with the image path and print the result.

print('--- Start recognize text from image ---')
print(get_string(src_path + "cont.jpg"))
print("------ Done -------")

That's all, you're done. Please let me know if you have any difficulty implementing this.

My email address is bharathirajatut@gmail.com

Enjoy image text recognition.

Get the full source code here:

https://github.com/bharathirajatut/python-data-science/tree/master/handwritten-digit-recognition