Recognizing text and digits in an image and extracting their values has always been a tough task, even in the digital era. Building your own machine learning model for it takes a lot of time, especially if you don't know how to go about it.
In this tutorial, I will show you how to extract text from an image using a pre-trained machine learning model (Tesseract OCR).
Example of Text Extraction.
Text extracted from the sample image.
Please note that the extracted text contains some errors.
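Errors like these usually come from characters Tesseract could not read reliably. One way to spot them, sketched here as an assumption rather than the method used in this tutorial, is to request word-level output with `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` and keep only words whose confidence passes a threshold. The helper name `confident_words` and the default threshold of 60 are hypothetical choices.

```python
from typing import Dict, List


def confident_words(data: Dict[str, list], min_conf: float = 60.0) -> List[str]:
    """Return the words whose OCR confidence is at least min_conf.

    `data` is assumed to have the shape returned by
    pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT):
    parallel lists under keys such as "text" and "conf".
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Non-word rows (page, block, line) have empty text and conf -1.
        # conf may be an int, float or numeric string depending on the
        # pytesseract/Tesseract version, so normalise it with float().
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


if __name__ == "__main__":
    # Hand-made sample in the image_to_data dict shape (not real OCR output).
    sample = {"text": ["", "Invoice", "T0tal", "42"], "conf": [-1, 95, 31, "88.5"]}
    print(confident_words(sample))  # prints ['Invoice', '42']
```

Low-confidence words such as "T0tal" are the ones worth checking by hand or fixing with better preprocessing.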
Tesseract OCR is an open-source OCR engine that ships with pre-trained models. It recognizes and "reads" the text embedded in images.
Tesseract has Unicode (UTF-8) support, and can recognize more than 100 languages "out of the box".
Tesseract supports various output formats: plain text, hOCR (HTML), PDF, TSV, and invisible-text-only PDF.
Here I will explain how to use Tesseract from Python.
Install the required libraries using pip in your terminal:
pip install pillow
pip install pytesseract
pip install numpy
pip install opencv-python
Install Tesseract on Mac
brew install tesseract
Run the above command in your Mac terminal.
To install Tesseract on Windows, refer to the link above.
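Before moving on, it can help to confirm that everything is in place. The following is a minimal sketch (the helper `check_setup` is hypothetical, not part of any library) that uses only the standard library to check that the four packages are importable and that the `tesseract` executable is on your PATH.

```python
import importlib.util
import shutil

# Import names of the pip packages installed above:
# pillow is imported as PIL and opencv-python as cv2.
REQUIRED_MODULES = ["PIL", "pytesseract", "numpy", "cv2"]


def check_setup() -> dict:
    """Report which modules are importable and whether tesseract is on PATH."""
    # find_spec returns None when a top-level module cannot be found.
    status = {name: importlib.util.find_spec(name) is not None for name in REQUIRED_MODULES}
    # pytesseract only wraps the tesseract program, so the binary must exist too.
    status["tesseract binary"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print(f"{item:18} {'OK' if ok else 'MISSING'}")
```

If the binary shows as missing on Windows even though it is installed, pytesseract lets you point at it directly by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of `tesseract.exe`.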
How to use Tesseract
Import the required libraries
import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string
Declare the image folder path
src_path = "tes-img/"
Write a function, get_string, that returns the text extracted from an image.
Call the function with the image path and print the result.
print("--- Start recognizing text from image ---")
print(get_string(src_path + "cont.jpg"))
print("------ Done -------")
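To process every image in the folder instead of a single file, you could loop over its contents. This is a minimal sketch; the helper `run_folder` and the list of image extensions are hypothetical choices. It takes the OCR function as a parameter, so you would call it as `run_folder(src_path, get_string)`.

```python
from pathlib import Path
from typing import Callable, Dict

# File extensions treated as images; adjust to match your folder.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def run_folder(folder: str, ocr: Callable[[str], str]) -> Dict[str, str]:
    """Apply `ocr` to every image file in `folder`, in file-name order.

    Returns a dict mapping each file name to the extracted text.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and non-image files; compare suffixes case-insensitively.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr(str(path))
    return results
```

For example: `for name, text in run_folder(src_path, get_string).items(): print(name, text)`.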
That's all. You are done. Please let me know if you have any difficulty implementing this.
My email id is email@example.com
Enjoy image text recognition.
Get the full source code using