Intelligent Photo OCR that reads better than you (or not)
We started our machine learning and deep learning journey with Andrew Ng’s course on Coursera. The course ends by describing how ML can be used for Photo OCR, i.e. reading text from camera images, just like reading the word “Marriott” from the image below:
OCR is often thought to be a solved problem, and it certainly is for documents scanned on flatbed scanners. But Photo OCR, i.e. reading text in photographs taken with a camera, is pretty hard: hard enough to be used as a CAPTCHA for distinguishing humans from bots. Several companies still have to hire armies of BPO employees to type in and interpret text from images: processing bills, reading number plates from camera footage, processing insurance documents, and so on.
This looked like an intriguing problem, so we decided to build our own deep learning Photo OCR. After banging our heads against the problem for a few sleepless months, we have solved it to a large extent, and we now use it to automatically read and interpret photographs of restaurant receipts.
The solution pipeline has three modules:
1. Segmentation – This means identifying the text in an image, drawing a bounding box around it, and cropping those segments for the next step. For this, we trained a network of a few CNN layers using the Caffe framework. Here is a sample result.
2. Recognition (OCR) – We modified an open source OCR tool based on LSTM (Long Short-Term Memory) networks to recognize the text in the cropped segments.
3. Interpretation – Often the recognized text also has to be corrected and interpreted. So we built a spell checker to correct it and an SVM classifier to label each piece of information as “Date”, “Time”, “Total”, “Subtotal”, etc., so that any business can use the output directly.
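The interpretation step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the spell check uses the standard library's `difflib.get_close_matches` against a small hypothetical receipt vocabulary (`RECEIPT_VOCAB`), and the labelling uses hand-written regular expressions as a plain stand-in for the trained SVM classifier, which would instead learn its decision rule from features of labelled receipt lines. The names `correct_word`, `correct_line`, `label_line` and `interpret`, the `0.75` similarity cutoff and the patterns are all assumptions for illustration.

```python
import difflib
import re
from typing import List, Tuple

# Hypothetical vocabulary of words expected on restaurant receipts.
RECEIPT_VOCAB = ["total", "subtotal", "date", "time", "tax", "tip", "cash", "change"]

def correct_word(word: str, vocab=RECEIPT_VOCAB, cutoff: float = 0.75) -> str:
    """Snap an OCR'd word to the closest vocabulary entry if it is similar enough."""
    if not any(ch.isalpha() for ch in word):
        return word  # leave prices, dates and times untouched
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_line(line: str) -> str:
    """Spell-correct every whitespace-separated token in a recognized line."""
    return " ".join(correct_word(w) for w in line.split())

# Hand-written rules standing in for the SVM classifier (NOT a trained model).
# The first matching rule wins, so rules are ordered from most to least specific.
RULES = [
    ("Subtotal", re.compile(r"\bsubtotal\b", re.IGNORECASE)),
    ("Total", re.compile(r"\btotal\b", re.IGNORECASE)),
    ("Date", re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")),
    ("Time", re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")),
]

def label_line(line: str) -> str:
    """Return the field label of the first rule that matches, else "Other"."""
    for label, pattern in RULES:
        if pattern.search(line):
            return label
    return "Other"

def interpret(lines: List[str]) -> List[Tuple[str, str]]:
    """Spell-correct each recognized line, then attach a field label to it."""
    results = []
    for line in lines:
        corrected = correct_line(line)
        results.append((label_line(corrected), corrected))
    return results
```

For example, `interpret(["T0TAL 23.50"])` repairs the OCR confusion of `0` for `O` and returns `[("Total", "total 23.50")]`. The point of a learned classifier over rules like these is that it can generalize from labelled examples instead of relying on exact keywords.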
The entire algorithm is generic and can be used for any OCR application it is trained for. At the end, the pipeline simply outputs the required business information for a given photograph containing text.
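Because each stage only hands its output to the next, the generic pipeline can be sketched as three pluggable functions. This is an illustrative skeleton under stated assumptions, not the authors' code: the image is a plain list of pixel rows, bounding boxes are `(x, y, width, height)` tuples, and `PhotoOcrPipeline`, `crop` and the three stage callables are hypothetical names. In a real system `segment` would wrap the CNN text detector, `recognize` the LSTM-based OCR engine, and `interpret` the spell checker and classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values (assumed format)
Box = Tuple[int, int, int, int]  # (x, y, width, height) of one detected text region

def crop(image: Image, box: Box) -> Image:
    """Cut one bounding box out of the image, ready for the recognizer."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

@dataclass
class PhotoOcrPipeline:
    segment: Callable[[Image], List[Box]]        # stage 1: find text regions
    recognize: Callable[[Image], str]            # stage 2: read the text in one crop
    interpret: Callable[[str], Tuple[str, str]]  # stage 3: return (label, cleaned text)

    def run(self, image: Image) -> Dict[str, str]:
        """Return the business fields found in one photograph."""
        fields: Dict[str, str] = {}
        for box in self.segment(image):
            label, value = self.interpret(self.recognize(crop(image, box)))
            if label != "Other":
                fields.setdefault(label, value)  # keep the first value seen per label
        return fields
```

Retargeting this skeleton to another domain (say, number plates instead of receipts) changes only the callables passed in, which is one simple way to keep such a pipeline generic.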
And the best part: because the entire pipeline is machine learning based, its accuracy keeps improving as it gets more data to learn from.
Deep learning algorithms keep surprising us with their results. Here is an example where the network correctly identifies the text areas even though the receipt is badly squashed, blurred, and barely readable.
The pipeline is under active development. We are in the process of replacing the three separate machine learning algorithms with a single network (multi-layer CNN+LSTM) to make it more efficient and elegant.
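The post does not say how the planned CNN+LSTM network will be trained or decoded. A common choice for such networks is Connectionist Temporal Classification (CTC), where the network emits, for every horizontal time step of the image, a probability for each character plus a special "blank" symbol. Assuming that setup, the simplest (best-path, or greedy) decoder picks the most likely class per step, collapses consecutive repeats and drops blanks. The blank index `0`, the alphabet layout and the function name `ctc_greedy_decode` are assumptions for this sketch.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0  # class index reserved for the CTC blank symbol (assumed convention)

def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the network's probability of class k at time step t,
    where class 0 is the blank and class k >= 1 is alphabet[k - 1].
    """
    # 1. Pick the most likely class at every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Collapse runs of the same class into one entry.
    collapsed = [k for k, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining classes to characters.
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)
```

The blank is what lets the network spell doubled letters: the best path `t t _ o o _ o` (with `_` as blank) decodes to `too`, whereas without the blank the two runs of `o` would merge into one. Beam search decoding, optionally combined with a language model, often does better than this greedy version at the cost of more code.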
The solution is already being used to save man-hours and increase productivity. So please do connect if you are interested in Photo OCR applications or deep learning technology.
Acknowledgment: We sincerely thank the open source community for freely sharing the knowledge that made the above possible. In the same spirit, we will soon share an alpha version of the above pipeline on GitHub.