Singularium digitizes large numbers of tabular invoices to generate timely market insights in a highly dynamic FMCG market. Covis is one of the cornerstones of this process: a proprietary image processing engine that delivers high-quality output and can automate the digitization of images of invoices, receipts, catalogues, scorecards, menus and surveys, among others.
* Currently, Covis is fine-tuned for printed/handwritten invoices and is available for testing in beta.
Optical character readers (OCRs) are widely used to capture data from printed paper records.
The latest OCRs are built on cutting-edge neural networks that recognize characters with high accuracy and consistency, and they continue to evolve.
Google’s Vision, Microsoft’s Azure and Amazon’s Textract are some of the best in the business right now.
Covis:
Covis is a hybrid engine that extracts tabular data present in images. A key feature is the engine’s ability to determine table structure without explicit lines marking rows and columns. A neural network combined with the proprietary HATS (Heuristic Algorithm for Table Structures) resolves the structure, while a combination of commercial and in-house trained OCRs reads the text.
Pre-processing of the image, coupled with intelligent resolution of the differing outputs from the combined OCRs, helps increase accuracy significantly. The engine demonstrates a 10% to 15% improvement in reading accuracy while reducing the post-processing effort needed to structure the data. The video below highlights the mechanics behind Covis.
Watch: a PepsiCo invoice digitized to 98% read accuracy vs. a maximum of 85% from commercial OCRs
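As an illustration only, one simple way to resolve differing readings of the same table cell from several OCR engines is a majority vote. This is a minimal sketch and not the actual Covis resolution logic, which is not described here; the function name `resolve_cell` and the tie-breaking rule (the earliest engine in the list wins) are assumptions made for the example.

```python
from collections import Counter
from typing import List


def resolve_cell(readings: List[str]) -> str:
    """Pick one text value for a cell from several OCR readings.

    Hypothetical helper: majority vote over non-empty readings.
    On a tie, the reading that appears first in the list wins.
    """
    # Drop empty or whitespace-only readings and trim the rest.
    cleaned = [r.strip() for r in readings if r and r.strip()]
    if not cleaned:
        return ""
    # most_common keeps first-seen order among equal counts.
    best, _count = Counter(cleaned).most_common(1)[0]
    return best


if __name__ == "__main__":
    # Two engines agree; the third confused the digit 0 with the letter O.
    print(resolve_cell(["1,250.00", "1,250.00", "1,25O.00"]))
```

A production system would likely weigh engines by confidence or vote per character rather than per cell; the per-cell vote is chosen here only because it is the shortest correct example of the idea.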
1. OCRs evaluated for FMCG invoice benchmarking
- Google Vision
- Abbyy FineReader
- Microsoft Azure
- Amazon Textract
Accuracy was measured using Levenshtein edit distance.
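For readers unfamiliar with the metric, the sketch below shows how a read accuracy figure could be derived from Levenshtein edit distance. The normalization (one minus the distance divided by the longer string's length) and the name `read_accuracy` are assumptions for illustration; the exact formula used in the benchmark is not stated.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def read_accuracy(ocr_text: str, truth: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(ocr_text), len(truth))
    if longest == 0:
        return 1.0  # two empty strings match perfectly
    return 1.0 - levenshtein(ocr_text, truth) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(read_accuracy("INV0ICE", "INVOICE"))
```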
2. Table Structure Determination:
A 50-layer convolutional neural network with a ResNet architecture was trained specifically for FMCG invoices. The heuristic algorithm looks for generic patterns in an image to identify possible rows and columns. The outputs from the two modules are combined to generate the structure.
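To give a feel for what a pattern-based heuristic can look like, the sketch below infers rows and columns from word bounding boxes by looking for whitespace gaps. It is not HATS, which is proprietary and not described here, and it does not cover the neural network or the step that combines the two outputs. The box format `(x0, y0, x1, y1)`, the function names and the gap thresholds are all assumptions for the example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x0, y0, x1, y1) in pixels
Band = Tuple[int, int]


def find_bands(intervals: List[Band], min_gap: int) -> List[Band]:
    """Merge 1-D intervals whose separation is smaller than min_gap.

    Each merged band is a candidate row (on y) or column (on x).
    """
    bands: List[List[int]] = []
    for start, end in sorted(intervals):
        if bands and start - bands[-1][1] < min_gap:
            # Overlapping or close enough: extend the current band.
            bands[-1][1] = max(bands[-1][1], end)
        else:
            bands.append([start, end])
    return [(s, e) for s, e in bands]


def infer_grid(boxes: List[Box], min_col_gap: int = 15,
               min_row_gap: int = 5) -> Tuple[List[Band], List[Band]]:
    """Return (rows, columns) as pixel bands, without needing ruled lines."""
    columns = find_bands([(x0, x1) for x0, _, x1, _ in boxes], min_col_gap)
    rows = find_bands([(y0, y1) for _, y0, _, y1 in boxes], min_row_gap)
    return rows, columns


if __name__ == "__main__":
    words = [(10, 10, 60, 20), (100, 10, 140, 20),
             (12, 30, 55, 40), (102, 30, 150, 40)]
    rows, columns = infer_grid(words)
    print(rows, columns)
```

A gap-based heuristic like this fails when a cell spans several columns or when text in adjacent columns nearly touches, which is one plausible reason to pair a heuristic with a trained model.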