We propose an innovative AI solution that processes and organizes documents and extracts relevant information from them in a structured way. The result is a system that is accessible, efficient, fast, and less error-prone, allowing businesses to focus their resources on core activities.
In a rapidly evolving world, pen, paper, and physical documents still play an important role. Many businesses, financial institutions in particular, still rely on the manual processing of physical and/or digitized documents in order to classify them, verify compliance, or extract relevant content. Although this process is tedious, time-consuming, and error-prone, it is often the only option for lack of suitable alternatives. We propose an innovative solution, written in Python and deployed on Google Cloud Platform, that builds on several AI technologies to provide an automatic system able to process and organize documents and extract relevant information from them in a structured way. Compared with manual processing, the resulting system is more accessible, more efficient, faster, and less error-prone, allowing businesses to focus their resources on core activities.
The two main problems tackled by our system are: 1) identifying the document type from an image; 2) for a given document type, extracting a subset of salient fields.
To convey the complexity of the problem, it is worth noting that we address scanned images that may contain more than one document (e.g., an identity card and a driver's license), and the system is not told which kinds of documents the image contains. The system therefore faces several challenges, since it has to carry out tasks, such as document recognition, that are easy for a human but cannot readily be solved with a simple algorithm or a rule-based system. Several AI and machine learning techniques must be combined to solve these problems: Natural Language Processing (NLP) for document recognition, Computer Vision models such as Convolutional Neural Networks (CNNs) for document field detection, and Optical Character Recognition (OCR) for reading text from the document. More specifically, the technological topics addressed by our proposal are:
a) Document recognition and classification
The document recognition module identifies the area of the image where each document is located, crops it, and straightens it. An NLP classifier has then been trained to recognize the front and back side of each document type (identity documents, contractual documents, income documents, etc.).
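The proposal does not specify which model backs the NLP classifier. As a minimal sketch, assuming the classifier operates on the OCR text of each cropped document, a multinomial Naive Bayes model over word tokens illustrates the idea. The class `NaiveBayesDocClassifier`, the labels, and the training snippets are hypothetical toy examples, not the system's actual model or training set.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the OCR output and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesDocClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing (illustrative stand-in)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()                 # documents per label
        self.token_counts: defaultdict = defaultdict(Counter)  # token counts per label
        self.vocab: set = set()

    def fit(self, texts: list[str], labels: list[str]) -> NaiveBayesDocClassifier:
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(count / n_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: one OCR snippet per document side
train_texts = [
    "IDENTITY CARD surname name date of birth",
    "IDENTITY CARD residence address height eye colour",
    "DRIVING LICENCE surname name categories issue date",
    "PAYSLIP employee gross salary net pay month",
]
train_labels = ["id_card_front", "id_card_back", "driver_license_front", "payslip"]
clf = NaiveBayesDocClassifier().fit(train_texts, train_labels)
print(clf.predict("net salary for the month"))  # → payslip
```

Because each cropped region is classified on its own, an image containing several documents yields one label per detected document.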
b) Field recognition and text extraction
The selected cropped boxes pass through an optimizer that reduces noise and applies super-resolution in order to improve OCR performance. The neural network and the OCR process then return a set of (recognized text, bounding box) pairs. Finally, a key-value assignment step maps the text of each box to a specific field name.
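The key-value assignment step can be illustrated with a simple geometric heuristic. This sketch assumes bounding boxes in pixel coordinates (x0, y0, x1, y1) with y growing downward, and a hypothetical per-document-type mapping from field names to the label text printed on the document; for each field it picks the value box nearest to the label that lies to its right or below it. The `Box` type, `assign_fields`, and the example labels are illustrative, not the system's actual API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A recognized text snippet and its bounding box (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def right_or_below(label: Box, cand: Box) -> bool:
    """True if cand starts to the right of the label or below it."""
    return cand.x0 >= label.x1 or cand.y0 >= label.y1


def assign_fields(boxes: list[Box], field_labels: dict[str, str]) -> dict[str, str | None]:
    """Map each field name to the text of the nearest value box after its label box."""
    label_texts = {t.lower() for t in field_labels.values()}
    value_boxes = [b for b in boxes if b.text.lower() not in label_texts]
    result: dict[str, str | None] = {}
    for field, label_text in field_labels.items():
        label = next((b for b in boxes if b.text.lower() == label_text.lower()), None)
        if label is None:
            result[field] = None  # label not found: reported later as missing
            continue
        candidates = [b for b in value_boxes if right_or_below(label, b)]
        if not candidates:
            result[field] = None
            continue
        # Nearest candidate by Euclidean distance between box centers
        best = min(candidates, key=lambda b: math.dist(label.center, b.center))
        result[field] = best.text
    return result


# Hypothetical OCR output for an identity card front side
boxes = [
    Box("Surname", 10, 10, 80, 25), Box("ROSSI", 100, 10, 160, 25),
    Box("Name", 10, 40, 60, 55), Box("MARIO", 100, 40, 160, 55),
    Box("Date of birth", 10, 70, 120, 85), Box("01/02/1980", 140, 70, 230, 85),
]
field_labels = {"surname": "Surname", "name": "Name",
                "birth_date": "Date of birth", "tax_code": "Tax code"}
print(assign_fields(boxes, field_labels))
```

A production version would also stop one value box from being assigned to two fields and tolerate OCR errors in the label text, for example through fuzzy string matching.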
Finally, a response is generated that reports any missing or inconsistent fields, so that the agent can double-check them and resubmit the procedure. If everything is consistent, an OK message is presented and the procedure is archived.
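The final check can be sketched as follows, assuming the extracted fields arrive as a dictionary and that dates are printed as DD/MM/YYYY. The required-field list, the field names in `DATE_FIELDS`, and the single cross-field rule (the expiry date must follow the issue date) are hypothetical examples of the checks such a response could report; archiving is not shown.

```python
from __future__ import annotations

from datetime import date, datetime

# Hypothetical date fields that must hold a well-formed DD/MM/YYYY value
DATE_FIELDS = ("birth_date", "issue_date", "expiry_date")


def parse_date(value: str | None) -> date | None:
    """Parse a DD/MM/YYYY string; return None if it is absent or malformed."""
    if not value:
        return None
    try:
        return datetime.strptime(value, "%d/%m/%Y").date()
    except ValueError:
        return None


def build_response(fields: dict[str, str | None], required: list[str]) -> dict:
    """Report missing and inconsistent fields, or status OK if all checks pass."""
    # A required field is missing if it was not extracted or is empty
    missing = [name for name in required if not fields.get(name)]

    inconsistent = []
    # Present date fields must parse (e.g. "31/02/1980" is rejected)
    for name in DATE_FIELDS:
        if fields.get(name) and parse_date(fields[name]) is None:
            inconsistent.append(name)

    # Cross-field rule: a document cannot expire on or before its issue date
    issued = parse_date(fields.get("issue_date"))
    expires = parse_date(fields.get("expiry_date"))
    if issued is not None and expires is not None and expires <= issued:
        inconsistent.append("expiry_date")

    status = "OK" if not missing and not inconsistent else "CHECK"
    return {"status": status, "missing": missing, "inconsistent": inconsistent}


ok = build_response(
    {"surname": "ROSSI", "name": "MARIO", "birth_date": "01/02/1980",
     "issue_date": "10/05/2020", "expiry_date": "10/05/2030"},
    required=["surname", "name", "birth_date"],
)
print(ok)  # {'status': 'OK', 'missing': [], 'inconsistent': []}
```

Returning a structured dictionary rather than free text lets the front end highlight exactly which fields the agent should double-check before resubmitting.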