
Reducing costs through the practical application of machine learning and AI

Industry

Financial Services

Teams & Services

Data Engineering / Data Sciences

Tech & Tools

Amazon Textract / Amazon Comprehend / Amazon SageMaker / AWS Glue / Amazon Athena / Amazon QuickSight

Key Data Points

Reducing manual document processing
Go Cloud-Native
Automation from the ground up

The Vision

To adopt an Intelligent Document Processing pipeline

The Goal

Leveraging AWS AI/ML services to automate the identification of various cost savings opportunities for customers in the energy space.

The Challenge

Our client provides their partners in the energy industry with a variety of financial solutions to uncover overlooked savings. One of their primary systems that identifies those savings requires data from hundreds of different document types to be consolidated into a single common format before performing analysis. This required an entire team dedicated to reading the financial documents and manually entering the data into this common format.

As they evaluated the future growth of their products and services, they quickly realized that the current manual process for capturing data from the various document types would not scale. They looked to Protagona to design and build an automated solution that could accurately capture the relevant data from hundreds of document types and consolidate it into a centralized data lake.

The Solution

Protagona worked closely with the client to quickly identify an appropriately sized sample of these highly complex documents on which to begin training models. Proofs of concept were then run against various AWS AI/ML services to validate their raw data output and to design an automated data pipeline that integrates each service into its corresponding stage of the data lake. The completed pipeline now lets the client upload documents to Amazon S3, where a series of Amazon Textract, Amazon Comprehend and AWS Glue jobs take the raw data from each document image and transform it into the common format their systems need to identify cost savings.
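To illustrate the "raw data to common format" step, here is a minimal sketch of one piece of such a pipeline. It is not the client's implementation. It assumes the document was analyzed with Textract's AnalyzeDocument API using the FORMS feature (for example via boto3's `analyze_document(..., FeatureTypes=["FORMS"])`), whose response contains `Blocks` linked by `KEY_VALUE_SET`, `VALUE` and `CHILD` relationships. The alias table `FIELD_ALIASES`, the common-format field names and the helper functions are hypothetical, and the Comprehend and Glue stages are not shown.

```python
from typing import Any

# Hypothetical mapping from labels found on different document types to
# field names in an assumed common format (not the client's real schema).
FIELD_ALIASES: dict[str, str] = {
    "account number": "account_id",
    "account #": "account_id",
    "acct no": "account_id",
    "amount due": "total_due",
    "total amount due": "total_due",
    "billing period": "billing_period",
}


def _block_text(block: dict[str, Any], blocks_by_id: dict[str, dict[str, Any]]) -> str:
    """Join the text of a block's WORD children (and mark selected checkboxes)."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT" and child.get("SelectionStatus") == "SELECTED":
                words.append("X")
    return " ".join(words)


def extract_key_values(response: dict[str, Any]) -> dict[str, str]:
    """Turn a Textract AnalyzeDocument (FORMS) response into {key text: value text}."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: dict[str, str] = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        key_text = _block_text(block, blocks_by_id)
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(_block_text(blocks_by_id[value_id], blocks_by_id))
        pairs[key_text] = " ".join(value_parts)
    return pairs


def to_common_format(pairs: dict[str, str]) -> dict[str, str]:
    """Map raw form keys onto the hypothetical common-format field names."""
    record: dict[str, str] = {}
    for key, value in pairs.items():
        normalized = key.strip().rstrip(":").lower()
        field = FIELD_ALIASES.get(normalized)
        if field is not None and field not in record:
            record[field] = value.strip()  # keep the first match per field
    return record


# Synthetic, hand-built response in the shape Textract returns.
sample = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Amount"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Due:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "$1,204.50"},
    ]
}

print(to_common_format(extract_key_values(sample)))  # {'total_due': '$1,204.50'}
```

Keeping the alias table separate from the parsing logic means that supporting a new document type in this sketch only requires adding label mappings, not changing code. Labels with no mapping are simply dropped.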

OUTCOMES
