Leveraging AWS AI/ML services to automate the identification of various cost savings opportunities for customers in the energy space

Challenge

Our client provides their partners in the energy industry with a variety of financial solutions to uncover overlooked savings. One of the primary systems that identifies those savings requires data from hundreds of different document types to be consolidated into a single common format before analysis can be performed. This work required an entire team dedicated to reading the financial documents and manually entering the data into that common format.

As they evaluated the future growth of their products and services, they quickly realized that the manual process for capturing data from the various document types would not scale. They looked to Protagona to design and build an automated solution that accurately captures the relevant data from hundreds of document types and consolidates it into a centralized data lake.

Solution

Protagona worked closely with the client to quickly identify a suitably sized sample of documents in these complex formats on which to begin training models. Proofs of concept were then run against various AWS AI/ML services to validate the raw data output and to design an automated data pipeline that integrates each service into its corresponding stage of the data lake. With the pipeline fully built, the client now uploads documents to S3, where a series of Textract, Comprehend, and Glue jobs take the raw data from an image and transform it into the common format their systems need to identify cost savings.
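To make the Textract stage of such a pipeline more concrete, here is a minimal Python sketch of how form fields might be pulled from a document in S3 and mapped into a common format. It is an illustration under stated assumptions, not the client's actual implementation: the FIELD_MAP labels, the common-format field names, and the sample blocks are all hypothetical, and the Comprehend and Glue stages are left out. The synchronous analyze_document call shown here suits single-page documents; multi-page files would more likely go through Textract's asynchronous document analysis API.

```python
from typing import Any, Dict, List

# Hypothetical mapping from labels printed on one document type to
# common-format field names; the real schema is not described in this case study.
FIELD_MAP = {
    "Account Number": "account_id",
    "Total Due": "amount_due",
}


def fetch_form_blocks(bucket: str, key: str) -> List[Dict[str, Any]]:
    """Run Textract form analysis on a document already uploaded to S3."""
    import boto3  # imported lazily so the parsing helpers below work without AWS access

    textract = boto3.client("textract")
    response = textract.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS"],
    )
    return response["Blocks"]


def _text_of(block: Dict[str, Any], by_id: Dict[str, Dict[str, Any]]) -> str:
    """Join the WORD children of a KEY or VALUE block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks: List[Dict[str, Any]]) -> Dict[str, str]:
    """Pair each KEY block with the text of the VALUE block it points to."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        label = _text_of(block, by_id).rstrip(":")
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    pairs[label] = _text_of(by_id[value_id], by_id)
    return pairs


def to_common_format(pairs: Dict[str, str],
                     field_map: Dict[str, str] = FIELD_MAP) -> Dict[str, str]:
    """Keep only mapped labels and rename them to common-format fields."""
    return {field_map[k]: v for k, v in pairs.items() if k in field_map}


# Made-up blocks in the shape Textract returns for a single "Total Due" field.
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Total"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Due:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "$1,204.50"},
]

if __name__ == "__main__":
    print(to_common_format(key_value_pairs(SAMPLE_BLOCKS)))
```

In a deployed pipeline, a function like fetch_form_blocks would typically be called from a Lambda triggered by the S3 upload, with the mapped record written to the next stage of the data lake. Keeping the label-to-field mapping as data rather than code is one way onboarding a new document format could be reduced to adding a new mapping.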

Need Intelligent Document Processing?

Protagona offers an AWS Marketplace solution to quickly integrate this pipeline into your business. Intelligently process scanned documents and transform the data to fit your business case. Let our customized IDP pipeline do the heavy lifting so you can focus on what matters most.

Tech Stack

AWS Textract
AWS Comprehend
AWS SageMaker
AWS Glue
AWS S3
AWS Lambda
AWS DynamoDB
AWS Athena
AWS QuickSight
Python
Terraform

Outcome

Business Agility

By reducing manual document processing and automating more of the workflow, the client can now extract document data into a data lake and use it to make data-driven business decisions.

Cost Optimization

Leveraging AWS cloud-native services has brought cost efficiency to what would otherwise be an expensive OCR solution. The serverless architecture scales with usage, allowing the client to grow their customer base and business without concerns about licensing or unforeseen costs.

Data Integrity

The deployment and configuration of all components of the new architecture are fully automated. All changes run through a multi-stage CI/CD pipeline that provides consistent deployments to each environment, so new document formats are onboarded consistently and with shorter lead time.