Leveraging AWS AI/ML services to automate the identification of various cost savings opportunities for customers in the energy space


Our client provides their partners in the energy industry with a variety of financial solutions to uncover overlooked savings. One of their primary systems that identifies those savings requires data from hundreds of different document types to be consolidated into a single common format before performing analysis. This required an entire team dedicated to reading the financial documents and manually entering the data into this common format.

As they evaluated the future growth of their products and services, they quickly realized that the manual process for capturing data from the various document types would not scale. They looked to Protagona to design and build an automated solution that accurately captures the relevant data from hundreds of document types and consolidates it into a centralized data lake.



Protagona worked closely with the client to identify a representative sample of documents in these complex formats on which to begin training models. Proofs of concept were then run against several AWS AI/ML services to validate the raw data output and to design an automated data pipeline that integrates each service into its corresponding stage of the data lake. With the pipeline fully built, the client now uploads documents to S3, where a series of Textract, Comprehend, and Glue jobs take the raw data from an image and transform it into the common format their systems need to identify cost savings.
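To make the first stages of that flow concrete, here is a minimal Python sketch of what could happen after an upload to S3: read the object location from the S3 event, run Textract OCR on it, and pass the text to Comprehend. This is an illustration under stated assumptions, not the client's actual pipeline. The function names, the output record layout, and the 0.8 confidence threshold are all hypothetical. The two clients are passed in as parameters and are assumed to behave like the boto3 Textract and Comprehend clients.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become "+").
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def extract_lines(textract, bucket, key):
    """Run synchronous Textract OCR on one S3 object and return its text lines."""
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Textract returns PAGE, LINE and WORD blocks; keep whole lines only.
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def extract_entities(comprehend, text, min_score=0.8):
    """Return (type, text) pairs for entities scored at or above min_score.

    The 0.8 default is an assumed threshold for illustration only.
    """
    response = comprehend.detect_entities(Text=text, LanguageCode="en")
    return [
        (e["Type"], e["Text"])
        for e in response["Entities"]
        if e["Score"] >= min_score
    ]


def process_event(event, textract, comprehend):
    """Build one raw record per uploaded document (hypothetical layout)."""
    records = []
    for bucket, key in s3_objects_from_event(event):
        lines = extract_lines(textract, bucket, key)
        # Skip the Comprehend call when OCR found no text.
        entities = extract_entities(comprehend, "\n".join(lines)) if lines else []
        records.append(
            {"source": f"s3://{bucket}/{key}", "lines": lines, "entities": entities}
        )
    return records
```

In a Lambda handler, the clients would typically be created with boto3.client("textract") and boto3.client("comprehend"), and the returned records written to the raw stage of the data lake for a later Glue job to map into the common format. Multi-page documents would likely need Textract's asynchronous API instead, which this sketch does not cover.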

Need Intelligent Document Processing?

Protagona offers an AWS Marketplace solution to quickly integrate this pipeline into your business. Intelligently process scanned documents and transform the data to fit your business case. Let our customized IDP pipeline do the heavy lifting so you can focus on what matters most.

Tech Stack
  • Amazon Textract
  • Amazon Comprehend
  • Amazon SageMaker
  • AWS Glue
  • Amazon S3
  • AWS Lambda
  • Amazon DynamoDB
  • Amazon Athena
  • Amazon QuickSight
  • Python
  • Terraform


Business Agility 

By reducing manual document processing and introducing more automation to the process, the client is able to extract data into a data lake and make data-driven business decisions.

Cost Optimization

Leveraging AWS cloud-native services has brought cost efficiency to what is otherwise an expensive class of OCR solutions. The serverless architecture scales with usage, allowing the client to grow their customer base and business without concern about licensing or unforeseen costs.

Data Integrity 

The deployment and configuration of all components of the new architecture are fully automated. All changes run through a multi-stage CI/CD pipeline that deploys consistently to each environment, so new document formats are onboarded reliably and with shorter lead time.

