
Application Governance Framework Re-Architecture and Modernization
Protagona helped a major enterprise develop an event-driven, distributed deployment engine that gives teams a fast, user-friendly, and observable platform for managing the lifecycle of AWS accounts and the components within them.
Industry
Travel & Hospitality
Teams & Services
DevOps, FinOps, Security Engineering
Tech & Tools
AWS Lambda, AWS Step Functions, GitLab Pipelines, Python, Docker
Key Data Points
The Vision
The customer wanted to reduce the time and energy spent on undifferentiated heavy lifting: tasks that are operational in nature and not directly related to business goals. In the context of AWS, this meant leveraging automation and building custom solutions that would dramatically reduce the human capital required to maintain resources.
The Goal
Reduced operational overhead, error rates, and deployment time would result in significant reductions in the man-hours and money spent on maintaining critical infrastructure components, freeing up teams to focus on initiatives that drive business value.
The Challenge
A major enterprise with an increasing number of AWS accounts faced challenges in effective management, including account provisioning/decommissioning, compliance enforcement, and automatic provisioning of centralized services like DNS and Transit Gateway connections.
The existing workflows often suffered from poor observability, with deployment logs stored in ephemeral CI/CD systems and no holistic, centralized dashboards or execution tracing. Additionally, the tightly coupled pipelines made deployments cumbersome, often requiring teams to spend over 10 hours on a single deployment across multiple AWS accounts.
The Solution
Protagona helped design and implement an event-driven architecture for managing the lifecycle of the customer's AWS accounts, including provisioning, bootstrapping, and the deployment of CloudFormation stacks that provide various functionality. These stacks are known as "Account Modules" and are developed by different teams within the enterprise, each responsible for a domain such as security, compliance, or networking.
This serverless architecture removed the dependency on legacy tooling that was restrictive, deployed linearly (taking several hours), and offered limited visibility. It enabled teams to work on their own components without being tied to other teams' release cadences.
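The decoupling described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each Account Module deployment is requested through its own event, so one team's module never waits on another's. The event source, detail type, field names, and the AccountModule type are hypothetical; the actual event schema used by the deployment engine is not published here.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical event source name, not taken from the actual platform.
EVENT_SOURCE = "example.account-lifecycle"


@dataclass(frozen=True)
class AccountModule:
    """Illustrative stand-in for a team-owned CloudFormation stack."""
    name: str          # e.g. "central-dns"
    owner_team: str    # team responsible for the module
    template_url: str  # where the module's CloudFormation template lives


def build_deploy_events(account_id: str, modules: list[AccountModule]) -> list[dict]:
    """Return one event entry per module so each module deploys independently."""
    return [
        {
            "Source": EVENT_SOURCE,
            "DetailType": "AccountModuleDeployRequested",  # assumed name
            "Detail": json.dumps(
                {
                    "accountId": account_id,
                    "module": module.name,
                    "ownerTeam": module.owner_team,
                    "templateUrl": module.template_url,
                }
            ),
        }
        for module in modules
    ]
```

The entries use the Source, DetailType, and Detail keys expected by the Amazon EventBridge PutEvents API, so a list like this could be passed as the Entries argument of a put_events call. Whether the real engine publishes events this way is an assumption.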
Observability was improved by enabling and integrating AWS X-Ray for all Lambda functions and API Gateway, consolidating all execution logs in Amazon CloudWatch Logs with predefined CloudWatch Logs Insights queries, and aggregating metrics into Amazon CloudWatch dashboards. The operational teams were given documentation and guidance on how to troubleshoot various deployment scenarios, including where to look for data and how to tie it back to a specific execution. After the new system went live, real-time insights into current and historical executions were made available to all teams using the product. Additionally, CloudWatch Alarms and custom EventBridge events are sent to Slack to notify team members and users of actionable events in real time.
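As a rough illustration of the Slack notifications, the sketch below turns a CloudWatch Alarm State Change event, as delivered through EventBridge, into a payload for a Slack incoming webhook. It assumes a small Lambda-style forwarder and a webhook integration; the function names and message format are hypothetical, and the actual routing used in this project (for example via SNS or another integration) is not described in this case study.

```python
import json
import urllib.request


def format_alarm_for_slack(event: dict) -> dict:
    """Build a Slack webhook payload from a CloudWatch Alarm State Change event."""
    detail = event.get("detail", {})
    state = detail.get("state", {})
    name = detail.get("alarmName", "unknown-alarm")
    value = state.get("value", "UNKNOWN")
    reason = state.get("reason", "no reason provided")
    account = event.get("account", "unknown-account")
    # Message layout is illustrative only.
    return {"text": f"[{value}] {name} in account {account}: {reason}"}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Keeping the formatting separate from the HTTP call means the message content can be checked without network access, which is useful when the same forwarder also handles custom EventBridge events.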
The architecture was successfully implemented in coordination with various teams and serves as the main entry point for multiple teams to develop, deploy, and monitor the components they are responsible for. For example, distinct teams manage the automation for centralized DNS and for automated AWS Transit Gateway associations. These teams can independently manage the lifecycle of their products by leveraging the deployment engine. Because multiple teams were migrating to a new platform after having used the old one for some time, we conducted multiple training sessions, set up brown-bag sessions, knowledge-transfer (KT) sessions, and other events to drive adoption and familiarity.
