Mandatory disclaimer: what follows is relevant as of the writing and original publishing of this article (circa May 2023). If you know AWS like I do, you know that services and features are extremely fluid; just keep an eye on their release RSS feed and you’ll understand. With that said, I will make every effort to keep this article up to date.
Introduction
Amazon VPC
Before we dive into the complexities of a multi-environment network, let’s review some of the basics. AWS offers a wide range of networking services to help customers build and manage their infrastructure in the cloud. Some of the key services related to networking include EC2-Classic, VPC (Virtual Private Cloud), Transit Gateway, Transit VPC, and PrivateLink.
EC2-Classic was the original networking model for AWS EC2 instances, where all instances were launched into a single, flat network. It has been replaced by VPC and is no longer recommended for new deployments; AWS announced its official retirement back in mid-2021.
VPC is a service that allows customers to create their own virtual private cloud in the AWS cloud. It provides a secure and isolated environment in which customers can launch their own resources, such as EC2 instances, databases, and load balancers. Customers can configure their VPC with their own IP address range, subnets, and routing tables to control network traffic between resources.
Overall, AWS provides a robust set of networking services to help customers build and manage their infrastructure in the cloud, ranging from basic VPCs to more advanced architectures such as Transit Gateway and Transit VPC. PrivateLink also provides a secure way to access AWS services without exposing them to the internet.
Good planning and, in many cases, experienced advice can go a long way toward preventing some of the most common headaches down the road. Suffice it to say, networking in AWS is a vast and complex area. Let it not be said I didn’t warn you.
A Quick Primer (or Recap) Of Core AWS Networking Capabilities
Let’s do a super quick recap of some of the networking capabilities offered by AWS. This grossly skips over all of the management features on offer, so don’t hate me; that belongs in a different article.
EC2-Classic
Use Case — Today? None, unless you are still maintaining one of the few remaining vestiges of infrastructure using it. AWS announced the retirement of EC2-Classic back in mid-2021. If you don’t need to be bound by a VPC, explore the possibilities of going AWS-native, i.e. AWS Lambda, DynamoDB, and S3/SQS/SNS, among other service options available to build robust solutions outside the confines of a virtual private network.
Amazon VPC
Use Case — Allows you to create a private, isolated virtual network within AWS. You should use AWS VPC when you need a high level of control over your network environment, including IP addressing, subnets, routing, and network gateways. VPC is particularly useful when you are deploying a complex application that requires multiple resources, such as EC2 instances, RDS databases, and load balancers, and you want to ensure they are all securely connected to each other. Additionally, VPC provides built-in security features such as security groups and network ACLs to help protect your resources from unauthorized access. Ultimately, AWS VPC provides a scalable and flexible way to build a secure and customizable networking environment in the cloud. Read more.
AWS Site-to-Site VPN
Use Case — Used to securely connect your on-premises network to your AWS VPC over an encrypted VPN tunnel. It’s best for hybrid cloud architectures and ensuring compliance with regulatory requirements, providing high availability and automatic failover capabilities. Read more.
VPC Peering
Use Case — Allows you to connect two VPCs together, enabling resources in each VPC to communicate with each other as if they were on the same network. You should use VPC Peering when you need to share resources, such as data or applications, between two VPCs without exposing them to the public internet. VPC Peering can be particularly useful when you have multiple VPCs within the same account or across different accounts that need to communicate with each other. Additionally, VPC Peering can reduce data transfer costs by keeping traffic within the AWS network. Ultimately, AWS VPC Peering provides a simple and cost-effective solution for sharing resources between VPCs. Read more.
AWS PrivateLink
Use Case — Enables you to access services over a private endpoint within your VPC, without exposing the data to the public internet. You should use AWS PrivateLink when you need to securely access third-party services, such as SaaS applications or AWS services owned by other accounts, from within your VPC. PrivateLink can be particularly useful when you need to comply with regulatory requirements for data privacy and security, or when you need to reduce the attack surface of your network. Additionally, PrivateLink can simplify network architecture by eliminating the need for a NAT gateway or a VPN connection. Ultimately, AWS PrivateLink provides a highly secure and scalable solution for accessing third-party services from within your VPC. Read more.
Transit VPC (solution)
Use Case — Building on the Software VPN designs, you can create a global transit network on AWS, connecting multiple, geographically dispersed VPCs and remote networks to create a global network transit center. This design enables the transit VPC to implement complex routing rules, like NAT between overlapping network ranges. Read more.
Transit Gateway
Use Case — You should use AWS Transit Gateway when you need to simplify network connectivity and management across a large number of VPCs and remote networks. Transit Gateway can be particularly useful when you have a complex network topology that requires multiple VPCs and remote networks to communicate with each other, or when you need to support large-scale cloud migrations or hybrid cloud architectures. Additionally, Transit Gateway provides built-in security features, such as network isolation and traffic filtering, to help protect your resources from unauthorized access. Ultimately, AWS Transit Gateway provides a highly scalable and cost-effective solution for connecting multiple networks together in AWS. Read more.
Transit Networks In The Wild
Think of Transit Networks as a network of train stations, with each station representing a VPC or remote network. The trains running between the stations represent the network traffic moving between the VPCs and remote networks. Just as train stations can be connected by rail lines to create a network of stations, VPCs and remote networks can be connected by Transit Networks to create a network of networks.
Transit Networks In Real Life
The Transit Network acts as a hub that connects all the VPCs and remote networks together, providing a central point of control for managing network traffic between them. Just like trains can have multiple stops along a single route, Transit Networks can route traffic between multiple VPCs and remote networks along a single path.
Overall, transit networks play a critical role in enabling the internet and other digital communication technologies to function at a global scale, by providing a reliable and scalable way to connect networks together.
This article will explore the use case of a transit network to control and isolate networks belonging to different environments, while still leveraging the ability to tap into a centralized (or shared services) area, common to all environments.
Enter AWS Transit Gateway
Transit Gateway
A transit gateway acts as a Regional virtual router for traffic flowing between your virtual private clouds (VPCs) and on-premises networks. A transit gateway scales elastically based on the volume of network traffic. Routing through a transit gateway operates at layer 3, where the packets are sent to a specific next-hop attachment, based on their destination IP addresses.
The anatomy of AWS Transit Gateway consists of a centralized hub that connects multiple VPCs and remote networks together using a single gateway.
- Transit Gateway: The Transit Gateway acts as a router, forwarding traffic between the connected networks according to its route tables. It does not perform NAT or packet filtering itself; those remain the job of NAT gateways, security groups, and network ACLs.
- Attachment(s): The “hub” is connected to the spoke networks via attachments, which can be VPC attachments, VPN attachments, or Direct Connect attachments. VPC attachments allow VPCs to connect to the Transit Gateway, while VPN attachments enable remote networks to connect to the Transit Gateway over a VPN connection. Direct Connect attachments (through a Direct Connect gateway) enable on-premises networks to connect to the Transit Gateway over a dedicated network connection.
- Transit Gateway Route Table(s): Transit Gateway route tables control the flow of traffic between the connected networks. Each attachment is associated with exactly one route table, although several attachments can share the same one. A route table holds propagated routes learned from attachments and static routes you add yourself, each pairing a destination CIDR block with a next-hop attachment.
- Associations: This is simply the “glue” between an attachment and a TGW route table. Each attachment can be associated with exactly one TGW route table.
- Route Propagations: A propagation automatically installs the routes of an attachment into a TGW route table, simplifying the management of routing between large numbers of VPCs and remote networks. An attachment can propagate to several route tables, which, combined with associations, enables fine-grained control over network traffic.
If you’d like to go into more detail, read more here.
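To make the layer 3 behavior a bit more concrete, here is a minimal sketch of how a route table lookup could pick a next-hop attachment by choosing the most specific matching CIDR. The route table contents and attachment names are made up for illustration, and the sketch ignores real-world details such as the precedence between static and propagated routes.

```python
import ipaddress

# Hypothetical TGW route table: destination CIDR -> next-hop attachment.
ROUTES = {
    "10.0.0.0/16": "attachment-1",
    "10.0.5.0/24": "attachment-4",
    "10.2.0.0/16": "attachment-3",
}


def next_hop(route_table, destination_ip):
    """Return the attachment of the most specific matching route, or None."""
    ip = ipaddress.ip_address(destination_ip)
    best = None  # (network, attachment) of the longest prefix seen so far
    for cidr, attachment in route_table.items():
        network = ipaddress.ip_network(cidr)
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, attachment)
    return best[1] if best else None


print(next_hop(ROUTES, "10.0.5.7"))     # matches both 10.0.0.0/16 and 10.0.5.0/24
print(next_hop(ROUTES, "192.168.1.1"))  # no matching route in this table
```

A destination with no matching route returns None here, which mirrors the idea that traffic without a route in the associated table simply has nowhere to go.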
Why Not Just VPC Peering or Site to Site VPNs?
Transit Gateway is often preferred over VPC peering or S2S VPNs because it provides several advantages for managing network connectivity between multiple VPCs and remote networks. Here are some key reasons why TGW may be preferred:
- Centralized Management: Transit Gateway allows for a centralized hub-and-spoke architecture, making it easier to manage network connectivity between multiple VPCs and remote networks. With VPC peering, each VPC is connected directly to another VPC, making it difficult to manage network traffic at scale. The same is true for VPN tunneling.
- Scalability: Transit Gateway supports up to 5,000 attachments per gateway. VPC peering is not transitive, so fully connecting n VPCs takes n(n-1)/2 peering connections, and each VPC has a quota on the number of active peering connections.
- Routing Control: Transit Gateway provides advanced routing capabilities, including route tables, route propagation, and routing policies, giving administrators fine-grained control over network traffic. With VPC peering, there is no central route table; routes to each peer must be maintained by hand in every VPC’s own route tables.
- Cross-Account Connectivity: Transit Gateway can be used to connect VPCs and remote networks across different AWS accounts, making it easier to manage network connectivity in multi-account environments.
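The scalability argument in the list above is easy to see with a bit of arithmetic. Since VPC peering is not transitive, a full mesh needs one peering connection per pair of VPCs, while a hub-and-spoke layout needs one attachment per VPC. The function names below are my own, purely for illustration.

```python
def peering_connections(vpc_count):
    """Peering connections needed to fully mesh vpc_count VPCs."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count):
    """Attachments needed when every VPC is a spoke on a single transit gateway."""
    return vpc_count


for n in (4, 10, 50):
    print(f"{n} VPCs: {peering_connections(n)} peerings vs {tgw_attachments(n)} attachments")
```

With only a handful of VPCs the difference is small, but the peering count grows quadratically while the attachment count grows linearly.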
Isolating Network Segments
Scenario
Imagine that you have three different AWS accounts for development (non-production), production, and shared services environments, and you need to isolate VPCs for each environment from each other. In this scenario, you can use AWS Transit Gateway to create a hub-and-spoke architecture that connects all the VPCs in each environment to a central Transit Gateway in a separate account. By doing this, you can enforce network isolation between production and non-production while still allowing both to communicate with shared services over a secure and scalable network.
Transit Gateway Route Tables can be implemented to enforce isolation and granular route propagation to each of the attachments based on desired environment isolation.
Building Things Out
The following diagram shows a simplified case, where we have four (4) VPCs representing three (3) different environments: production, non-production, and shared.
Notice that the VPCs laid out below are super simple, spanning a single AZ, in a two-tiered public/private layout with Internet & NAT Gateways (I’ll touch on some TGW “gotchas” about AZs later).
Our VPC Layout
The connectivity rules we are establishing are super simple:
- Production resources need to be able to address resources on other production VPCs.
- Production resources need to address resources on shared VPCs, and vice versa.
- Non-production resources must not be able to address resources in production VPCs.
- Non-production resources must be able to address resources in shared VPCs, and vice versa.
- Outbound internet / public traffic is routed through each local IGW.
If we were to represent this with a matrix, it would look like this:
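For readers who prefer text to pictures, the same matrix can be derived from the rules with a short sketch. The VPC names are placeholders of my own, and I am assuming production cannot address non-production either, since traffic needs a route in both directions to be useful.

```python
# Hypothetical VPC names mapped to the environment each one belongs to.
VPCS = {"prod-a": "prod", "nonprod": "nonprod", "shared": "shared", "prod-b": "prod"}

# Destination environments that each source environment may address.
ALLOWED = {
    "prod": {"prod", "shared"},
    "nonprod": {"nonprod", "shared"},
    "shared": {"prod", "nonprod", "shared"},
}


def can_reach(src, dst):
    """True if resources in VPC src may address resources in VPC dst."""
    return VPCS[dst] in ALLOWED[VPCS[src]]


def render_matrix():
    """Build a plain-text from/to connectivity matrix."""
    names = list(VPCS)
    lines = ["from/to".ljust(10) + "".join(name.ljust(10) for name in names)]
    for src in names:
        cells = "".join(("yes" if can_reach(src, dst) else "no").ljust(10) for dst in names)
        lines.append(src.ljust(10) + cells)
    return "\n".join(lines)


print(render_matrix())
```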
From here, we can lay out our Transit Gateway and three (3) route tables, one per environment segment, along with VPC attachments for each of our four (4) VPCs.
Transit Gateway Route Tables
Each of our attachments will be associated with exactly one TGW route table, as we indicated earlier: attachments 1 and 4 are production, attachment 2 is non-production, and attachment 3 points to our shared VPC.
Next, we’ll add propagations to each of our TGW route tables following the rules we set out above.
Our production TGW route table will propagate routes for attachments 1 (prod), 3 (shared), and 4 (prod). Our non-production TGW RT will contain propagations for attachments 2 (non-prod) and 3 (shared). And finally, our shared TGW RT will contain propagations to all of our attachments. Easy.
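If it helps to see the associations and propagations side by side, here is a small model of the setup just described. The route table names are invented; the attachment numbers follow the ones used above. Treat it as a way to reason about the design, not as a representation of how AWS implements it.

```python
# Association: each attachment looks up routes in exactly one TGW route table.
ASSOCIATIONS = {1: "prod-rt", 2: "nonprod-rt", 3: "shared-rt", 4: "prod-rt"}

# Propagation: each TGW route table learns routes from these attachments.
PROPAGATIONS = {
    "prod-rt": {1, 3, 4},
    "nonprod-rt": {2, 3},
    "shared-rt": {1, 2, 3, 4},
}


def has_route(src, dst):
    """True if the route table associated with src holds a route to dst."""
    return dst in PROPAGATIONS[ASSOCIATIONS[src]]


def reachable(src, dst):
    """Traffic only flows if there is a route out and a route back."""
    return has_route(src, dst) and has_route(dst, src)


for src in ASSOCIATIONS:
    peers = [dst for dst in ASSOCIATIONS if dst != src and reachable(src, dst)]
    print(f"attachment {src} can talk to {peers}")
```

Checking both directions matters: a route out of one attachment is of little use if the reply has no route back.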
But wait, there’s one last thing that needs to happen here, and you might spot it if you look closely at the diagram below. Zoom in on the VPC route tables. Ask yourself, how will resources hosted on either private or public subnets know about networks distributed through our transit network?
Your first instinct might be to assume that, since we’ve already established TGW route tables and set up propagation for individual attachments, those same routes would be “advertised” through the attachments themselves. Well, not quite.
In good AWS fashion, you are given full control over what each subnet’s route table contains in terms of routes. Transit Gateway makes it possible to route between these isolated VPCs, but we still need to set those routes and “next” hops manually in each VPC route table. There is no dynamic advertisement via Border Gateway Protocol (BGP) for VPC attachments.
So, the above implementation would give private subnets only the ability to reach other transit network spokes of the same environment. Resources hosted on the public subnets would have no idea how to cross over. But as I mentioned, this is now entirely up to you and your specific needs.
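As a rough sketch of what those manual entries amount to, the helper below builds the static routes a subnet route table would need in order to send traffic for other VPCs to the transit gateway. The CIDRs and the transit gateway ID are placeholders I made up; the dictionary keys mirror the parameter names of the EC2 CreateRoute API, but nothing here actually calls AWS.

```python
import ipaddress

# Hypothetical CIDRs; the article does not specify any.
VPC_CIDRS = {
    "vpc-1": "10.1.0.0/16",
    "vpc-2": "10.2.0.0/16",
    "vpc-3": "10.3.0.0/16",
    "vpc-4": "10.4.0.0/16",
}
TGW_ID = "tgw-EXAMPLE"  # placeholder, not a real transit gateway ID


def tgw_routes_for(local_vpc, reachable_vpcs):
    """Static routes to add to a subnet route table inside local_vpc."""
    routes = []
    for vpc in reachable_vpcs:
        if vpc == local_vpc:
            continue  # the VPC's own CIDR is already covered by its local route
        cidr = str(ipaddress.ip_network(VPC_CIDRS[vpc]))  # also validates the CIDR
        routes.append({"DestinationCidrBlock": cidr, "TransitGatewayId": TGW_ID})
    return routes


# Production VPC 1 should reach the shared VPC (3) and the other production VPC (4).
for route in tgw_routes_for("vpc-1", ["vpc-3", "vpc-4"]):
    print(route)
```

You would repeat this for every subnet route table (private, public, or both) that should be able to use the transit network.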
Final Musings And Design Considerations
In no particular order, here are some of the design considerations I would like to highlight before you set out to roll out TGW:
Network Topology
Transit Gateway uses a hub-and-spoke topology, where the Transit Gateway acts as the central hub, and the VPCs and other networks are attached as spokes. It’s important to consider your network topology and plan how you want to connect your VPCs and other networks to the Transit Gateway. You can create a network of networks, but that introduces other complexities that go well beyond this article.
VPC CIDR Overlap
Transit Gateway will let you attach VPCs with overlapping CIDRs, but it cannot route between VPCs that share the same address range, so overlap does not make IP address space any easier to manage. Avoid overlapping IP CIDRs between VPCs and other attached networks (Direct Connect / VPN) whenever you can. If you find yourself in dire straits, this is a good read.
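A quick way to catch overlap before attaching anything is to compare every pair of CIDRs up front. Here is a minimal sketch using Python’s standard ipaddress module; the network names and ranges are invented for the example.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs_by_name):
    """Return sorted pairs of network names whose CIDR ranges overlap."""
    overlaps = []
    for (name_a, cidr_a), (name_b, cidr_b) in combinations(sorted(cidrs_by_name.items()), 2):
        if ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b)):
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical address plan with one clash between on-prem and a production VPC.
plan = {
    "prod-1": "10.0.0.0/16",
    "shared": "10.1.0.0/16",
    "on-prem": "10.0.128.0/17",
}
print(find_overlaps(plan))
```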
Routing
Transit Gateway provides advanced routing capabilities, including route tables, route propagation, and routing policies. It’s important to plan your routing strategy and configure your Transit Gateway route tables accordingly.
Take advantage of route propagation for AWS Direct Connect gateway attachments and BGP Site-to-Site VPN attachments. And remember, routes are not propagated from the transit gateway into your VPC route tables; those entries must be added statically.
Only resources that reside in Availability Zones where there is a transit gateway attachment can reach the transit gateway. Enable multiple Availability Zones to ensure availability.
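The Availability Zone gotcha is simple to check for on paper. The sketch below compares the zones where workloads run against the zones where the attachment has a subnet; the zone names are only examples.

```python
def uncovered_zones(workload_zones, attachment_zones):
    """Zones that host workloads but have no transit gateway attachment subnet."""
    return sorted(set(workload_zones) - set(attachment_zones))


# Hypothetical layout: workloads in three zones, attachment enabled in only one.
missing = uncovered_zones(
    ["us-east-1a", "us-east-1b", "us-east-1c"],
    ["us-east-1a"],
)
print(missing)
```

Any zone that shows up in the result hosts resources that would not be able to reach the transit gateway.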
Security
Transit Gateway respects VPC and resource-level security groups and network ACLs, which can help you control access to your network resources.
As of this writing, and unlike VPC Peering, Transit Gateway does not support Security Group referencing. Keep that in mind if you are migrating from VPC Peering. There is a very good article dealing with this scenario here.
Create one network ACL and associate it with all of the subnets that are associated with the transit gateway. Keep the network ACL open in both the inbound and outbound directions.
Enable Transit Gateway Flow Logs.
Integration with Other AWS Services
Transit Gateway integrates with a variety of other AWS services, including Direct Connect, VPN, AWS PrivateLink, and AWS Firewall Manager. It’s important to consider how you want to integrate these services with your Transit Gateway and plan your configuration accordingly.
Scalability
Transit Gateway supports up to 5,000 VPC attachments, which makes it a scalable solution for managing network connectivity between multiple VPCs and remote networks. It’s important to plan for scalability and consider how you will manage your Transit Gateway as your network grows.
Cost
Transit Gateway is billed per attachment per hour, at a rate comparable to an S2S VPN connection, plus a charge for each gigabyte of data it processes. Depending on your traffic patterns, that can make it a cost-effective solution for managing network connectivity between multiple VPCs and remote networks.