
Virtual Private Clouds (VPC)

Designing Multi-Tier Architectures with Public and Private Subnets

Learn how to partition your network into tiers to isolate databases and application logic from public-facing web servers while maintaining high availability.

Cloud & Infrastructure · Intermediate · 12 min read

Constructing the Foundations of Cloud Networking

Modern cloud infrastructure relies on the ability to define a network environment that mirrors a traditional data center but with the flexibility of software-defined components. A Virtual Private Cloud provides the fundamental isolation layer that ensures your compute resources are not exposed to the global internet by default. This isolation is the first step in establishing a robust security posture for any production application.

Engineering teams often struggle with the transition from local development to cloud networking because the abstractions hide complex routing and encapsulation mechanisms. Understanding the relationship between your IP address space and the underlying physical hardware is essential for debugging connectivity issues. A well-designed network prevents unauthorized access while allowing legitimate traffic to reach its destination efficiently.

The primary goal of a network architect is to create a predictable environment where every packet has a clear path and every resource is protected by multiple layers of defense. By logically partitioning your resources, you can minimize the blast radius of potential security incidents. This approach ensures that a compromise in one area of the system does not lead to a total failure across the entire infrastructure.

Choosing the right size for your network is a decision that has long-term implications for scalability and integration. If you select a CIDR block that is too small, you may run out of IP addresses as your services grow or as you add more containers and serverless functions. Conversely, overlapping IP ranges can make it nearly impossible to connect different environments or integrate with third-party providers via peering.

The Principle of Least Privilege in Network Design

Network isolation should be governed by the principle of least privilege, meaning that resources should only have the minimum level of access required to perform their function. In a typical cloud environment, this translates to placing resources in specific subnets based on their exposure to external traffic. Public subnets are reserved for load balancers and bastion hosts, while private subnets house sensitive application logic.

By defaulting to private subnets for all core services, you force an explicit decision for every external connection. This proactive approach significantly reduces the attack surface of your application by ensuring that databases and internal APIs are never directly reachable from the public internet. It also simplifies the auditing process, as network administrators can easily identify which paths are intended for external communication.

Defining Scalable CIDR Blocks

Classless Inter-Domain Routing (CIDR) notation defines the range of IP addresses available within your network. For a standard production environment, a /16 block provides 65,536 addresses, which is generally sufficient for large-scale operations. It is important to remember that some cloud providers reserve specific addresses in every subnet for internal services like DNS and gateway management; AWS, for example, reserves five addresses per subnet.

When planning your subnets, you should allocate larger blocks to your application and database tiers, as these will likely scale more aggressively than your public edge tier. Organizing your address space into contiguous blocks makes it easier to write routing rules and firewall policies later. Consistent naming conventions and documentation of these ranges will save hours of troubleshooting when your infrastructure spans multiple regions or accounts.
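One way to keep your address space contiguous and documented in one place is to derive every subnet range from the parent block with Terraform's cidrsubnet function. The sketch below assumes a 10.0.0.0/16 VPC; the tier names, index values, and block sizes are illustrative choices, not prescriptions.

```hcl
# Sketch: carving a /16 into contiguous per-tier ranges.
# Indices and prefix sizes are illustrative assumptions.
locals {
  vpc_cidr = "10.0.0.0/16"

  # Adding 8 bits to the /16 prefix yields /24 subnets.
  public_cidrs = [cidrsubnet(local.vpc_cidr, 8, 0), cidrsubnet(local.vpc_cidr, 8, 1)]   # 10.0.0.0/24, 10.0.1.0/24
  app_cidrs    = [cidrsubnet(local.vpc_cidr, 8, 10), cidrsubnet(local.vpc_cidr, 8, 11)] # 10.0.10.0/24, 10.0.11.0/24

  # Larger /22 blocks for the data tier, carved from the top of the range.
  data_cidrs = [cidrsubnet(local.vpc_cidr, 6, 60), cidrsubnet(local.vpc_cidr, 6, 61)] # 10.0.240.0/22, 10.0.244.0/22
}
```

Because every range is computed from one variable, changing the parent block or the tier sizing is a single edit rather than a hunt through hard-coded strings.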

Designing the Three-Tier Subnet Architecture

The three-tier architecture is a time-tested pattern that separates presentation, logic, and data storage into distinct network zones. This separation allows you to apply different security controls and scaling policies to each layer independently. For example, your web servers might scale based on incoming requests, while your database tier focuses on high availability and disk performance.

Implementing this pattern in a virtual network requires careful configuration of route tables and gateways. Each tier resides in its own set of subnets, and traffic flow between them is strictly regulated by firewall rules. This structure not only improves security but also enhances the organization of your cloud resources, making it easier for new engineers to understand the system flow.

Defining a Tiered Subnet Structure with Terraform

```hcl
# Define the primary network container
resource "aws_vpc" "production_network" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags = {
    Name = "production-vpc"
  }
}

# Create a public subnet for load balancers
resource "aws_subnet" "public_tier_az1" {
  vpc_id                  = aws_vpc.production_network.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-1a"
  }
}

# Create a private subnet for internal application servers
resource "aws_subnet" "application_tier_az1" {
  vpc_id            = aws_vpc.production_network.id
  cidr_block        = "10.0.10.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "app-private-subnet-1a"
  }
}
```

The public tier serves as the entry point for all client requests and typically hosts resources like Application Load Balancers or NAT Gateways. Because these resources are exposed to the internet, they are the most vulnerable and require the most stringent monitoring. By keeping this tier small and specialized, you can more easily manage the logs and traffic patterns passing through your edge.

In contrast, the application and database tiers remain entirely private, with no direct route to or from the internet gateway. Communication with these tiers happens through the load balancer or internal service discovery mechanisms. This creates a bottleneck that you can control, ensuring that only validated and filtered traffic ever reaches your business logic or sensitive data stores.

The Public Tier and Internet Gateways

An internet gateway is a horizontally scaled, redundant, and highly available component that allows communication between your network and the internet. For a subnet to be considered public, it must have a route in its route table that points all non-local traffic to this gateway. Without this specific configuration, even resources with public IP addresses remain unreachable from the outside world.
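In Terraform terms, that route is a single entry sending 0.0.0.0/0 to the gateway. This sketch builds on the VPC and public subnet defined earlier; the gateway and route table names are illustrative.

```hcl
# Sketch: what makes a subnet "public" is this route, not the subnet itself.
resource "aws_internet_gateway" "edge" {
  vpc_id = aws_vpc.production_network.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.production_network.id

  route {
    cidr_block = "0.0.0.0/0" # all non-local traffic
    gateway_id = aws_internet_gateway.edge.id
  }
}

resource "aws_route_table_association" "public_az1" {
  subnet_id      = aws_subnet.public_tier_az1.id
  route_table_id = aws_route_table.public.id
}
```

Remove the association and the subnet becomes private again, regardless of how its instances are addressed.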

Public subnets are often used for bastion hosts, also known as jump boxes, which provide a secure way for administrators to access private instances via SSH or RDP. By restricting access to these bastion hosts to specific source IP ranges, you can maintain a high level of security while still allowing for manual maintenance. This pattern is increasingly being replaced by managed session services that do not require public IP addresses at all.
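Source restriction for a bastion host is typically expressed as a security group rule. In this sketch the group name and the office CIDR (a documentation range) are placeholders you would substitute with your own values.

```hcl
# Sketch: restricting bastion SSH access to a known source range.
resource "aws_security_group" "bastion" {
  name   = "bastion-ssh"
  vpc_id = aws_vpc.production_network.id

  ingress {
    description = "SSH from the office VPN range only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.0/24"] # placeholder documentation range
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```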

Isolating the Data Tier

The data tier is the most sensitive layer of your architecture and should be isolated in subnets that have no path to the internet, even via a NAT gateway. These subnets are often referred to as fully isolated or restricted subnets because they only communicate with the application tier. This level of isolation is crucial for protecting against data exfiltration in the event of an application-level vulnerability.

To perform updates or backups in a fully isolated tier, you can use specialized endpoints that provide private access to cloud services without traversing the public internet. These endpoints keep your traffic within the provider network, improving both security and performance. When designing this tier, ensure that your IP allocation is generous enough to handle database clusters and read replicas across multiple availability zones.
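On AWS, a gateway-type VPC endpoint is one such mechanism: it adds a route to S3 that never leaves the provider network. In this sketch, the data-tier route table is a hypothetical resource standing in for whichever route table serves your isolated subnets.

```hcl
# Sketch: private S3 access for isolated subnets via a gateway endpoint.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.production_network.id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.data_tier.id] # hypothetical data-tier route table
}
```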

Managing Traffic Flow and Security Controls

Managing traffic flow within a virtual network involves two primary mechanisms: routing and filtering. Routing determines the destination for every packet based on its IP address, while filtering decides whether that packet is allowed to pass through. Together, these controls form a comprehensive traffic management strategy that protects your services from unauthorized access and internal misconfigurations.

A common challenge for engineers is providing private resources with the ability to download software updates or connect to external APIs without exposing them to the internet. This is typically solved using a Network Address Translation gateway, which sits in a public subnet and relays traffic on behalf of private resources. The gateway allows outbound requests while blocking any unsolicited inbound connections from reaching the internal instances.
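A minimal Terraform sketch of that arrangement follows, reusing the subnets from the earlier example; the Elastic IP and route table names are illustrative.

```hcl
# Sketch: a NAT gateway in the public subnet relays outbound traffic
# for the private application tier.
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "egress_az1" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public_tier_az1.id # must live in a public subnet
}

resource "aws_route_table" "app_private" {
  vpc_id = aws_vpc.production_network.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.egress_az1.id # outbound only; no inbound path
  }
}

resource "aws_route_table_association" "app_az1" {
  subnet_id      = aws_subnet.application_tier_az1.id
  route_table_id = aws_route_table.app_private.id
}
```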

  • Security Groups act as stateful firewalls at the instance level, tracking connection states and automatically allowing return traffic.
  • Network Access Control Lists provide a stateless layer of security at the subnet level, acting as a secondary defense for all resources within that boundary.
  • Route Tables define the hop-by-hop path for traffic, ensuring packets are directed to the correct gateways or peering connections.
  • Flow Logs capture detailed information about the IP traffic going to and from network interfaces, which is essential for security audits.

Security groups are the primary tool for developers to manage access between different tiers of the application. Because they are stateful, you only need to define rules for the initial direction of the traffic, and the response is handled automatically. This simplifies the management of complex communication patterns between microservices that require frequent bidirectional exchanges.
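Tier-to-tier access is usually expressed by referencing another security group rather than an IP range, so the rule keeps working as instances come and go. In this sketch, the load balancer's group is a hypothetical resource, and egress rules are omitted for brevity.

```hcl
# Sketch: a stateful rule admitting only traffic from the load balancer's
# security group; the response path needs no extra rule.
resource "aws_security_group" "app_tier" {
  name   = "app-tier"
  vpc_id = aws_vpc.production_network.id

  ingress {
    description     = "App traffic from the load balancer only"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # hypothetical ALB security group
  }
}
```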

Stateful vs Stateless Filtering

Understanding the difference between stateful and stateless filtering is critical for avoiding common connectivity pitfalls. Security groups are stateful: if you allow an inbound request on port 80, the outbound response is automatically permitted regardless of outbound rules. This behavior is ideal for web servers and application logic where the return path is predictable.

Network Access Control Lists, on the other hand, are stateless and require explicit rules for both inbound and outbound traffic. If you allow inbound traffic on port 80, you must also allow the high-numbered ephemeral ports on the outbound side for the return traffic to pass through. Because of this complexity, NACLs are best used as a broad safety net to block specific IP ranges or provide a final layer of defense.
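The ephemeral-port pairing looks like this in Terraform; the rule numbers and the 1024-65535 range are common conventions, not requirements.

```hcl
# Sketch: a stateless NACL must pair the inbound service port with an
# outbound ephemeral-port rule, or return traffic is silently dropped.
resource "aws_network_acl" "public_tier" {
  vpc_id     = aws_vpc.production_network.id
  subnet_ids = [aws_subnet.public_tier_az1.id]

  ingress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 80
    to_port    = 80
  }

  egress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024 # ephemeral range for the return leg
    to_port    = 65535
  }
}
```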

Leveraging NAT Gateways for Egress

NAT gateways provide a managed solution for egress-only internet access, handling the translation of private IP addresses to a single public address. This is a crucial component for any production environment where instances in private subnets need to reach external repositories or third-party services. Because NAT gateways are managed by the provider, they scale automatically to meet demand and offer high reliability.

However, NAT gateways can become a significant cost driver and a potential bottleneck if not monitored closely. Large data transfers through a NAT gateway incur processing charges that can exceed the cost of the compute resources themselves. For high-volume data transfers to internal cloud services, using VPC Endpoints is a more cost-effective and performant alternative that bypasses the NAT gateway entirely.

Resilience through Geographic Redundancy

A single network failure should never result in the total outage of your application. High availability is achieved by distributing your resources across multiple Availability Zones, which are isolated locations within a geographic region. By mirroring your three-tier architecture in at least two or three zones, you ensure that your system can withstand the failure of an entire data center.

Implementing a multi-zone strategy requires creating subnets in each zone and ensuring your load balancer is configured to distribute traffic across all of them. This redundancy applies not just to your compute instances but also to your networking components like NAT gateways. Each zone should have its own gateway to prevent a single point of failure from cutting off egress access for the entire network.

True resilience is not just about having extra capacity; it is about ensuring that the network path to that capacity is as redundant as the compute resources themselves.

Cross-zone traffic often carries additional latency and costs, so it is important to design your application to prefer local communication whenever possible. Many modern service meshes and load balancers support zone-aware routing, which keeps traffic within the same availability zone to reduce overhead. This optimization is particularly important for latency-sensitive applications like real-time bidding or high-frequency trading.

Validating Multi-AZ Subnet Distribution

```python
import boto3

def check_subnet_distribution(vpc_id):
    # Initialize the EC2 client
    client = boto3.client('ec2')

    # Retrieve all subnets for the given VPC
    response = client.describe_subnets(Filters=[{'Name': 'vpc-id', 'Values': [vpc_id]}])
    subnets = response['Subnets']

    # Map subnets to their respective Availability Zones
    az_map = {}
    for subnet in subnets:
        az = subnet['AvailabilityZone']
        az_map.setdefault(az, []).append(subnet['SubnetId'])

    # Print the distribution summary
    for az, subnet_ids in az_map.items():
        print(f"Zone {az} has {len(subnet_ids)} subnets: {', '.join(subnet_ids)}")

# Example usage for a production environment
# check_subnet_distribution('vpc-0a1b2c3d4e5f6g7h8')
```

Redundancy for Critical Gateways

In a multi-zone architecture, it is a best practice to deploy a NAT gateway in every public subnet of each availability zone. This ensures that if one zone experiences a service disruption, the instances in other zones can still reach the internet through their local gateways. While this increases the monthly fixed cost, it eliminates a critical single point of failure that could otherwise cripple your private tiers.

Routing tables must be updated to point to the specific NAT gateway located in the same availability zone as the subnet. This configuration keeps traffic local to the zone, reducing latency and avoiding cross-zone data transfer charges for outbound traffic. Automating this setup with infrastructure as code tools ensures that your network remains consistent and resilient as you scale.
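With Terraform's for_each, the per-zone pattern stays symmetric by construction. In this sketch, the two subnet maps are hypothetical inputs keyed by availability zone, so each private route table is paired with the gateway in its own zone.

```hcl
# Sketch: one NAT gateway per zone, each private route table pointing
# at its zone-local gateway. The subnet maps are hypothetical inputs.
variable "public_subnet_ids" {
  type = map(string) # keyed by AZ, e.g. "us-east-1a"
}

variable "private_subnet_ids" {
  type = map(string)
}

resource "aws_eip" "nat_per_az" {
  for_each = var.public_subnet_ids
  domain   = "vpc"
}

resource "aws_nat_gateway" "per_az" {
  for_each      = var.public_subnet_ids
  allocation_id = aws_eip.nat_per_az[each.key].id
  subnet_id     = each.value
}

resource "aws_route_table" "private" {
  for_each = var.private_subnet_ids
  vpc_id   = aws_vpc.production_network.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.per_az[each.key].id # same-zone gateway
  }
}

resource "aws_route_table_association" "private" {
  for_each       = var.private_subnet_ids
  subnet_id      = each.value
  route_table_id = aws_route_table.private[each.key].id
}
```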

Monitoring Network Health and Performance

Visibility is the key to maintaining a healthy network environment over time. You should implement comprehensive logging and monitoring to track traffic patterns, identify unauthorized access attempts, and detect performance bottlenecks. Metrics like packet loss, latency between tiers, and gateway throughput provide the data needed to make informed scaling decisions.

Analyzing flow logs can reveal unexpected communication patterns, such as a database instance trying to reach an external IP address, which could indicate a security breach. By integrating these logs with security information and event management systems, you can create automated alerts for suspicious activity. Regular audits of your security group rules and route tables are also necessary to prevent configuration drift and maintain a lean security profile.
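Enabling flow logs is itself a small piece of configuration. In this sketch, the log group name, retention period, and IAM role ARN are placeholders; the role must already exist with permission to publish to CloudWatch Logs.

```hcl
# Sketch: capturing all VPC traffic metadata to CloudWatch Logs.
resource "aws_cloudwatch_log_group" "vpc_flow" {
  name              = "/vpc/production/flow-logs" # placeholder name
  retention_in_days = 90
}

resource "aws_flow_log" "production" {
  vpc_id          = aws_vpc.production_network.id
  traffic_type    = "ALL" # ACCEPT, REJECT, or ALL
  log_destination = aws_cloudwatch_log_group.vpc_flow.arn
  iam_role_arn    = "arn:aws:iam::123456789012:role/vpc-flow-logs" # placeholder ARN
}
```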
