Virtual Private Clouds (VPC)

Implementing Layered Security with Security Groups and Network ACLs

Master the differences between stateful security groups and stateless network access control lists to create a robust defense-in-depth perimeter.

Cloud & Infrastructure | Intermediate | 12 min read

In this article

The Architecture of Isolation: Beyond the Perimeter

The Instance Level vs the Subnet Level
Defining the Micro-Perimeter

Security Groups: The Intelligent Stateful Gatekeeper

The Logic of Stateful Tracking
Referencing Groups Over IP Addresses

Network ACLs: The Rigid Stateless Border Guard

Solving the Ephemeral Port Challenge

Designing a Robust Defense-in-Depth Strategy

The Web-App-Database Pattern
Best Practices for Rule Maintenance

The Architecture of Isolation: Beyond the Perimeter

In a modern cloud environment, the traditional concept of a single hard outer shell is no longer sufficient for protecting production workloads. A Virtual Private Cloud provides the foundation for networking, but simply placing resources inside a private network does not automatically make them secure from internal or external lateral movement.

Security in the cloud relies on the principle of defense in depth, where multiple independent layers of protection validate every packet that traverses the network. This approach ensures that if one layer is misconfigured or bypassed, subsequent layers are present to mitigate the potential impact of an intrusion.

We achieve this multi-layered protection by combining two distinct types of virtual firewalls that operate at different scopes within the VPC. Security Groups act as a distributed firewall at the instance level, while Network Access Control Lists serve as a centralized gatekeeper for entire subnets.

True security in a VPC environment is achieved when your network configuration assumes the perimeter will eventually be breached and focuses on limiting the blast radius through granular internal controls.

The Instance Level vs the Subnet Level

Every network interface attached to an elastic compute instance or a managed service resides within a specific subnet. Understanding where your security controls sit in relation to these resources is the first step toward building a resilient architecture.

Network Access Control Lists provide the first line of defense for the entire subnet, processing traffic as it enters or exits the subnet boundary. Security Groups operate much closer to the application, acting as a final filter that sits directly on the elastic network interface of the specific resource.

Defining the Micro-Perimeter

By applying security rules at the individual instance level, we create a micro-perimeter around every single component of our application architecture. This granular control allows us to define specific communication pathways between tiers, such as allowing only the web tier to talk to the application tier on a specific port.

This strategy limits an attacker's ability to move laterally through your network if they manage to compromise a single public-facing server. Traffic between two servers in the same subnet never crosses the subnet boundary, so a NACL never evaluates it, but well-defined security groups can still prevent those servers from communicating with each other entirely.
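As a minimal sketch of such a pathway, the Terraform below lets only members of a web tier group reach an application tier on a single port. The group names and port 8080 are illustrative assumptions, and `var.vpc_id` is assumed to be declared elsewhere, as in the article's other examples.

```hcl
# Hypothetical web tier group; its own ingress and egress rules are omitted for brevity.
resource "aws_security_group" "web_tier" {
  name        = "web-tier"
  description = "Members of the web tier"
  vpc_id      = var.vpc_id
}

# Hypothetical application tier group.
resource "aws_security_group" "app_tier" {
  name        = "app-tier"
  description = "Accept traffic only from the web tier"
  vpc_id      = var.vpc_id

  # The only inbound path: TCP 8080 (assumed application port) from web tier members.
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web_tier.id]
  }
}
```

Because `app_tier` has no rule that references itself, two application servers cannot open connections to each other, even when they share a subnet.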

Security Groups: The Intelligent Stateful Gatekeeper

Security groups are the most frequently used tool for managing traffic because they are stateful and operate at the resource level. Stateful means that if you send an outbound request from your server, the security group automatically remembers that connection and allows the response back in regardless of inbound rules.

This behavior significantly simplifies rule management for developers because you only need to focus on the direction that initiates the communication. You do not have to worry about the complex mechanics of tracking temporary ports or return traffic for established sessions.

Security groups also allow for a powerful feature known as security group referencing, where a rule allows traffic from any resource associated with a specific security group, identified by its group ID. This eliminates the need to maintain static lists of IP addresses as your application scales up or down dynamically.

Terraform Web Server Security Group

```hcl
resource "aws_security_group" "web_server_sg" {
  name        = "web-server-production"
  description = "Allow HTTPS traffic from the internet"
  vpc_id      = var.vpc_id

  # Inbound: HTTPS from any IPv4 address.
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Outbound: PostgreSQL (5432) only to members of the database group,
  # referenced by group ID rather than by IP address.
  # aws_security_group.database_sg is assumed to be defined elsewhere.
  egress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.database_sg.id]
  }
}
```

The Logic of Stateful Tracking

Stateful firewalls maintain a connection table that tracks the state of active sessions between sources and destinations. When an instance initiates a request to an external API, the security group creates an entry in this table to recognize the expected return traffic.

Because the return traffic is automatically permitted, you avoid the common pitfall of accidentally blocking your own application responses. This makes security groups a good fit for application-level rules, where you want to describe who may start a conversation with whom and nothing more.
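To illustrate, here is a small sketch of a group for a worker that calls external HTTPS APIs and accepts no inbound connections. The resource name is hypothetical and `var.vpc_id` is assumed to be declared elsewhere.

```hcl
# Hypothetical group for an instance that only makes outbound HTTPS calls.
resource "aws_security_group" "api_client" {
  name        = "api-client"
  description = "Outbound HTTPS only"
  vpc_id      = var.vpc_id

  # No ingress blocks: nothing can open a new connection to this instance.

  # Outbound HTTPS to any IPv4 destination.
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

The API responses still reach the instance even though no ingress rule exists, because the security group tracks the connection that the instance opened.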

Referencing Groups Over IP Addresses

Hard-coding IP addresses into firewall rules is a brittle practice that can cause production outages during scaling events, when new instances come up with addresses the rules do not know about. When a rule references a security group ID instead, it automatically applies to any new instance associated with that group.

This abstraction is essential for cloud-native architectures where instances are ephemeral and IP addresses are constantly recycled. It allows you to define intent based on the role of the resource rather than its current IP address.
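The other half of the earlier web server example could look like the sketch below. The definition of `database_sg` is an assumption, since the article only refers to it by name. The standalone `aws_vpc_security_group_ingress_rule` resource is available in recent versions of the AWS provider.

```hcl
# Assumed definition of the database group that web_server_sg refers to.
resource "aws_security_group" "database_sg" {
  name        = "database-production"
  description = "PostgreSQL access for the web server group only"
  vpc_id      = var.vpc_id
}

# Standalone rule: if both groups referenced each other through inline
# rules, Terraform would report a dependency cycle.
resource "aws_vpc_security_group_ingress_rule" "postgres_from_web" {
  security_group_id            = aws_security_group.database_sg.id
  referenced_security_group_id = aws_security_group.web_server_sg.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "PostgreSQL from members of web_server_sg"
}
```

No IP address appears anywhere in the rule, so instances that join or leave `web_server_sg` during scaling gain or lose database access without any rule change.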

Network ACLs: The Rigid Stateless Border Guard

Network Access Control Lists offer an additional layer of security that functions differently from security groups by being entirely stateless. Stateless means the network does not remember previous requests, so you must explicitly define rules for both the outbound request and the inbound response.

While this sounds more complex, it provides a crucial capability that security groups lack: the ability to explicitly deny traffic. Security groups support only allow rules, so they work as an allow list, whereas NACLs can block specific malicious IP ranges. NACL rules match on CIDR blocks, protocols, and ports, so blocking a geographic region is only possible by listing the address ranges assigned to it.

NACLs are processed in a strict numerical order starting from the lowest numbered rule, and the first rule that matches the traffic is applied immediately. This deterministic processing makes them excellent for implementing broad organization-wide policies that should never be overridden by individual application teams.

NACLs apply at the subnet level to all resources within that subnet.
They are stateless, requiring explicit rules for ephemeral return ports.
Rules are processed in numerical order from lowest to highest.
They support both Allow and Deny rules for granular traffic rejection.
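The ordering and deny behavior can be sketched as follows. The resource name is hypothetical, `203.0.113.0/24` is a documentation range standing in for a range you want to block, and the NACL is left unassociated with any subnet to keep the example short.

```hcl
resource "aws_network_acl" "ordered_example" {
  vpc_id = var.vpc_id

  # Rule 50 is evaluated first: drop all traffic from the blocked range.
  ingress {
    protocol   = "-1"
    rule_no    = 50
    action     = "deny"
    cidr_block = "203.0.113.0/24" # stand-in for a range to block
    from_port  = 0
    to_port    = 0
  }

  # Rule 100: allow HTTPS from every other source.
  ingress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Stateless: responses to those HTTPS clients need their own outbound rule.
  egress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }
}
```

A packet from `203.0.113.7` matches rule 50 and is dropped before rule 100 is ever considered. If the deny carried a higher number than the allow, HTTPS traffic from that range would be admitted by the allow rule first.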

Solving the Ephemeral Port Challenge

A common frustration for developers using NACLs is the requirement to open ephemeral ports to allow return traffic for outbound requests. When your server initiates a connection to a database or API, the client operating system picks a temporary high-numbered source port, and the response comes back to that port. The exact range depends on the client, so inbound rules typically open 1024 to 65535 to cover the common cases.

If you do not open this wide range of ports in your inbound NACL rules, your outbound requests will appear to time out even if the destination received the packet. Mastering the management of these ephemeral ports is the key to successfully implementing stateless security at the subnet boundary.
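A minimal sketch of the matching rule pair for outbound HTTPS is shown below. The NACL name and `aws_subnet.private_a` are hypothetical, and the 1024 to 65535 range is the broad default discussed above rather than a value tuned to a particular operating system.

```hcl
resource "aws_network_acl" "private_outbound_https" {
  vpc_id     = var.vpc_id
  subnet_ids = [aws_subnet.private_a.id] # hypothetical subnet

  # Outbound request: instances in the subnet call HTTPS endpoints.
  egress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Inbound response: replies arrive on the caller's ephemeral port.
  ingress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }
}
```

The inbound rule cannot tell a reply from a new connection attempt to a high port, which is one reason to keep security groups in place behind the NACL.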

Designing a Robust Defense-in-Depth Strategy

The most secure architectures use a combination of NACLs and Security Groups to create a nested security model. In this design, the NACL acts as a broad brush to filter out known bad actors or restrict traffic to specific corporate CIDR blocks at the entry point of the subnet.

Once traffic passes the NACL, the Security Group then applies a surgical approach by only allowing the specific protocols and ports required for the application to function. This ensures that even if a developer accidentally opens a security group to the whole world, the NACL still provides a safety net.

This tiered approach also simplifies auditing and compliance by separating responsibilities between network administrators and application developers. Network admins manage the broad NACL policies, while developers manage the specific application-level security groups.

Implementing a Subnet NACL for a Web Tier

```hcl
resource "aws_network_acl" "public_subnet_nacl" {
  vpc_id     = var.vpc_id
  subnet_ids = [aws_subnet.public_a.id, aws_subnet.public_b.id]

  # Allow HTTP inbound from anywhere
  ingress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 80
    to_port    = 80
  }

  # Allow HTTPS inbound from anywhere, matching the port that the
  # web server security group accepts
  ingress {
    protocol   = "tcp"
    rule_no    = 110
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Allow responses to outbound requests, which arrive on ephemeral ports
  ingress {
    protocol   = "tcp"
    rule_no    = 200
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }

  # Allow all outbound traffic (broad policy); this also covers
  # responses sent back to inbound web clients
  egress {
    protocol   = "-1"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }
}
```

The Web-App-Database Pattern

In a standard three-tier architecture, you should place your database in a private subnet with a NACL that only allows inbound traffic from the application subnet. Because NACLs are stateless, that NACL also needs an outbound rule that lets responses return to the application subnet on ephemeral ports. This creates a subnet-level boundary that prevents direct access from the public internet regardless of how the instance security groups are configured.

The database security group should then allow inbound traffic on the database port only from the application tier security group, ensuring that even internal resources like build servers cannot reach your sensitive data. This layering of subnet-level and instance-level controls is a widely used pattern for cloud infrastructure security.
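The subnet-level half of this pattern might be sketched as follows. `aws_subnet.database_a`, `var.app_subnet_cidr`, and the PostgreSQL port 5432 are assumptions chosen to line up with the earlier examples.

```hcl
resource "aws_network_acl" "database_subnet_nacl" {
  vpc_id     = var.vpc_id
  subnet_ids = [aws_subnet.database_a.id] # hypothetical private subnet

  # Inbound: database connections from the application subnet only.
  ingress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = var.app_subnet_cidr # hypothetical variable, e.g. "10.0.2.0/24"
    from_port  = 5432
    to_port    = 5432
  }

  # Outbound: query responses back to the application subnet's ephemeral ports.
  egress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = var.app_subnet_cidr
    from_port  = 1024
    to_port    = 65535
  }
}
```

Traffic that matches neither rule falls through to the NACL's implicit final deny, so a source outside the application subnet is rejected before any security group is consulted.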

Best Practices for Rule Maintenance

Always leave gaps between your NACL rule numbers, such as increments of 10 or 100, to allow for the insertion of new rules in the future without renumbering. This gives you room to add a specific deny rule with a lower number than your allow rules, so that it is evaluated first, if you identify an active attack or a compromised internal resource.

Use infrastructure as code to manage these rules so that every change is tracked in version control and can be reviewed by security peers before deployment. This reduces human error and keeps your network security posture reproducible across different environments like staging and production.
