Kubernetes

How Worker Nodes Execute and Secure Container Workloads

Learn the roles of the Kubelet, Kube-proxy, and Container Runtime in translating control plane instructions into running applications.

Cloud & Infrastructure · Intermediate · 12 min read

The Orchestration Loop: Bridging Intent and Reality

In a distributed system, the biggest challenge is maintaining consistency across hundreds or thousands of physical or virtual machines. When you tell a cluster to run five instances of a payment processing service, you are defining a desired state. The cluster must then figure out which machines have the capacity to host these instances and how to keep them running if a server fails.

Kubernetes solves this through a continuous reconciliation loop that operates at the node level. While the control plane acts as the brain of the cluster, the worker nodes act as the hands. These nodes are responsible for the actual execution of tasks, ensuring that the software containers are started, monitored, and connected to the network as requested.

The worker node architecture comprises three primary components that work in tandem: the Kubelet, the Container Runtime, and the Kube-proxy. Each component serves a specific role in the lifecycle of a request, moving from high-level API instructions to low-level Linux kernel operations. Understanding these roles is essential for debugging performance bottlenecks and networking issues in production environments.

  • Desired State: The configuration stored in the cluster database indicating what should be running.
  • Actual State: The real-time status of containers and processes currently active on the hardware.
  • Reconciliation: The process of identifying discrepancies between desired and actual states and taking corrective action.
  • Node Autonomy: The ability of a worker node to maintain its local state even if the connection to the control plane is temporarily lost.

The Shift from Imperative to Declarative Management

Before modern orchestration, engineers often used imperative scripts to deploy software by sending specific commands to servers. This approach is fragile because if a single command fails, the system is left in an intermediate, broken state. Kubernetes moves away from this by using a declarative model where you describe the outcome rather than the steps.

On the worker node, this means the components are not just executing one-off commands. Instead, they are constantly observing the environment to ensure the local state matches the global requirements. This shift reduces manual intervention and allows the system to be self-healing under various failure conditions.
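As a minimal sketch of the declarative model (the Deployment name and image are hypothetical), the manifest states only the outcome, five replicas of a payment service, and leaves the steps to the cluster:

```yaml
# Hypothetical example: declare the outcome, not the steps.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
spec:
  replicas: 5                      # desired state: five running instances
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      containers:
      - name: processor
        image: internal-repo/payment-processor:v1.4.2   # hypothetical image
```

If a node fails, the control plane reschedules the lost Pods elsewhere to restore the replica count; no operator has to run imperative commands to repair the intermediate state.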

Kubelet: The On-Site Project Manager

The Kubelet is the primary agent that runs on every node in the cluster. You can think of it as the project manager for the node, responsible for receiving instructions from the Kubernetes API server and translating them into actions. It does not manage containers directly but rather coordinates with other components to ensure the Pod specifications are met.

One of the Kubelet's most important duties is the reporting of node status. It regularly sends heartbeats and resource usage statistics back to the control plane. This data allows the scheduler to make informed decisions about where to place new workloads based on available CPU, memory, and disk space.

The Kubelet communicates directly with the API server to watch for changes in Pods assigned to its node, making it the critical link in the cluster's command chain.

When a Pod is assigned to a node, the Kubelet receives a PodSpec, which is a YAML or JSON object describing the containers, volumes, and network settings. The Kubelet then interacts with the Container Runtime to pull images and start the containers. It also monitors these containers throughout their lifecycle to ensure they remain healthy and restart them if they crash.

Example PodSpec Handled by Kubelet

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-processor-service
spec:
  containers:
  - name: processor
    image: internal-repo/order-processor:v2.1.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1000m"
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```

Health Probes and Lifecycle Management

The Kubelet uses probes to determine the health of a container. A liveness probe tells the Kubelet when to restart a container, while a readiness probe tells it when a container is ready to start accepting traffic. These checks are vital for preventing traffic from being sent to a failing or initializing application.
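The distinction can be sketched in a container spec (the paths, port, and timings here are illustrative): the readiness probe gates traffic, while the liveness probe gates restarts:

```yaml
# Illustrative probe configuration for a single container.
containers:
- name: processor
  image: internal-repo/order-processor:v2.1.0
  readinessProbe:            # failing this removes the Pod from Service endpoints
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 3
    periodSeconds: 5
  livenessProbe:             # failing this causes the Kubelet to restart the container
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 3      # restart only after three consecutive failures
```

Separating the two endpoints lets an application signal "alive but not yet ready" during warm-up without being restarted.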

If a container fails its liveness check, the Kubelet will kill the process and start a new one according to the restart policy. This local decision-making happens without needing a round-trip to the control plane, allowing for rapid recovery from application-level failures. This design ensures that transient errors do not lead to prolonged service outages.

The Container Runtime: The Heavy Lifter

While the Kubelet manages the logic of what should be running, it does not actually run the containers. That responsibility falls to the Container Runtime, a specialized software suite designed to manage the execution and lifecycle of containers. Popular examples include containerd and CRI-O, both of which adhere to standardized interfaces.

The communication between the Kubelet and the runtime happens through the Container Runtime Interface, or CRI. This abstraction layer allows Kubernetes to support multiple runtimes without needing to be recompiled for each one. It provides a consistent API for operations like pulling images, creating containers, and managing container logs.

When the Kubelet requests a container start, the runtime interacts with the Linux kernel to create isolated environments using namespaces and cgroups. Namespaces provide isolation for resources like the network and file system, while cgroups enforce limits on resource consumption such as CPU cycles and memory bytes.

Modern runtimes have moved away from the monolithic architecture of early Docker versions to more modular designs. Containerd, for instance, focuses solely on managing the container lifecycle while staying lightweight and secure. This modularity improves stability because a crash in the runtime manager is less likely to affect the running containers themselves.

The Role of the Sandbox Container

Inside every Pod, the runtime creates a special hidden container often referred to as the pause or sandbox container. This container's sole purpose is to hold the network and IPC namespaces for the entire Pod. All other application containers in the Pod join these namespaces to communicate with each other over localhost.

This architectural choice is why all containers in a Pod share the same IP address and port space. If the application containers crash or restart, the network namespace remains intact because the sandbox container continues to run. This ensures that the Pod's identity and networking state are preserved during individual container failures.
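Because every container joins the sandbox's network namespace, containers in the same Pod reach each other over localhost. A hedged sketch (the names, images, and port are hypothetical):

```yaml
# Two containers sharing the Pod's sandbox network namespace.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache
spec:
  containers:
  - name: web
    image: internal-repo/web:v1.0.0    # hypothetical image
    # This process can reach the cache at localhost:6379, because
    # both containers share the sandbox container's network namespace.
  - name: cache
    image: redis:7-alpine
    ports:
    - containerPort: 6379
```

If the web container crashes and restarts, it rejoins the same namespace and the Pod's IP is unchanged.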

Kube-proxy: Navigating the Network Maze

In a dynamic environment where Pods are constantly created and destroyed, their IP addresses are ephemeral. To provide stable access to these workloads, Kubernetes uses the Service abstraction, which provides a single, persistent IP address for a set of Pods. Kube-proxy is the component responsible for implementing this service networking on each node.

Kube-proxy watches the API server for changes to Service and Endpoint objects. When a Service is created, Kube-proxy programs the local networking rules to ensure that traffic sent to the Service's virtual IP is correctly routed to one of the backend Pods. It effectively acts as a distributed load balancer that lives on every machine in the cluster.

Inspecting Kube-proxy Rules with iptables

```bash
# List the NAT table rules to see how Service traffic is redirected
sudo iptables -t nat -L KUBE-SERVICES -n

# Example output snippet showing a redirect to a specific Pod
# Target        Prot Opt Source       Destination
# KUBE-SVC-XYZ  tcp  --  0.0.0.0/0    10.96.0.10   /* default/my-service */
# KUBE-SEP-ABC  all  --  0.0.0.0/0    0.0.0.0/0    /* default/my-service */ statistic mode random probability 0.50000
```
By default, Kube-proxy uses iptables to manage these rules on Linux, but for larger clusters it can be configured to use IPVS for better performance and scalability. IPVS is a kernel-based load balancer that offers more sophisticated algorithms like least-connection and faster lookup times. Selecting the right mode is a critical architectural decision for high-traffic environments.
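The proxy mode is chosen through the kube-proxy configuration. A sketch of a KubeProxyConfiguration enabling IPVS (the scheduler value shown is one of several IPVS algorithms):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"          # default is iptables on Linux
ipvs:
  scheduler: "lc"     # least-connection; "rr" selects round-robin
```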

Kube-proxy does not just route external traffic; it also handles internal communication between services within the cluster. Because the rules exist on every node, a container can reach any service in the cluster simply by sending a request to the service's internal IP. The local Kube-proxy instance intercepts the packet and sends it directly to a healthy destination Pod, often on a completely different node.

Handling Traffic and Load Balancing

When traffic hits a Service IP, Kube-proxy selects a target Pod: randomly with weighted probabilities in iptables mode, or via configurable algorithms such as round-robin in IPVS mode. This distributes load across all available instances of an application. If a Pod becomes unhealthy, the Kubelet reports the failed check, the Pod is removed from the Service's endpoints, and Kube-proxy drops it from its routing rules.

This mechanism prevents the dreaded black hole effect where traffic is sent to a non-functional process. The tight integration between Kube-proxy's routing and the Kubelet's health checks creates a resilient network fabric. Developers can rely on the Service IP as a stable entry point, confident that the underlying infrastructure will handle the plumbing of packet delivery.

Operational Reliability: Managing Resource Pressure

One of the most complex tasks for the worker node components is managing resource exhaustion. If a node runs out of memory, the Linux kernel's out-of-memory killer starts terminating processes based on its own heuristics, with no awareness of workload priorities. Kubernetes avoids this chaos by having the Kubelet proactively evict Pods when resources reach critical levels.

The Kubelet monitors the node's disk, memory, and PID pressure. When a threshold is crossed, it ranks Pods based on their Quality of Service class and their resource usage. Pods that are consuming more than their requested limits are usually the first candidates for eviction to ensure the node remains stable for others.

This eviction process highlights the importance of setting accurate resource requests and limits in your deployment manifests. Without these values, the Kubelet cannot effectively prioritize workloads during a crisis. Properly configured Pods allow the node to gracefully shed load rather than suffering a complete system failure.
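The Quality of Service class follows directly from these values. A hedged sketch (the container name and image are hypothetical): when requests equal limits for every container, the Pod is classed Guaranteed and evicted last:

```yaml
# Guaranteed QoS: requests equal limits for every container,
# so this Pod is among the last candidates for eviction.
containers:
- name: critical-worker
  image: internal-repo/worker:v1.0.0
  resources:
    requests:
      memory: "512Mi"
      cpu: "500m"
    limits:
      memory: "512Mi"    # equal to the request
      cpu: "500m"
```

A Pod whose requests are lower than its limits is classed Burstable, and one that sets neither is BestEffort, the first to be evicted under pressure.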

Ultimately, the harmony between the Kubelet, the Container Runtime, and the Kube-proxy defines the reliability of a Kubernetes cluster. By understanding how these pieces fit together, engineers can better architect applications that leverage the full power of container orchestration. The worker node is not just a host for containers, but a sophisticated management environment that automates the hardest parts of distributed computing.
