Scaling WebSockets with Redis Pub/Sub and Load Balancers
Architect a distributed system that synchronizes WebSocket states and broadcasts messages across multiple server instances using a shared messaging layer.
The Evolution from Stateless to Stateful Architecture
WebSockets represent a fundamental shift in how we think about web communication. While standard HTTP is inherently stateless and follows a request-response pattern, WebSockets create a persistent, long-lived connection between the client and the server. This allows for full-duplex communication where both parties can send data at any time without waiting for a request.
This persistent nature is exactly what makes WebSockets powerful for real-time applications like chat platforms, financial tickers, and collaborative editing tools. However, this same statefulness introduces a significant architectural hurdle when it comes time to scale. Unlike a REST API where any server can handle any incoming request, a WebSocket client is physically tied to the memory and resources of one specific server instance.
In a single-server environment, managing these connections is straightforward because the server has a global view of all active participants. If User A wants to send a message to User B, the server simply looks up User B in its local connection registry and pushes the data. This mental model breaks immediately once you introduce a second server into your infrastructure.
Scaling stateful connections is the process of turning isolated server silos into a unified communication fabric that behaves like a single machine.
When you deploy multiple instances behind a load balancer, your users become fragmented across different environments. A user on Server 1 has no way of communicating with a user on Server 2 through traditional memory-based lookups. This isolation is often referred to as the connection silo problem, and solving it requires an external source of truth to bridge the gap between nodes.
Why Standard Load Balancing Isn't Enough
Load balancers are traditionally designed to distribute independent requests across a pool of interchangeable workers. With WebSockets, the load balancer's role changes: once the initial HTTP upgrade handshake completes, it must keep proxying that same TCP connection to the same backend for the connection's entire lifetime. If the load balancer terminates idle connections prematurely or fails to support the upgrade and sticky sessions, the real-time experience degrades into constant reconnection cycles.
Even with perfect session stickiness, the load balancer cannot help with cross-server communication. It can ensure User A stays on Server 1, but it provides no mechanism for Server 1 to notify Server 2 that a new message has arrived for User B. To fix this, we must introduce a shared messaging layer that acts as the nervous system for our distributed application.
Building the Broadcast Infrastructure
Implementing a distributed broadcast system requires integrating a message broker client into your existing WebSocket server logic. In a typical Node.js environment, you might use the ws library alongside ioredis to manage the connections and the pub/sub channels. The server must initialize two separate connections to the broker: one for publishing events and one for subscribing to updates.
The following implementation demonstrates how to set up a basic server that broadcasts messages to all connected clients across a cluster. Notice how each server relays whatever arrives on the shared channel to its own local clients, checking each socket's readyState before sending. Because every client is attached to exactly one server, every client receives the data exactly once, regardless of which instance it is connected to.
const WebSocket = require('ws');
const Redis = require('ioredis');

// Create two Redis clients: one for publishing and one for subscribing
const pub = new Redis({ host: 'redis-broker', port: 6379 });
const sub = new Redis({ host: 'redis-broker', port: 6379 });

const wss = new WebSocket.Server({ port: 8080 });

// Subscribe to the global message channel on startup
sub.subscribe('global_broadcast');

sub.on('message', (channel, message) => {
  // When a message arrives from Redis, broadcast to all local clients
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(message);
    }
  });
});

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    // Instead of sending locally, publish to the Redis channel
    // This allows other server instances to see the message
    pub.publish('global_broadcast', data);
  });
});

In this example, the message flow is circular: Client to Server, Server to Redis, Redis to all Servers, and finally Servers to all Clients. This ensures that every node in the cluster is perfectly synchronized. You can extend this logic to support private messaging by creating specific Redis channels for individual user IDs or room IDs.
Managing room-based communication follows a similar pattern but requires more granular subscriptions. When a user joins a specific room, the server instance they are on must subscribe to a Redis channel dedicated to that room. This prevents every server from being flooded with messages that aren't relevant to its local pool of connected clients.
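The delivery side of that pattern can be sketched as follows. This assumes channels named room:&lt;roomId&gt; and a local rooms map from room ID to the set of sockets that joined on this instance; both names are illustrative, not a fixed API.

```javascript
// Route a message arriving on a room channel to local members only.
// `rooms` maps a room ID to the Set of locally connected sockets in that room.
const rooms = new Map(); // roomId -> Set<socket>

function parseRoomChannel(channel) {
  // "room:42" -> "42"; returns null for unrelated channels
  return channel.startsWith('room:') ? channel.slice('room:'.length) : null;
}

function routeToLocalRoom(channel, message) {
  const roomId = parseRoomChannel(channel);
  if (roomId === null) return 0;
  const members = rooms.get(roomId);
  if (!members) return 0; // no local participants, nothing to deliver
  let delivered = 0;
  for (const socket of members) {
    if (socket.readyState === 1 /* WebSocket.OPEN */) {
      socket.send(message);
      delivered++;
    }
  }
  return delivered;
}

// Wired up against the subscriber client from the earlier example:
// sub.on('message', (channel, message) => routeToLocalRoom(channel, message));
```

Returning the delivery count makes the routing logic easy to observe in tests and metrics, though it is not required by the pattern itself.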
Optimizing Subscription Lifecycles
Subscription management is a critical aspect of resource efficiency in a distributed system. If your server subscribes to every possible channel globally, it will quickly succumb to CPU exhaustion as it processes irrelevant data. You should implement a dynamic subscription model where the server only listens to channels that have at least one local participant.
When the last user in a specific room disconnects from a server instance, that instance should immediately unsubscribe from the corresponding Redis channel. This cleanup logic prevents memory leaks and ensures that your message broker is not wasting bandwidth on idle connections. Efficient lifecycle management is the difference between a system that scales linearly and one that collapses under its own overhead.
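One way to implement that lifecycle is a small reference counter wrapped around the subscriber client. The SubscriptionManager class and the room: channel prefix below are illustrative; the subscribe and unsubscribe methods match what an ioredis subscriber connection exposes.

```javascript
// Reference-counted subscriptions: subscribe to a room's channel when the
// first local client joins it, unsubscribe when the last local client leaves.
class SubscriptionManager {
  constructor(subscriber) {
    this.subscriber = subscriber; // any client with subscribe/unsubscribe
    this.counts = new Map();      // channel -> number of local participants
  }

  join(roomId) {
    const channel = `room:${roomId}`;
    const count = this.counts.get(channel) || 0;
    if (count === 0) this.subscriber.subscribe(channel); // first local member
    this.counts.set(channel, count + 1);
  }

  leave(roomId) {
    const channel = `room:${roomId}`;
    const count = this.counts.get(channel) || 0;
    if (count <= 1) {
      // Last local member left: stop receiving this room's traffic entirely
      this.counts.delete(channel);
      if (count === 1) this.subscriber.unsubscribe(channel);
    } else {
      this.counts.set(channel, count - 1);
    }
  }
}
```

Because the manager only talks to an injected subscriber object, the same logic works against a real ioredis connection in production and a stub in tests.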
Operational Challenges and Reliability
Scaling WebSockets introduces several operational complexities that aren't present in stateless APIs. One of the most common issues is the thundering herd problem, which occurs when a load balancer or server node fails. Hundreds of thousands of clients might attempt to reconnect simultaneously, creating a massive spike in CPU and memory usage that can knock down healthy nodes.
To mitigate this, you must implement exponential backoff and jitter on the client side to spread out the reconnection attempts. On the server side, you should use rate limiting to protect your resources during high-traffic events. Proper capacity planning is also essential, as WebSocket servers are memory-bound by the number of open file descriptors and the state required for each connection.
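A minimal client-side sketch of exponential backoff with full jitter follows; the 1-second base and 30-second cap are illustrative defaults, not values mandated by any library.

```javascript
// Reconnect delay: exponential backoff capped at a maximum, with "full
// jitter" (uniform in [0, cap)) so clients don't reconnect in lockstep.
function reconnectDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt); // 1s, 2s, 4s, ... capped
  return random() * exp;
}

// Sketch of a reconnecting client loop (browser-style WebSocket assumed):
// let attempt = 0;
// function connect() {
//   const ws = new WebSocket('wss://example.com/socket');
//   ws.onopen = () => { attempt = 0; }; // reset backoff on success
//   ws.onclose = () => setTimeout(connect, reconnectDelay(attempt++));
// }
```

Injecting the random source keeps the jitter deterministic under test while remaining uniform in production.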
In a distributed real-time system, the network partition is your greatest enemy; always design for the moment the messaging layer becomes unreachable.
Another significant challenge is ensuring message ordering and delivery guarantees across the cluster. While Redis Pub/Sub is incredibly fast, it is a fire-and-forget system that does not store messages. If a server is momentarily disconnected from the broker, it may miss messages sent during that window, leading to out-of-sync state for the clients connected to it.
If your application requires strict message delivery, you should consider using Redis Streams or a more robust broker like Kafka. These tools provide a history of messages that clients or servers can catch up on after a disconnection. Balancing the trade-offs between speed, complexity, and reliability is a core part of the architectural decision-making process.
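The catch-up bookkeeping that Streams enables can be sketched as follows: each server remembers the last stream entry ID it processed and resumes from it after a reconnect. The stream key and handler here are assumptions for illustration; with ioredis the blocking read itself would be issued as redis.xread('BLOCK', 0, 'STREAMS', streamKey, lastId).

```javascript
// Build the XREAD argument list for resuming a stream. '$' means "only new
// entries"; a concrete ID resumes after that entry, recovering the gap
// missed during a broker disconnect.
function xreadArgs(streamKey, lastId) {
  return ['BLOCK', 0, 'STREAMS', streamKey, lastId];
}

// Process a batch of entries in the shape XREAD returns for one stream:
// [[id, [field1, value1, ...]], ...]. The cursor advances only after each
// entry is handled, so a crash mid-batch re-reads the unprocessed tail.
function applyEntries(entries, handler, lastId) {
  for (const [id, fields] of entries) {
    handler(id, fields);
    lastId = id;
  }
  return lastId;
}
```

On startup the server would read with lastId = '$' (new entries only), then persist and reuse the returned cursor across reconnects to close the gap.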
Handling Backpressure and Latency
Backpressure occurs when the message broker or the client is unable to keep up with the volume of incoming data. In a WebSocket context, this can lead to buffers filling up and eventually causing the server process to crash. Monitoring your outbound queue sizes and implementing drop policies for non-critical data can help maintain system stability.
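The check can be as simple as consulting bufferedAmount, which the ws library exposes on each socket as the number of bytes queued but not yet flushed to the client. The 1 MiB threshold and the notion of "critical" messages below are illustrative choices, not part of the ws API.

```javascript
// Backpressure guard: drop non-critical payloads to slow consumers instead
// of letting the per-connection send buffer grow without bound.
const MAX_BUFFERED_BYTES = 1024 * 1024; // 1 MiB per connection (illustrative)

function sendWithBackpressure(socket, message, { critical = false } = {}) {
  if (socket.readyState !== 1 /* WebSocket.OPEN */) return false;
  if (!critical && socket.bufferedAmount > MAX_BUFFERED_BYTES) {
    return false; // drop: the client can't keep up and the data is expendable
  }
  socket.send(message);
  return true;
}
```

High-frequency ticker updates are good drop candidates, since a fresher value arrives moments later anyway, while events like account alerts would be flagged critical.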
Latency should be monitored at every hop of the message journey, from the client to the server and through the message bus. A delay in the synchronization layer can result in a poor user experience where participants see events at different times. Utilizing high-performance network protocols and keeping your broker close to your application servers can minimize this lag.
Monitoring and Security in a Cluster
Visibility is paramount when managing a distributed WebSocket cluster. You need to monitor metrics like the total number of active connections, message throughput per second, and the health of the Redis broker. Standard tools like Prometheus and Grafana can be used to visualize these metrics and alert you when thresholds are exceeded.
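As a minimal sketch, a node can track those counters in process and render them in the Prometheus text exposition format for a scrape endpoint; in practice a library such as prom-client would manage registration and labels, and the metric names here are illustrative.

```javascript
// In-process counters for one WebSocket node.
const metrics = { ws_active_connections: 0, ws_messages_total: 0 };

// Render metrics as "name value" lines, the minimal Prometheus text format.
function renderMetrics(m) {
  return Object.entries(m)
    .map(([name, value]) => `${name} ${value}`)
    .join('\n') + '\n';
}

// Hooking the counters into the server lifecycle:
// wss.on('connection', (ws) => {
//   metrics.ws_active_connections++;
//   ws.on('message', () => { metrics.ws_messages_total++; });
//   ws.on('close', () => { metrics.ws_active_connections--; });
// });
```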
Security is another layer that becomes more complex in a distributed environment. Since connections are long-lived, traditional token-based authentication must be re-evaluated. It is best practice to authenticate the initial handshake using a short-lived token and then periodically re-verify the session to prevent unauthorized access if a user account is compromised.
const url = require('url');
const jwt = require('jsonwebtoken');

wss.on('connection', (ws, req) => {
  const parameters = url.parse(req.url, true).query;
  const token = parameters.token;

  jwt.verify(token, process.env.JWT_SECRET, (err, decoded) => {
    if (err) {
      // Close connection if token is invalid or expired
      ws.terminate();
      return;
    }

    // Store user identity on the socket object for later use
    ws.userId = decoded.userId;
    console.log(`User ${ws.userId} connected and authenticated`);
  });
});

Additionally, you must protect your messaging layer from internal abuse. Since every server instance can publish to any channel, a compromised server could theoretically inject malicious messages into the entire cluster. Implementing strict network access controls and using encrypted connections between your servers and the message broker helps mitigate these risks.
Scaling a WebSocket application is not just about adding more servers; it is about creating a robust, synchronized environment where data flows seamlessly across the entire infrastructure. By mastering the Pub/Sub pattern and addressing the operational pitfalls of stateful connections, you can build real-time systems that support millions of concurrent users with high reliability.
Graceful Shutdown and Maintenance
Performing maintenance on a WebSocket cluster requires a different approach than updating a stateless API. You cannot simply kill a server process, as doing so would abruptly disconnect thousands of active users. Instead, you should implement a graceful shutdown procedure where the server stops accepting new connections and slowly drains the existing ones.
By sending a 'maintenance' event to the clients, you can trigger them to reconnect to a different node in the cluster over a controlled period. This minimizes the impact on the user experience and prevents the thundering herd problem during deployment cycles. A well-orchestrated deployment strategy is essential for maintaining the high uptime expected of real-time applications.
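The drain itself can be staggered so departing clients do not all reconnect at once. Below is a sketch assuming a JSON 'maintenance' event and a 30-second drain window, both illustrative choices.

```javascript
// Assign each client a random close delay within the drain window so the
// rest of the cluster absorbs reconnects gradually instead of all at once.
function scheduleDrain(clients, drainWindowMs, { random = Math.random } = {}) {
  const schedule = [];
  for (const client of clients) {
    schedule.push({ client, delay: random() * drainWindowMs });
  }
  return schedule;
}

// Wiring against the ws server during shutdown:
// server.close(); // stop accepting new upgrade requests
// for (const { client, delay } of scheduleDrain(wss.clients, 30000)) {
//   client.send(JSON.stringify({ type: 'maintenance' }));
//   setTimeout(() => client.close(1001, 'server maintenance'), delay);
// }
```

Close code 1001 ("going away") tells well-behaved clients the disconnect was deliberate, so they can reconnect immediately to another node rather than treating it as an error.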
