API Idempotency

Managing Race Conditions in Distributed Idempotent Systems

Explore strategies for handling concurrent requests with duplicate keys using distributed locking in Redis or SQL. Ensure your system doesn't process the same transaction twice when retries happen simultaneously.

Backend & APIs | Intermediate | 12 min read

The Concurrency Challenge in Distributed Systems

In a distributed environment, the primary driver for idempotency is the uncertainty of network communication. When a client sends a request and receives no response, it cannot tell whether the request was dropped, the server crashed mid-processing, or the acknowledgment was merely delayed. So that the intended action takes effect exactly once, the client retries the operation with the same unique identifier, known as an idempotency key, and the server uses that key to recognize the retry.

A common pitfall occurs when two identical requests arrive at the server almost simultaneously. This happens when a client has a short retry timeout or when a mobile user clicks a submit button multiple times in rapid succession. Without a robust strategy for handling concurrency, your system might begin processing both requests in parallel, leading to race conditions that bypass simple validation checks.

Idempotency is not just about ignoring duplicate keys; it is about ensuring that concurrent attempts to process the same key do not result in corrupted state or duplicate side effects.

The Check-Then-Act Anti-Pattern

Many developers attempt to solve idempotency by first checking a database for the existence of a key and then proceeding with the business logic. This check-then-act sequence is inherently unsafe in high-concurrency scenarios because two separate threads can both verify that the key is missing before either has a chance to record it. This window of vulnerability allows both threads to enter the critical section of your code.

To solve this, we must treat the identification and the start of processing as a single atomic operation. By utilizing distributed locks or atomic database constraints, we can ensure that only one worker wins the right to process a specific idempotency key at any given time.
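The race can be reproduced deterministically in a few lines. The sketch below is illustrative only: an in-memory set stands in for the database, a `threading.Barrier` forces both threads to finish their check before either records the key, and the function names are hypothetical.

```python
import threading


def run_check_then_act(workers=2):
    """Unsafe: the check and the record are two separate steps."""
    seen, processed = set(), []
    barrier = threading.Barrier(workers)

    def handle(key):
        missing = key not in seen   # check
        barrier.wait()              # every worker has now passed the check
        if missing:
            seen.add(key)           # act (too late: others already saw "missing")
            processed.append(key)

    threads = [threading.Thread(target=handle, args=("key-1",)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(processed)


def run_atomic_claim(workers=2):
    """Safe: the check and the record happen as one step under a mutex."""
    seen, processed = set(), []
    guard = threading.Lock()
    barrier = threading.Barrier(workers)

    def handle(key):
        barrier.wait()              # start all workers at the same moment
        with guard:
            claimed = key not in seen
            if claimed:
                seen.add(key)
        if claimed:
            processed.append(key)

    threads = [threading.Thread(target=handle, args=("key-1",)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(processed)
```

With two workers, the first function processes the key twice and the second processes it once. A process-local mutex only works inside one process; across service instances the same role is played by a distributed lock or a database constraint.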

Distributed Locking with Redis

Redis is well suited to managing idempotency in stateless microservices because it is fast and supports atomic operations. The most effective pattern uses a distributed lock as a mutex for a specific idempotency key. While one instance is processing a transaction, every other instance that receives the same key is either rejected, made to wait, or redirected to a cached result.

The core of this strategy is the atomic set-if-not-exists operation (the SET command with its NX and EX options), which lets a worker claim a lock and set an expiration time in one step. The expiration keeps the key from staying locked forever if a worker crashes before it can release the lock. Modern Redis clients provide high-level abstractions for this, but the underlying command remains the foundation of safe concurrency control.

Implementing the Atomic Lock Pattern

The following implementation demonstrates how to use a Redis-based lock to guard a critical payment processing function. By using a unique value for the lock, we ensure that only the owner of the lock can release it, preventing one request from accidentally clearing the lock held by another.

Redis-Based Concurrency Control (Python)

import uuid

import redis

# Configure the Redis connection
cache = redis.Redis(host='localhost', port=6379, db=0)

# Release script: delete the lock only if it still holds our token.
# Redis runs the Lua script atomically, so the ownership check and the
# delete cannot be separated by another client's command.
release_lock = cache.register_script("""
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
""")

def process_secure_payment(idempotency_key, amount):
    lock_name = f"lock:payment:{idempotency_key}"
    request_id = str(uuid.uuid4())

    # Atomically acquire the lock with a 30-second expiry
    # nx=True sets the key only if it does not already exist
    if cache.set(lock_name, request_id, nx=True, ex=30):
        try:
            # Perform the sensitive business logic here
            print(f"Processing payment for {idempotency_key}")
            return {"status": "success", "amount": amount}
        finally:
            # Release only if we are still the current owner
            release_lock(keys=[lock_name], args=[request_id])
    else:
        # Conflict detected: request is already being processed
        return {"status": "error", "message": "Request in progress"}

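In this scenario, if a second request arrives while the first is still active, it fails to acquire the lock. It can then either wait and retry or return a status code (409 Conflict is a common choice) telling the client that the operation is already being handled. This prevents the underlying payment gateway from being called twice for the same idempotency key while the first request is in flight. Because the lock is deleted when processing ends, it only guards against concurrent duplicates; a retry that arrives later must be recognized through a stored result, such as the cached result mentioned earlier or the SQL record described next.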
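One way to sketch the "wait and retry, or report a conflict" choice for the request that lost the lock is shown below. It assumes, beyond what the code above does, that the lock holder stores its result somewhere the duplicate can read; `fetch_result` is a hypothetical callable standing in for that lookup, and the 409 status is one reasonable choice rather than a requirement.

```python
import time


def handle_duplicate(fetch_result, attempts=3, delay_seconds=0.05):
    """Called when the lock could not be acquired.

    Polls for the first request's stored result with exponential backoff.
    Returns (http_status, body).
    """
    for attempt in range(attempts):
        result = fetch_result()          # hypothetical lookup, e.g. a cache read
        if result is not None:
            return 200, result           # replay the first request's outcome
        if attempt < attempts - 1:
            time.sleep(delay_seconds * (2 ** attempt))
    # Still no result: tell the client the original request is in flight
    return 409, {"status": "error", "message": "Request in progress"}
```

Polling for the stored result, rather than for the lock itself, avoids the case where the duplicate acquires the freed lock and runs the payment a second time.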

Leveraging SQL for Transactional Idempotency

While Redis is excellent for speed, SQL databases offer stronger consistency guarantees that are often preferable for financial transactions. Relational databases allow you to leverage unique constraints and row-level locking to enforce idempotency at the storage layer. This approach ensures that even if the application logic has bugs, the database will refuse to create duplicate records.

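The most reliable SQL pattern involves a dedicated idempotency table that stores the key, the current status of the request, and the final response body. This table serves as a central audit log for all incoming intents and prevents the system from processing the same intent twice. The unique constraint on the key, not the transaction by itself, is what makes registration atomic: at common isolation levels such as READ COMMITTED, two transactions can both run a SELECT that finds no row. Where the business data lives in the same database, writing it in the same transaction as the idempotency record means both commit or roll back together.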

Using Unique Constraints and Upserts

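Instead of manually locking rows, you can rely on the database to enforce uniqueness through a primary key or unique index. A plain insert that violates the constraint raises an error, which the application can catch and interpret as a duplicate request; with ON CONFLICT DO NOTHING, the statement instead succeeds and reports zero affected rows. In PostgreSQL, a concurrent insert of the same key waits until the first transaction commits or rolls back, so the database itself decides the winner. This is often cheaper than explicit locking because the application never holds row locks across a separate check and write.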

PostgreSQL Idempotency Schema (SQL)

CREATE TABLE processed_requests (
    idempotency_key UUID PRIMARY KEY,
    status VARCHAR(20) NOT NULL,
    response_json JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Atomic attempt to register a new request
INSERT INTO processed_requests (idempotency_key, status)
VALUES ('550e8400-e29b-41d4-a716-446655440000', 'started')
ON CONFLICT (idempotency_key) DO NOTHING;

After the initial insert, your application should check how many rows were affected to determine whether it should proceed with the logic or retrieve the existing result. If one row was inserted, this worker owns the request: it can continue with the business operation and, upon completion, update the status to finished and store the response. If no row was inserted, the key already exists, and the worker should query the existing row to see whether the previous attempt finished or is still in progress.

Handling Failure and Edge Cases

Building a resilient idempotency system requires planning for scenarios where a process starts but never finishes. If a worker crashes after acquiring a lock or inserting a 'started' record, subsequent retries might be blocked indefinitely. You must implement a strategy to detect and recover from these stalled transactions without risking duplicate execution.

TTL mechanisms are the standard solution for lock expiration, but they must be tuned carefully based on the expected duration of the task. If a lock expires too quickly, a second worker might start the task while the first is still running, defeating the purpose of the lock. Conversely, if it lasts too long, failures will cause significant delays for clients attempting to retry legitimate requests.
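For the SQL variant, one possible recovery strategy is to let a retry take over a 'started' record once it is older than a timeout, using a single conditional UPDATE so that only one retry can win. The sketch below is an assumption-laden illustration: it uses `sqlite3` in memory, a hypothetical `started_at` column holding epoch seconds, and an arbitrary 60-second timeout.

```python
import sqlite3

STALL_TIMEOUT_SECONDS = 60  # assumption: tune to the expected task duration


def create_table(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS processed_requests (
               idempotency_key TEXT PRIMARY KEY,
               status TEXT NOT NULL,
               started_at INTEGER NOT NULL
           )"""
    )


def try_take_over(conn, key, now, timeout=STALL_TIMEOUT_SECONDS):
    """Atomically reclaim a 'started' row whose owner appears to have stalled.

    The WHERE clause is the compare step and the SET is the swap, so two
    retries racing on the same stale row cannot both succeed.
    """
    with conn:
        cur = conn.execute(
            "UPDATE processed_requests SET started_at = ? "
            "WHERE idempotency_key = ? AND status = 'started' AND started_at <= ?",
            (now, key, now - timeout),
        )
    return cur.rowcount == 1
```

A successful takeover refreshes `started_at`, which makes the row look fresh to every other retry. It does not stop the original worker from finishing late, so the business operation still has to tolerate that case.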

Strategy Comparison and Trade-offs

Choosing between Redis and SQL for idempotency depends on your system's existing architecture and the criticality of the data. Redis provides lower latency and is generally easier to scale horizontally, while SQL provides the ACID guarantees necessary for high-stakes financial operations. Many high-scale systems use a hybrid approach where Redis handles the initial locking and SQL persists the final results.

Redis Locks: Best for high-traffic, low-latency requirements where an occasional lock loss during a cluster failover is acceptable.
SQL Unique Constraints: Best for maximum data integrity where every transaction must be accounted for and audited.
Fencing Tokens: Essential for preventing late-arriving requests from overwriting newer state in complex distributed workflows.
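Fencing tokens are only named in the list above, so a minimal sketch may help. It assumes each lock acquisition hands out a monotonically increasing number (an `itertools.count` here; a Redis `INCR` counter would be one real-world source) and that the protected resource rejects any write carrying a token lower than the highest it has seen. `FencedStore` is a hypothetical in-memory stand-in for that resource.

```python
import itertools


class FencedStore:
    """Hypothetical resource that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # A lower token means a newer lock holder has already written
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


token_source = itertools.count(1)   # stand-in for an atomic shared counter

store = FencedStore()
slow_worker_token = next(token_source)   # first holder, then its lock expires
fast_worker_token = next(token_source)   # second holder acquires the lock

accepted_new = store.write(fast_worker_token, "new state")
accepted_late = store.write(slow_worker_token, "stale state")  # rejected
```

In a real system the comparison and the write must happen atomically inside the resource itself, for example as a conditional UPDATE.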

Regardless of the chosen storage, always ensure that your API returns the exact same response for a given idempotency key. This includes the HTTP status code, headers, and response body. Consistency in reporting ensures that the client's internal state stays synchronized with the server even after multiple retries.
