Offline-First Architecture

Conflict Resolution Strategies for Distributed State

Evaluate and implement techniques like Last-Write-Wins, field-level merging, and CRDTs to reconcile concurrent updates from multiple clients.

Architecture | Intermediate | 12 min read

Understanding the Distributed State Dilemma

In a traditional web application, the server acts as the single source of truth for all data. When a client wants to change a piece of information, it sends a request and waits for the server to validate and persist that change. This centralized model simplifies data integrity but fails completely when the network is unstable or unavailable.

Offline-first architecture flips this model by treating the local device storage as the primary data source. This allows users to remain productive regardless of their connection status, as every interaction happens against a local database. However, this shift introduces the significant challenge of reconciling divergent states once the device regains connectivity.

A conflict occurs when two or more clients modify the same record while disconnected. Because neither client is aware of the other's actions, their local histories drift apart. The synchronization engine must then determine how to merge these histories into a single, consistent state on every replica.

A successful reconciliation strategy must prioritize data integrity and user intent. Simply overwriting data can lead to frustrating experiences where work is lost without warning. Therefore, engineers must choose resolution techniques based on the specific needs of the business logic and the complexity of the data structures involved.

The Limits of Server-Centric Logic

Most developers are accustomed to the concept of ACID transactions provided by relational databases. These guarantees rely on the ability to lock records or fail transactions that encounter concurrent modifications. In an offline-first world, these locking mechanisms are impossible because the clients cannot communicate in real time.

We must instead embrace the concept of eventual consistency. This means that while different nodes in the system might hold different values for a short period, they will all eventually converge on the same value. Designing for this reality requires a fundamental change in how we model our application data and state transitions.

Designing for Convergence

Convergence is the property that ensures all replicas of a dataset arrive at the same state after processing the same set of updates. To achieve this, the resolution logic must be deterministic. If two different devices perform the same merge operation, they must produce the identical result without needing further coordination.

This deterministic behavior is often achieved through metadata. By attaching version numbers, timestamps, or causal history to our data, we provide the sync engine with the context it needs to make autonomous decisions. Without this metadata, the system is essentially guessing, which leads to corrupted or inconsistent application states.

Solving Conflicts via Deterministic Timing

The most straightforward way to resolve a conflict is the Last-Write-Wins strategy. In this approach, every update is accompanied by a timestamp indicating when the change occurred. When the synchronization engine encounters two versions of the same record, it simply keeps the one with the most recent timestamp.

This method is highly efficient because it requires very little storage overhead and is easy to implement. It works well for simple use cases where the risk of concurrent edits is low or where the data is not mission-critical. Many key-value stores and caching layers use this approach by default to manage high-throughput writes.

Timestamp-based resolution is only as reliable as the clocks generating them. In distributed systems, relying on physical time without synchronization is a recipe for silent data loss.

However, Last-Write-Wins has a fatal flaw known as the lost update problem. If two users edit different fields of the same document simultaneously, the one who saves last will completely overwrite the changes made by the first user. This results in the total loss of valid information even if the changes did not technically overlap.
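
The lost update is easy to reproduce. The sketch below assumes a hypothetical whole-document resolver named `resolveLww` and illustrative documents carrying an `updatedAt` field; none of these names come from a specific library.

```javascript
// Hypothetical whole-document Last-Write-Wins resolver (illustrative only).
function resolveLww(a, b) {
  return b.updatedAt > a.updatedAt ? b : a;
}

const base = { title: 'Draft', body: 'Hello', updatedAt: 100 };

// Two clients edit different fields of the same document while offline.
const fromAlice = { ...base, title: 'Final', updatedAt: 105 };
const fromBob = { ...base, body: 'Hello, world', updatedAt: 110 };

const merged = resolveLww(fromAlice, fromBob);
console.log(merged.title); // 'Draft': Alice's title change is gone
console.log(merged.body);  // 'Hello, world'
```

Bob never touched the title, yet his later save carries the stale value and replaces Alice's edit.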

Mitigating Clock Drift

Physical clocks on consumer devices are notorious for being out of sync. A user might set their system clock manually, or an application might record local time instead of UTC, and either can cause updates to be ignored even if they were logically the most recent. The offset between two clocks is called clock skew, and the gradual divergence that produces it is clock drift. Both can make timestamp-based resolution pick the wrong winner.

To mitigate this, many systems use logical clocks or Hybrid Logical Clocks. A logical clock, such as a Lamport clock, replaces physical time with a counter that increments on every event and jumps forward whenever a larger remote value is observed. A Hybrid Logical Clock combines a physical timestamp with such a counter, so that even if two events happen at the same physical time, they have a clear and consistent order. Both preserve causality even when physical time is unreliable.
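
A Hybrid Logical Clock can be sketched in a few lines. The version below is a minimal illustration, not a production implementation: the class name, the `wall` and `counter` field names, and the injectable `now` function are all assumptions made for this example.

```javascript
// Minimal Hybrid Logical Clock sketch. `now` is an injectable physical clock
// (an assumption made here so the example is deterministic).
class HybridLogicalClock {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.wall = 0;    // highest physical time observed so far
    this.counter = 0; // orders events that share the same `wall` value
  }

  // Call for every local write; returns the timestamp to attach to it.
  tick() {
    const previous = this.wall;
    this.wall = Math.max(previous, this.now());
    this.counter = this.wall === previous ? this.counter + 1 : 0;
    return { wall: this.wall, counter: this.counter };
  }

  // Call when a remote timestamp arrives, so later local writes sort after it.
  receive(remote) {
    const previous = this.wall;
    this.wall = Math.max(previous, remote.wall, this.now());
    if (this.wall === previous && this.wall === remote.wall) {
      this.counter = Math.max(this.counter, remote.counter) + 1;
    } else if (this.wall === previous) {
      this.counter += 1;
    } else if (this.wall === remote.wall) {
      this.counter = remote.counter + 1;
    } else {
      this.counter = 0;
    }
    return { wall: this.wall, counter: this.counter };
  }
}

// Total order over timestamps: physical part first, then the counter.
function compareHlc(a, b) {
  return a.wall - b.wall || a.counter - b.counter;
}

// A device whose clock is stuck at 1000 still produces increasing timestamps,
// even after seeing a remote write stamped far in its future.
const clock = new HybridLogicalClock(() => 1000);
const first = clock.tick();                             // { wall: 1000, counter: 0 }
const seen = clock.receive({ wall: 5000, counter: 2 }); // { wall: 5000, counter: 3 }
const next = clock.tick();                              // { wall: 5000, counter: 4 }
```

Two replicas can still produce identical timestamps, so a register built on this clock would likely keep a client ID as a final tie-breaker.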

Implementing a Timestamped Register

To implement a basic Last-Write-Wins register, we wrap our data in a structure that includes a versioning component. This allows the merge function to compare incoming data against the existing local state effectively.

LWW Register Implementation (JavaScript)

class LWWRegister {
  // `id` identifies the client that wrote `value`; it doubles as the tie-breaker
  constructor(id, value, timestamp) {
    this.id = id;
    this.value = value;
    this.timestamp = timestamp;
  }

  // Merges a remote register into the local one
  merge(remote) {
    // Take the remote write if it is strictly newer, or if the timestamps
    // are equal and the remote client ID sorts higher (deterministic tie-break)
    const remoteWins =
      remote.timestamp > this.timestamp ||
      (remote.timestamp === this.timestamp && remote.id > this.id);

    if (remoteWins) {
      // Copy the writer's ID as well, so later tie-breaks compare the
      // client that actually produced the stored value
      this.id = remote.id;
      this.value = remote.value;
      this.timestamp = remote.timestamp;
    }
  }
}

Field-Level Resolution and Metadata Tracking

To avoid the lost update problem associated with whole-document overwrites, we can move the resolution logic down to the field level. In this model, we treat each attribute of an object as an independent entity with its own versioning metadata. This allows the system to merge changes to different fields of the same record seamlessly.

Consider a user profile object with a name field and a bio field. If one user updates the name while another updates the bio, a field-level merge will combine both changes into a single updated profile. This approach significantly reduces the frequency of conflicts that require user intervention or result in lost data.

Increased Granularity: Allows independent updates to different parts of a document.
Reduced Conflict Frequency: Only simultaneous changes to the exact same field trigger a conflict.
Higher Storage Overhead: Each field must now track its own metadata, such as timestamps or version vectors.
Implementation Complexity: Requires a more sophisticated sync engine that can traverse and merge nested structures.

Implementing field-level merging requires the application to track the last modified time for every individual property. When a sync occurs, the engine iterates through the keys of the objects and applies the resolution logic property by property. This preserves the intent of both users as long as they did not modify the exact same attribute.
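
For a flat record, the property-by-property pass might look like the sketch below. It assumes every field is stored as `{ value, timestamp }`; the function name `mergeFields` and the sample profiles are hypothetical.

```javascript
// Each field is stored as { value, timestamp }; this layout is an assumption.
function mergeFields(local, remote) {
  const result = { ...local };
  for (const key of Object.keys(remote)) {
    const mine = local[key];
    const theirs = remote[key];
    // Take the remote field only if it is missing locally or strictly newer.
    if (!mine || theirs.timestamp > mine.timestamp) {
      result[key] = theirs;
    }
  }
  return result;
}

// One device changed the name, the other changed the bio.
const onPhone = {
  name: { value: 'Ada L.', timestamp: 120 },
  bio: { value: 'Engineer', timestamp: 100 },
};
const onLaptop = {
  name: { value: 'Ada', timestamp: 100 },
  bio: { value: 'Engineer and writer', timestamp: 130 },
};

const profile = mergeFields(onPhone, onLaptop);
console.log(profile.name.value); // 'Ada L.'
console.log(profile.bio.value);  // 'Engineer and writer'
```

Both edits survive because each field is compared on its own timestamp. Equal timestamps keep the local value here, so a real engine would still need a deterministic tie-breaker.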

Managing Nested Object States

Nested objects add a layer of complexity to field-level merging. A naive implementation might merge top-level keys but overwrite nested objects entirely, leading back to the same problems we aimed to solve. A recursive merge strategy is necessary to ensure that deeply nested data is handled with the same level of care.

Developers must also decide how to handle deletions at the field level. Simply removing a key from a local object is not enough because the sync engine might interpret the missing key as an omission rather than a deletion. Using tombstone values or explicit deletion markers ensures that removals are propagated correctly across all devices.
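
One way to represent a deletion is to keep the key and mark its entry as deleted. The sketch below assumes a `deleted` flag on each field entry; the helper names `deleteField`, `mergeWithTombstones`, and `visibleFields` are hypothetical.

```javascript
// Deletions are recorded as entries with `deleted: true` rather than removed keys.
function deleteField(doc, key, timestamp) {
  return { ...doc, [key]: { value: null, deleted: true, timestamp } };
}

function mergeWithTombstones(local, remote) {
  const result = { ...local };
  for (const key of Object.keys(remote)) {
    if (!local[key] || remote[key].timestamp > local[key].timestamp) {
      result[key] = remote[key]; // a tombstone wins or loses like any other write
    }
  }
  return result;
}

// What the application sees: tombstoned fields are hidden.
function visibleFields(doc) {
  const view = {};
  for (const [key, entry] of Object.entries(doc)) {
    if (!entry.deleted) view[key] = entry.value;
  }
  return view;
}

const replicaA = { nickname: { value: 'ada', timestamp: 100 } };
const replicaB = deleteField(replicaA, 'nickname', 140);

// Merging in either direction keeps the deletion, because the tombstone is newer.
const aThenB = mergeWithTombstones(replicaA, replicaB);
const bThenA = mergeWithTombstones(replicaB, replicaA);
console.log(visibleFields(aThenB)); // {}
console.log(visibleFields(bThenA)); // {}
```

Had replica B simply dropped the key, merging A into B would have restored the nickname, since nothing would record that it was removed on purpose.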

Recursive Merge Logic

The following example demonstrates a basic recursive merge function. It assumes each field is stored as an object containing both the value and a timestamp.

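Recursive Field-Level Merge (JavaScript)

// A nested value is a plain object that maps keys to { value, timestamp } entries
function isFieldMap(value) {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function mergeDeep(local, remote) {
  const result = { ...local };

  for (const key of Object.keys(remote)) {
    const remoteEntry = remote[key];
    const localEntry = local[key];

    if (!localEntry) {
      // Field exists only on the remote side
      result[key] = remoteEntry;
    } else if (isFieldMap(localEntry.value) && isFieldMap(remoteEntry.value)) {
      // Both sides hold nested fields: always recurse, so a newer nested
      // field is kept even when the parent entry on that side is older
      result[key] = {
        value: mergeDeep(localEntry.value, remoteEntry.value),
        timestamp: Math.max(localEntry.timestamp, remoteEntry.timestamp)
      };
    } else if (remoteEntry.timestamp > localEntry.timestamp) {
      // Leaf field: the newer write wins. Equal timestamps keep the local
      // entry, so a production merge needs a deterministic tie-breaker
      result[key] = remoteEntry;
    }
  }
  return result;
}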

Leveraging Conflict-free Replicated Data Types

Conflict-free Replicated Data Types, or CRDTs, represent the gold standard for high-concurrency offline systems. These are specialized data structures designed from the ground up to be distributed. They guarantee that no matter what order updates are received in, every replica will eventually reach the same state without any central coordination.

CRDTs work by using mathematically commutative operations. For example, in a set where you can only add items, the order of additions does not change the final composition of the set. By extending this logic to more complex types like maps, lists, and even text buffers, we can build highly collaborative applications like real-time text editors.
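
The add-only set mentioned above is usually called a grow-only set. The sketch below is a minimal illustration with a hypothetical `GSet` class, showing that two replicas receiving the same additions in different orders end up with the same contents.

```javascript
// Grow-only set: the only operation is `add`, and merge is set union.
class GSet {
  constructor(items = []) {
    this.items = new Set(items);
  }

  add(item) {
    this.items.add(item);
  }

  // Union is commutative, associative, and idempotent.
  merge(remote) {
    for (const item of remote.items) this.items.add(item);
  }

  toSortedArray() {
    return [...this.items].sort();
  }
}

// The same three additions arrive in a different order on each replica.
const replicaOne = new GSet();
['apple', 'pear', 'plum'].forEach((tag) => replicaOne.add(tag));

const replicaTwo = new GSet();
['plum', 'apple', 'pear'].forEach((tag) => replicaTwo.add(tag));

// Merging, even repeatedly, leaves both replicas with identical contents.
replicaOne.merge(replicaTwo);
replicaOne.merge(replicaTwo);
replicaTwo.merge(replicaOne);
console.log(replicaOne.toSortedArray()); // [ 'apple', 'pear', 'plum' ]
```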

The primary cost of CRDTs is their significant storage and memory overhead. To guarantee convergence, CRDTs must keep track of detailed history or unique identifiers for every element ever added or removed. This metadata can eventually grow much larger than the actual data being stored, requiring periodic compaction or garbage collection strategies.

Despite the overhead, CRDTs provide a level of robustness that other methods cannot match. They largely remove the need for conflict resolution UI because the data structure itself handles the merge, although a converged result is not always the one a user intended. This makes them well suited to collaborative tools where consistent replicas are a non-negotiable requirement.

Understanding State-Based CRDTs

State-based CRDTs, also known as Convergent Replicated Data Types, synchronize by sending their entire internal state to other replicas. When a replica receives a remote state, it uses a predefined merge function to combine the remote data with its own. This approach is resilient to network issues like duplicate or out-of-order delivery.

The merge function for a state-based CRDT must be idempotent, meaning that merging the same state multiple times has no additional effect. It must also be associative and commutative. These properties ensure that the system remains stable and predictable regardless of how the network behaves during the synchronization process.
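
These three properties can be checked directly for a candidate merge function. The sketch below uses a per-node maximum over plain objects as the merge under test; the names `mergeState` and `sameState` and the sample states are hypothetical, and a few hand-picked inputs are a sanity check rather than a proof.

```javascript
// Candidate merge: keep the highest count seen for each node.
function mergeState(a, b) {
  const merged = { ...a };
  for (const [node, count] of Object.entries(b)) {
    merged[node] = Math.max(merged[node] || 0, count);
  }
  return merged;
}

// Compares two states regardless of key order; a missing key counts as 0.
function sameState(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((key) => (a[key] || 0) === (b[key] || 0));
}

const x = { a: 3, b: 1 };
const y = { b: 4, c: 2 };
const z = { a: 1, c: 5 };

// Idempotent: merging a state with itself changes nothing.
const idempotent = sameState(mergeState(x, x), x);
// Commutative: argument order does not matter.
const commutative = sameState(mergeState(x, y), mergeState(y, x));
// Associative: grouping does not matter.
const associative = sameState(
  mergeState(mergeState(x, y), z),
  mergeState(x, mergeState(y, z))
);

console.log({ idempotent, commutative, associative });
```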

Implementing a G-Counter

A Grow-only Counter is one of the simplest CRDTs. It tracks increments from different nodes separately and sums them up to get the final total, ensuring that no increments are lost during concurrent updates.

Grow-only Counter (G-Counter) (JavaScript)

class GCounter {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.counters = {};
    this.counters[nodeId] = 0;
  }

  increment() {
    this.counters[this.nodeId]++;
  }

  get value() {
    // The total value is the sum of all known node counters
    return Object.values(this.counters).reduce((a, b) => a + b, 0);
  }

  merge(remote) {
    for (const [id, value] of Object.entries(remote.counters)) {
      // Keep the maximum value seen for each node
      this.counters[id] = Math.max(this.counters[id] || 0, value);
    }
  }
}

Architectural Trade-offs and Best Practices

Choosing a conflict resolution strategy involves balancing developer effort, system performance, and the desired user experience. There is no one-size-fits-all solution; a simple task app might thrive with Last-Write-Wins, while a collaborative design tool requires the mathematical guarantees of CRDTs.

When implementing these strategies, always prioritize making the resolution logic transparent to the user when possible. If a conflict is too complex for the system to solve automatically, provide a clear interface for manual resolution. However, the goal of a good offline-first architecture is to minimize these interruptions through smart defaults.

Security is another critical consideration in decentralized resolution. Since clients are making decisions about the final state, you must ensure that malicious clients cannot inject invalid timestamps or versions to overwrite legitimate data. Validating the incoming sync payloads on the server remains a necessary layer of protection.
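
A server-side check along these lines might reject obviously forged metadata before it reaches the merge step. Everything in this sketch is an assumption for illustration: the `validateSyncEntry` function, the shape of `entry`, and the five-minute skew limit are not taken from any particular sync framework.

```javascript
// Hypothetical limit: how far ahead of the server clock a client timestamp may be.
const MAX_FUTURE_SKEW_MS = 5 * 60 * 1000;

function validateSyncEntry(entry, serverNow, authenticatedClientId) {
  // A client may only submit writes under its own identity.
  if (entry.clientId !== authenticatedClientId) {
    return { ok: false, reason: 'client id does not match session' };
  }
  if (!Number.isFinite(entry.timestamp) || entry.timestamp < 0) {
    return { ok: false, reason: 'timestamp is not a valid number' };
  }
  // A far-future timestamp would win every Last-Write-Wins comparison.
  if (entry.timestamp > serverNow + MAX_FUTURE_SKEW_MS) {
    return { ok: false, reason: 'timestamp is too far in the future' };
  }
  return { ok: true };
}

const serverNow = 1700000000000;
const honest = { clientId: 'c1', value: 'hi', timestamp: serverNow - 1000 };
const forged = { clientId: 'c1', value: 'pwned', timestamp: serverNow + 86400000 };

console.log(validateSyncEntry(honest, serverNow, 'c1').ok); // true
console.log(validateSyncEntry(forged, serverNow, 'c1').ok); // false
```

The right skew limit depends on how long clients are expected to stay offline and how the server treats old timestamps, so the constant here is only a placeholder.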

Finally, monitor the growth of your metadata over time. Techniques like pruning old history or snapshotting current state can prevent your local database from becoming bloated. A lean synchronization protocol is essential for maintaining the performance benefits that offline-first architecture was intended to provide.

Choosing the Right Strategy

Start by analyzing the concurrency patterns of your application. If users rarely edit the same data, a simple field-level merge with timestamps is usually sufficient. Only reach for CRDTs when you have high-frequency concurrent edits or need a history-preserving data structure.

Consider the bandwidth constraints of your target audience. CRDTs and extensive field-level metadata can significantly increase the size of synchronization payloads. If your users are on low-speed mobile networks, you may need to optimize your data structures or use delta-based synchronization to keep the app responsive.
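
As a rough illustration of delta-based synchronization, a client can send only the fields written since its last acknowledged sync. The sketch assumes fields stored as `{ value, timestamp }` and timestamps that never move backwards on the sending device; `buildDelta` and `lastSyncedAt` are hypothetical names.

```javascript
// Select only the fields written after the last acknowledged sync point.
function buildDelta(doc, lastSyncedAt) {
  const delta = {};
  for (const [key, entry] of Object.entries(doc)) {
    if (entry.timestamp > lastSyncedAt) delta[key] = entry;
  }
  return delta;
}

const note = {
  title: { value: 'Groceries', timestamp: 90 },
  body: { value: 'Milk, eggs, flour', timestamp: 150 },
  tags: { value: 'home', timestamp: 100 },
};

// Only `body` changed after the sync acknowledged at time 100.
const delta = buildDelta(note, 100);
console.log(Object.keys(delta)); // [ 'body' ]
```

The receiver can apply the delta with the same field-level merge it uses for full documents, since a delta is just a document with fewer keys.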

Testing Conflict Scenarios

Testing offline-first logic is notoriously difficult because it involves simulating complex network conditions. Use automated integration tests that intentionally introduce latency, drop packets, and force concurrent writes to verify that your merge logic behaves as expected.
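
One inexpensive check is to replay the same concurrent writes in every possible delivery order and assert that the outcome never changes. The sketch below is an assumption-laden illustration: `mergeRegister` is a hypothetical Last-Write-Wins merge with a client ID tie-breaker, and `permutations` is a small helper written for this example.

```javascript
// Hypothetical merge under test: newest timestamp wins, client ID breaks ties.
function mergeRegister(a, b) {
  if (b.timestamp !== a.timestamp) return b.timestamp > a.timestamp ? b : a;
  return b.clientId > a.clientId ? b : a;
}

// Returns every ordering of `items`.
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest])
  );
}

const writes = [
  { value: 'red', timestamp: 10, clientId: 'a' },
  { value: 'green', timestamp: 10, clientId: 'b' }, // ties with 'red' on time
  { value: 'blue', timestamp: 7, clientId: 'c' },
];

// Fold each delivery order through the merge and collect the distinct winners.
const outcomes = new Set(
  permutations(writes).map((order) => order.reduce((acc, w) => mergeRegister(acc, w)).value)
);

console.log([...outcomes]); // [ 'green' ]
```

If the tie-breaker were removed, different orders would yield different winners and the set would contain more than one value, which is exactly the failure this kind of test is meant to catch.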

Simulating clock drift in your test environment is also vital. Ensure that your system handles cases where a client's clock is significantly in the past or future. This robustness ensures that your application remains reliable in the messy, unpredictable environment of real-world device usage.
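
Skew is straightforward to simulate when the clock is injected rather than read from the system. In this sketch, `makeClient`, its `clockOffsetMs` parameter, and `lwwMerge` are hypothetical names; the test pins down what plain Last-Write-Wins does when one device runs an hour behind.

```javascript
// Each simulated client stamps writes with true time plus its own clock offset.
function makeClient(clientId, clockOffsetMs) {
  return {
    write(value, trueTimeMs) {
      return { value, clientId, timestamp: trueTimeMs + clockOffsetMs };
    },
  };
}

function lwwMerge(a, b) {
  return b.timestamp > a.timestamp ? b : a;
}

const accurate = makeClient('accurate', 0);
const behind = makeClient('behind', -60 * 60 * 1000); // clock is one hour slow

// The slow client writes 500 ms AFTER the accurate one in real time.
const earlier = accurate.write('first edit', 10000000);
const later = behind.write('second edit', 10000500);

const winner = lwwMerge(earlier, later);
console.log(winner.value); // 'first edit': the logically newer write lost to skew
```

Recording this as an explicit expectation makes the weakness visible, and the assertion would need to change if the register were later switched to a logical or hybrid clock.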
