

Migrating to Microservices Using the Strangler Fig Pattern

Learn the step-by-step process of incrementally extracting functionality from a legacy monolith into new services without requiring a high-risk system rewrite.

Architecture · Intermediate · 12 min read

The Strategic Shift from Monoliths to Services

The decision to migrate from a monolith to microservices often stems from the need to scale development teams rather than just scaling software performance. As a single codebase grows, the cognitive load required to understand every interconnected module becomes a bottleneck for new engineers. This results in slower release cycles and a heightened fear of breaking disparate parts of the system with every deployment.

A common mistake is attempting a complete system rewrite from scratch. This approach frequently leads to the second-system effect, where the project overruns its timeline and fails to deliver value while the original application continues to accumulate technical debt. Instead of a total overhaul, modern architecture favors an incremental extraction process that maintains system stability.

This incremental approach is often visualized through the Strangler Fig pattern. In nature, this plant begins its life in the canopy of a host tree and slowly grows its roots down to the ground. Over time, the new plant completely envelops and replaces the original structure. In software, we use an API gateway to gradually route traffic from legacy paths to new service endpoints.

The greatest risk in software architecture is not the choice of technology but the attempt to change everything at once without a safety net.

Evaluating the Extraction Candidate

Selecting the right functionality to extract first is critical for building momentum. You should avoid core business logic that is deeply entwined with multiple database tables in the initial phase. Choosing a peripheral but high-traffic module allows the team to establish deployment pipelines and networking infrastructure without risking the entire business.

  • Low domain complexity to minimize logic errors during the move.
  • Independent data requirements that do not require complex joins with other tables.
  • High change frequency where the team benefits most from faster deployment cycles.
  • Clear performance bottlenecks that would benefit from specialized infrastructure scaling.
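As a rough heuristic, these criteria can be turned into a simple ranking. The weights and module names below are purely illustrative, not a formula to follow blindly:

```python
# Illustrative scoring of extraction candidates against the criteria above.
# Weights reflect the idea that low complexity and independent data matter
# most for a first extraction; tune them for your own context.
CRITERIA_WEIGHTS = {
    "low_domain_complexity": 3,
    "independent_data": 3,
    "high_change_frequency": 2,
    "performance_bottleneck": 1,
}

def score_candidate(traits):
    """Sum the weights of the criteria a module satisfies."""
    return sum(CRITERIA_WEIGHTS[t] for t in traits)

# Hypothetical modules in a monolith under evaluation
modules = {
    "notifications": [
        "low_domain_complexity",
        "independent_data",
        "high_change_frequency",
    ],
    "billing": ["performance_bottleneck"],
}

ranked = sorted(modules, key=lambda m: score_candidate(modules[m]), reverse=True)
```

Here the peripheral notifications module outranks billing, matching the advice to avoid deeply entwined core business logic in the first phase.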

Implementing the Intercepting Proxy

Once a candidate module is identified, the next step involves introducing a routing layer between the client and the monolithic backend. This layer acts as a traffic controller that decides which requests should go to the legacy system and which should be directed to the new service. It provides a single entry point for consumers so they remain unaware of the underlying architectural shifts.

Using a reverse proxy or an API gateway allows you to toggle traffic based on specific URI paths or headers. This setup facilitates canary releases where you can send a small percentage of traffic to the new service to monitor for errors. If the new service fails, the proxy can instantly reroute traffic back to the stable monolith, ensuring high availability.

Nginx Routing for Incremental Migration

```nginx
http {
    upstream legacy_monolith {
        server 10.0.0.1:8080;
    }

    upstream notification_service {
        server 10.0.0.2:5000;
    }

    server {
        listen 80;

        # Default traffic goes to the monolith
        location / {
            proxy_pass http://legacy_monolith;
        }

        # Targeted traffic for the extracted service
        location /api/v1/notifications {
            proxy_pass http://notification_service;
            proxy_set_header X-Routing-Source "strangler-proxy";
        }
    }
}
```

Managing Request Context

As you split the monolith, maintaining user context across network boundaries becomes a primary challenge. The monolith likely relied on local session memory or a direct database connection to verify user identities and permissions. In a distributed environment, the gateway must handle the translation of session cookies into standard tokens such as JSON Web Tokens (JWTs).

The new service should receive all the information it needs to process the request without calling back into the monolith. This requires a shift in thinking where the request header contains a verified identity and a set of claims. This decoupling ensures that the new service remains truly independent and performant.
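The gateway-side translation can be sketched as follows. This builds a JWT-like signed token with only the standard library for illustration; a real gateway would use a proper JWT library and managed signing keys:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"gateway-signing-key"  # illustrative; use a managed secret in practice

def session_to_token(session):
    """At the gateway: translate a verified session into a signed claims token."""
    payload = {
        "sub": session["user_id"],
        "roles": session["roles"],
        "exp": int(time.time()) + 300,  # short-lived by design
    }
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token):
    """In the new service: check the signature and read claims locally,
    without calling back into the monolith."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

The key property is that `verify_token` needs only the shared signing key, so the extracted service stays independent of the monolith's session store.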

The Challenge of Data Gravity

Moving code is relatively straightforward, but moving data is the most difficult part of any microservices migration. Data gravity describes the phenomenon where data and its associated applications are drawn together. If a new service continues to query the monolithic database, you have created a distributed monolith rather than an independent service.

A distributed monolith inherits the worst of both worlds: the deployment complexity of microservices and the tight coupling of a monolith. Changes to the database schema in the legacy system can still break the new service. To achieve true independence, the extracted service must eventually own its private data store.

One effective strategy for data migration is the dual-write approach. During the transition, the application writes data to both the old database and the new database. This allows the team to compare the data for consistency before the legacy table is eventually decommissioned.

Dual-Write Implementation with Synchronous Verification

```python
def create_user_profile(user_data):
    # First, write to the legacy system to ensure backward compatibility
    legacy_success = legacy_db.insert_user(user_data)

    if legacy_success:
        try:
            # Attempt to write to the new microservice datastore
            new_service_db.insert_user(user_data)
        except Exception as e:
            # Log the error but don't fail the request yet
            logger.error(f"Data sync failed for new service: {e}")

    return legacy_success
```
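Before the legacy table can be decommissioned, the team needs a way to confirm the two stores actually agree. A minimal reconciliation sketch, assuming rows from both stores can be fetched as dicts sharing a key field:

```python
def verify_sync(legacy_rows, new_rows, key="id"):
    """Compare records from both stores and report drift.

    Returns keys that are missing from the new store and keys whose
    records differ, so the team can fix the sync before cutover.
    """
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}
    missing = sorted(set(legacy) - set(new))
    mismatched = sorted(k for k in legacy.keys() & new.keys() if legacy[k] != new[k])
    return {"missing_in_new": missing, "mismatched": mismatched}
```

Running a job like this on a schedule during the dual-write phase turns "the data looks fine" into a measurable, empty drift report.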

Database Per Service Principle

Adhering to the database-per-service principle is non-negotiable for long-term scalability. This ensures that no other service can bypass the API and access the data directly. When a service owns its data, it can choose the storage technology that best fits its specific access patterns, such as a graph database for relationships or a document store for flexible schemas.

This isolation also simplifies the testing process. Developers can spin up a local instance of the service and its specific database without needing to replicate the entire monolithic schema. It reduces the blast radius of database maintenance and allows for independent scaling of storage resources based on the service needs.

Handling Cross-Service Communication

When logic is moved out of the monolith, functions that were previously local procedure calls become network calls. This introduces latency and the possibility of partial failure. Engineers must account for scenarios where the network is slow or the remote service is temporarily unavailable.

The use of synchronous REST or gRPC calls is common but can lead to cascading failures if not managed properly. Implementing the Circuit Breaker pattern is essential in these scenarios. A circuit breaker monitors for failures and trips if a service becomes unresponsive, preventing the calling application from wasting resources on doomed requests.
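A minimal circuit breaker fits in a few lines; production systems usually reach for a library or a service mesh, and the thresholds below are illustrative:

```python
import time

class CircuitBreaker:
    """Open the circuit after N consecutive failures, fail fast while open,
    then allow a single probe call after a cooldown (half-open state)."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: permit one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

While the circuit is open, callers get an immediate error instead of tying up threads on a dependency that is already known to be down.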

Asynchronous communication using message brokers like RabbitMQ or Kafka often provides a more resilient alternative. By emitting events when state changes, the monolith can notify other services without needing to know their specific location or status. This pattern promotes loose coupling and allows services to process data at their own pace.
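The publish/subscribe shape looks like this. The in-process bus below exists only to show the decoupling; in production a broker such as RabbitMQ or Kafka sits between publisher and subscribers:

```python
from collections import defaultdict

class EventBus:
    """In-process stand-in for a message broker, for illustration only.
    The monolith publishes state-change events; services subscribe by
    topic without knowing about each other."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
received = []
bus.subscribe("user.created", received.append)  # e.g. the notification service
bus.publish("user.created", {"user_id": "u1"})  # emitted by the monolith
```

The publisher never references the subscriber, which is exactly the loose coupling the pattern is after; a real broker additionally buffers events so slow consumers can catch up at their own pace.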

Resiliency with Retries and Timeouts

Every network request should have a strictly defined timeout to prevent threads from hanging indefinitely. Without timeouts, a slow dependency can cause a backup that eventually consumes all available connections in the calling service. This leads to a total system outage caused by a single underperforming component.

Retrying failed requests can resolve transient network issues, but it should be done with exponential backoff. If every service retries immediately and frequently, it can lead to a self-inflicted denial-of-service attack on the recovering system. Adding jitter to the backoff helps distribute the load more evenly over time.
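A retry helper combining exponential backoff with full jitter might look like this; attempt counts and delays are illustrative:

```python
import random
import time

def call_with_retry(fn, attempts=4, base_delay=0.5, max_delay=8.0):
    """Retry transient failures with exponential backoff plus full jitter,
    so recovering services are not hammered by retries in lockstep."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the real error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # full jitter
```

The `random.uniform(0, delay)` spread is what prevents the self-inflicted denial of service: clients that failed at the same moment come back at different moments.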

Ensuring Observability and Correctness

In a monolithic architecture, a single stack trace usually reveals the root cause of an error. In a distributed system, an error might originate in one service but manifest in another. To debug these issues effectively, you must implement distributed tracing using a correlation ID that follows a request across all service boundaries.
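Propagating a correlation ID is mechanically simple; the `X-Correlation-ID` header name below is a common convention rather than a standard:

```python
import uuid

def with_correlation_id(headers):
    """Reuse an incoming correlation ID, or mint one at the system edge.
    Returns the ID and the headers to forward on outbound calls."""
    cid = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    outbound = dict(headers, **{"X-Correlation-ID": cid})
    return cid, outbound

def log(cid, message):
    # In practice this line goes to the central aggregator, keyed by cid
    print(f"[cid={cid}] {message}")
```

Every service applies the same rule: never generate a new ID if one arrived, and always attach it to logs and downstream requests, so one identifier stitches the whole request path together.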

Centralized logging becomes a requirement rather than a luxury once you have multiple services. Searching through logs on individual servers is no longer feasible. Instead, logs should be pushed to a central aggregator where they can be filtered and analyzed based on the shared correlation ID.

Finally, you must implement health checks that provide a realistic view of the service status. A simple endpoint that returns a successful status code is often insufficient. A robust health check should verify that the service can actually connect to its database and any other critical downstream dependencies.
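A deep health check can be expressed as a set of named dependency probes; the shape below is a sketch, not a drop-in endpoint:

```python
def deep_health_check(checks):
    """Run each dependency probe and report overall status.

    `checks` maps a dependency name to a callable that raises on failure,
    e.g. a database ping or a downstream service request.
    """
    results = {}
    for name, probe in checks.items():
        try:
            probe()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failing: {exc}"
    healthy = all(status == "ok" for status in results.values())
    return {"status": "healthy" if healthy else "unhealthy", "checks": results}
```

Wiring this behind the health endpoint means the gateway and orchestrator see "unhealthy" when the database is unreachable, not just that the process is alive.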

Measuring the Success of the Extraction

The ultimate goal of extracting a service is to improve the developer experience and system performance. You should track metrics such as lead time for changes and deployment frequency for the new service. If these metrics do not improve compared to the monolith, the boundaries of the service may need to be re-evaluated.

Monitor the error rates and latency specifically for the paths routed through the API gateway. This data provides the confidence needed to eventually shut down the legacy code path. Once the new service consistently handles one hundred percent of the traffic with stable performance, the old monolithic code can be safely deleted.
