Implementing the Four Pillars of Modern GitOps

Learn how to apply declarative configuration, versioning, and automated reconciliation to build highly reliable delivery pipelines.

DevOps | Intermediate | 12 min read

In this article

The Evolution of Delivery: From Imperative to Declarative

The Fragility of Manual Intervention
Adopting the Declarative Mindset

The Mechanics of Automated Reconciliation

Sync Policies and Resource Pruning
The Role of the GitOps Controller

Architecting Repositories for Reliability

Handling Multi-Environment Overlays
Secret Management in Plain Sight

Operational Trade-offs and Best Practices

Observability and Troubleshooting
Scaling Beyond a Single Cluster

The Evolution of Delivery: From Imperative to Declarative

Modern software delivery demands high velocity without sacrificing system stability. Historically, engineers relied on imperative scripts that defined a specific series of steps to reach a deployment goal. These scripts are often brittle because they depend heavily on the initial environment state remaining constant between executions.

When an imperative script fails midway, it often leaves the infrastructure in an inconsistent state that is difficult to debug. Untangling that state by hand becomes a significant bottleneck for teams trying to scale their operations across multiple clusters. GitOps addresses these pain points by moving away from step-by-step instructions toward a state-oriented approach.

The fundamental shift in GitOps is the move to declarative configuration. Instead of telling the system how to change, you describe exactly what the system should look like in its final form. This description is stored in a version control system like Git, which acts as the single source of truth for the entire environment.

The imperative approach defines a sequence of actions, such as create, update, and delete commands.
The declarative approach defines the final state, such as the number of replicas and the specific container image version.
GitOps uses the declarative model to enable automated reconciliation between the desired and actual states.
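
As a minimal sketch of what "defining the final state" can look like in Kubernetes, the manifest below declares a replica count and a pinned image version without describing any steps to get there. The service name, namespace, registry host, port, and image tag are hypothetical placeholders, not values taken from a real system.

```yaml
# Hypothetical Deployment: every name, the registry host, and the tag are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payments
spec:
  replicas: 3                      # desired number of pods, not "scale up by one"
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          # Pinned image version; changing this line in Git is the deployment.
          image: registry.example.com/payment-service:1.4.2
          ports:
            - containerPort: 8080
```

Nothing in the file says how to move from two replicas to three or how to replace the old image; working that out is left to the platform.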

By treating infrastructure as code, teams gain the same benefits they have enjoyed in application development for years. These benefits include version history, audit trails, and the ability to return to a previous known good state with a single git revert command. The deployment process becomes a transparent and collaborative workflow.

The Fragility of Manual Intervention

Manual changes made directly to a production environment are a leading cause of configuration drift. When an engineer uses a command line tool to troubleshoot a live issue, they often forget to record the change in the official configuration. Over time, the difference between the documented state and the actual state grows until the documentation is effectively useless.

This drift makes automated updates unpredictable and increases the risk of downtime during routine maintenance. GitOps solves this by ensuring that any change must pass through the Git repository before it can reach the cluster. This creates a closed loop where the repository is always the definitive map of the live environment.

Adopting the Declarative Mindset

A declarative system is one where you provide a manifest of resources rather than a list of commands. The system then compares this manifest against the existing resources and calculates the necessary actions to close the gap. This abstraction allows developers to focus on the outcome rather than the underlying complexity of the infrastructure provider.

Using declarative files makes it easier to replicate environments across different stages like development, staging, and production. You can use the same manifest template and simply apply different parameters for each environment. This consistency reduces the likelihood of environment-specific bugs appearing late in the delivery cycle.

The Mechanics of Automated Reconciliation

The heart of any GitOps implementation is the reconciliation loop, a continuous process that keeps the actual state of the environment in line with the desired state declared in Git. This loop is typically managed by a specialized controller running inside the target environment. The controller periodically polls the Git repository for new commits and compares the declared manifests to the current cluster state.

If the controller detects a discrepancy, it automatically performs the necessary operations to sync the environment. This means that if a developer pushes a change to the repository, the controller updates the cluster. Conversely, if a manual change is made to the cluster and self-healing is enabled, the controller overwrites it to restore the state defined in Git.
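
The polling side of this loop can be sketched with Flux, one of the controllers discussed later in this article. The example assumes a Flux installation in the flux-system namespace and a publicly readable repository; the resource names, the main branch, and the interval values are illustrative choices rather than recommendations.

```yaml
# Where to poll: Flux fetches this repository on the given interval.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: infra-manifests            # hypothetical name
  namespace: flux-system
spec:
  interval: 1m                     # how often to check Git for new commits
  url: https://github.com/org/infra-manifests.git
  ref:
    branch: main                   # assumed branch name
---
# What to reconcile: apply one path from that repository to the cluster.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payment-service-prod       # hypothetical name
  namespace: flux-system
spec:
  interval: 5m                     # how often to re-compare the cluster with Git
  sourceRef:
    kind: GitRepository
    name: infra-manifests
  path: ./apps/payment-service/overlays/prod
  prune: true                      # remove resources that disappear from Git
  targetNamespace: payments
```

A private repository would additionally need a secretRef on the GitRepository pointing at read-only credentials.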

The power of GitOps lies not in the tools themselves, but in the enforcement of a continuous feedback loop that eliminates the possibility of hidden manual changes.

This pull-based model generally exposes a smaller attack surface than traditional push-based CI/CD pipelines. In a push model, the CI server must hold credentials that can modify the production cluster. In the GitOps pull model, the controller inside the cluster only needs read access to the Git repository, so no cluster credentials have to leave the cluster.

Argo CD Application Resource

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/infra-manifests.git
    targetRevision: HEAD
    path: apps/payment-service/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true # Delete resources not in Git
      selfHeal: true # Overwrite manual cluster changes
```

Sync Policies and Resource Pruning

Configuring sync policies is a critical step in defining how aggressive the reconciliation should be. Automated pruning is a powerful feature that deletes resources in the cluster that are no longer present in the Git repository. Without pruning, your cluster can become cluttered with orphaned resources that were removed from the repository but never deleted from the live environment.

Self-healing is another vital policy that determines how the controller reacts to external changes. When self-healing is enabled, the controller automatically reverts manual edits made to live resources shortly after it detects them; the exact delay depends on the controller and its configuration. This ensures that the only way to make a permanent change to the system is through a reviewed and merged pull request.
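
Aggressive pruning is not always desirable for every resource, for example one that holds data. As a hedged sketch, Argo CD lets an individual resource opt out of pruning through a sync-options annotation; the claim name and storage size below are hypothetical.

```yaml
# Hypothetical PersistentVolumeClaim that should survive even if it is
# accidentally removed from Git while automated pruning is enabled.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: payment-ledger-data
  namespace: payments
  annotations:
    # Argo CD skips this resource when pruning.
    argocd.argoproj.io/sync-options: Prune=false
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

With this annotation in place, removing the claim from the repository leaves the application reported as out of sync instead of deleting the volume, so a person still has to decide what happens to the data.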

The Role of the GitOps Controller

The controller acts as an intelligent agent that understands the specific nuances of the platform it manages. In a Kubernetes environment, tools like Argo CD or Flux are common choices for this role. These controllers provide visibility into the health of the synchronization process and alert teams if a resource fails to reach the desired state.

Beyond simply applying manifests, the controller can also handle the ordering of resource creation. It can ensure that dependencies, such as namespaces or configuration maps, are created before the applications that rely on them, either through built-in ordering by resource kind or through explicit hints in the manifests. This reduces the complexity of managing large-scale, multi-component systems.
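
One way to make ordering explicit is Argo CD's sync-wave annotation, where lower waves are applied before higher ones. The sketch below is illustrative: the resource names, the LOG_LEVEL key, and the image reference are hypothetical.

```yaml
# Wave -1: configuration is applied first.
apiVersion: v1
kind: ConfigMap
metadata:
  name: billing-api-config
  namespace: billing
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
data:
  LOG_LEVEL: info
---
# Wave 0 (the default): the workload that reads the configuration follows.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api
  namespace: billing
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: billing-api
  template:
    metadata:
      labels:
        app: billing-api
    spec:
      containers:
        - name: billing-api
          image: registry.example.com/billing-api:2.0.0   # placeholder image
          envFrom:
            - configMapRef:
                name: billing-api-config
```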

Architecting Repositories for Reliability

How you structure your Git repositories has a profound impact on the scalability of your GitOps workflow. Many organizations start with a single repository for both application code and infrastructure manifests, but this often leads to tight coupling. A more robust approach involves separating the application source code from the deployment configurations.

Separating these concerns allows infrastructure teams to update deployment policies without triggering unnecessary application builds. It also enables more granular access control, as developers can have full access to the application code while the infrastructure repository requires stricter approvals. This separation clarifies the boundary between building the software and running it.

Recommended Directory Structure

```text
# Example of a structured GitOps manifest repository
.
├── apps/
│   ├── billing-api/
│   │   ├── base/           # Common resources
│   │   └── overlays/       # Environment specific tweaks
│   │       ├── staging/
│   │       └── production/
├── clusters/
│   ├── prod-us-east/       # Cluster-wide configurations
│   └── staging-us-west/
└── registry/               # Catalog of available services
```

Using a base and overlay pattern, often facilitated by Kustomize, allows you to share common logic across environments while tailoring specific settings. For example, the base directory might contain the standard deployment manifest, while the production overlay increases the replica count and adds resource limits. This reduces duplication and ensures that core configurations remain consistent everywhere.
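
A minimal sketch of that pattern with Kustomize follows. The two documents are separate files whose paths are given in the leading comments; the file names deployment.yaml and service.yaml, the replica count, and the limit values are assumptions made for illustration.

```yaml
# File: apps/billing-api/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml                # assumed to define a Deployment named billing-api
  - service.yaml
---
# File: apps/billing-api/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                     # start from the shared base
replicas:
  - name: billing-api
    count: 5                       # production runs more replicas than the base
patches:
  - target:
      kind: Deployment
      name: billing-api
    patch: |-
      # JSON patch that sets resource limits on the first container
      - op: add
        path: /spec/template/spec/containers/0/resources
        value:
          limits:
            cpu: 500m
            memory: 512Mi
```

Rendering the overlay with kubectl kustomize apps/billing-api/overlays/production prints the merged manifests, which is a convenient way to review what the controller will apply.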

Handling Multi-Environment Overlays

Managing multiple environments requires a clear strategy for propagating changes from development to production. You can use separate branches for each environment, but this often leads to long-lived merge conflicts and configuration drift between branches. A better practice is to use a single main branch with different directories representing each environment.

In this directory-based approach, you promote changes by copying or templating the configuration from one folder to another. Because the promoted files come from the same commit history, it is easy to verify that the manifest tested in staging is the one being applied in production. Automated tools can then watch these specific paths to trigger updates in the corresponding clusters.
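
The path-watching part can be sketched with an Argo CD ApplicationSet that stamps out one Application per environment directory on a single branch. The cluster URLs, the billing namespace, and the main branch name are hypothetical, and the environment names are assumed to match the overlay directory names.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: billing-api
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - env: staging
            server: https://staging.example.com      # hypothetical cluster URL
          - env: production
            server: https://production.example.com   # hypothetical cluster URL
  template:
    metadata:
      name: 'billing-api-{{env}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/infra-manifests.git
        targetRevision: main                          # one branch for every environment
        path: 'apps/billing-api/overlays/{{env}}'     # one directory per environment
      destination:
        server: '{{server}}'
        namespace: billing
```

Promotion then becomes a pull request that changes files under overlays/production, and only the production Application reacts to it.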

Secret Management in Plain Sight

Storing secrets in plain text within a Git repository is a major security risk that must be avoided. However, GitOps requires that secrets be managed alongside the resources they support. To solve this, teams use encryption tools like Sealed Secrets or external secret stores like HashiCorp Vault.

With Sealed Secrets, you encrypt your sensitive data into a custom resource that is safe to store in a public or private repository. A controller inside your cluster holds the private key and decrypts the sealed resource into a regular Kubernetes Secret, so the plain-text value only ever exists inside the cluster. This maintains the GitOps principle of having everything in Git while keeping the actual data secure.
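
For orientation, a committed SealedSecret looks roughly like the sketch below. The resource name and key are hypothetical, and the encrypted value is a placeholder: real ciphertext is produced by the kubeseal CLI against the public key of your own cluster's controller.

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: payment-db-credentials     # hypothetical name
  namespace: payments
spec:
  encryptedData:
    # Placeholder only: paste the ciphertext that kubeseal generates here.
    password: "<ciphertext generated by kubeseal>"
  template:
    # Shape of the regular Secret the controller creates after decryption.
    metadata:
      name: payment-db-credentials
      namespace: payments
    type: Opaque
```

By default the ciphertext is bound to the secret's name and namespace, so renaming or moving it means sealing the value again.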

Operational Trade-offs and Best Practices

While GitOps provides immense reliability, it also introduces certain constraints that teams must navigate. The most common challenge is the latency between merging a change and seeing it reflected in the cluster. This delay can be frustrating during high-pressure debugging sessions where immediate feedback is usually expected.

To mitigate this, teams can trigger manual syncs or use webhooks to notify the controller of a new commit instantly. It is also important to remember that GitOps is not a replacement for a solid testing strategy. Automated tests should still run in your CI pipeline before any manifest change is merged into the main repository.
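
As one possible shape for such a CI check, the workflow below renders every overlay on each pull request and fails if any of them does not build. It assumes GitHub Actions, the apps/*/overlays/* layout used in this article, and a runner image that ships kubectl; the workflow file name and job name are made up for the example.

```yaml
# File: .github/workflows/validate-manifests.yaml (hypothetical)
name: validate-manifests
on:
  pull_request:
    paths:
      - "apps/**"
jobs:
  render:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Render every overlay
        run: |
          set -euo pipefail
          # A broken patch or a missing file makes kubectl kustomize exit non-zero,
          # which fails the job before the change can be merged.
          for dir in apps/*/overlays/*/; do
            echo "Rendering ${dir}"
            kubectl kustomize "${dir}" > /dev/null
          done
```

Rendering only proves that the manifests build; schema validation and policy checks would be natural next steps, but they are outside this sketch.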

GitOps is a safety net, not a substitute for quality engineering. Your deployment is only as reliable as the manifests you commit.

Another trade-off involves the complexity of managing the GitOps controller itself. Like any other piece of infrastructure, the controller needs to be monitored, updated, and secured. Teams must ensure they have observability into the health of the reconciliation loop to avoid silent failures where Git and the cluster drift apart without anyone noticing.

Observability and Troubleshooting

Effective GitOps adoption requires robust monitoring of the synchronization status. Controllers typically expose metrics that can be scraped by tools like Prometheus to provide alerts when a sync fails. High-level dashboards should show the current sync state of every application across the entire organization.
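
As a sketch of such alerting, the Prometheus rule file below uses Argo CD's argocd_app_info metric, which carries the sync and health status of each application as labels. The group name, alert names, durations, and severity labels are arbitrary choices for illustration.

```yaml
groups:
  - name: gitops-sync              # hypothetical group name
    rules:
      - alert: ArgoCDAppOutOfSync
        # Fires when an application has differed from Git for 15 minutes.
        expr: argocd_app_info{sync_status="OutOfSync"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Application {{ $labels.name }} has been OutOfSync for 15 minutes"
      - alert: ArgoCDAppDegraded
        # Fires when synced resources fail to become healthy.
        expr: argocd_app_info{health_status="Degraded"} == 1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Application {{ $labels.name }} is Degraded"
```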

When a deployment fails, the Git history becomes your primary debugging tool. You can quickly see who made the change, what was modified, and the exact state of the manifest at the time of the failure. This audit trail significantly reduces the time it takes to perform a root cause analysis for infrastructure issues.

Scaling Beyond a Single Cluster

As your organization grows, you will likely manage dozens or hundreds of clusters across different regions. GitOps scales naturally to this level because you can map different repositories or directories to different cluster groups. This allows you to roll out changes incrementally across your fleet using a canary or blue-green approach.
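
One way to express an incremental fleet rollout is an Argo CD ApplicationSet with a cluster generator, which creates an Application for every registered cluster whose labels match a selector. The rollout-group label, the platform namespace, and the assumption that each registered cluster name matches a directory under clusters/ are all hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-baseline-canary
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            rollout-group: canary          # hypothetical label on registered clusters
  template:
    metadata:
      name: 'baseline-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/infra-manifests.git
        targetRevision: main
        path: 'clusters/{{name}}'          # assumes directory names match cluster names
      destination:
        server: '{{server}}'
        namespace: platform                # hypothetical namespace
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Widening the rollout is then a Git change as well: labeling more clusters, or adding a second ApplicationSet for the next group, brings them under the same configuration.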

By controlling the rollout through Git, you can pause or roll back the update globally by simply reverting a commit. This centralized control over a distributed system is one of the most compelling reasons for large enterprises to adopt GitOps. It provides a unified management plane that is independent of the underlying cloud provider.
