When it comes to configuration management, there are many moving parts that work in unison to keep applications and infrastructure running smoothly. In a perfect world, every team at your organization would operate in lockstep, with each one knowing exactly what the other teams are up to.
But since mind-reading likely isn’t in the cards, it’s all too common for teams to end up working in silos. Disparate efforts across your organization to maintain applications through manual code changes inevitably lead to configuration drift.
Picture this: one developer claims that a particular build is running just fine in their dev environment, while a DevOps engineer claims the build is broken in a later environment, such as staging. This is a common scenario associated with configuration drift. And while configuration drift may not immediately result in dire consequences for your systems, it can eventually lead to downtime or even total service outages.
For this reason, configuration drift is something your organization should look to minimize at all costs. After covering the basics, we’ll highlight the common culprits of configuration drift before moving on to how your organization can avoid it.
Configuration drift occurs when the current system state deviates from the desired system state. Naturally, as your organization’s config management needs get more complex, an infrastructure-as-code approach is needed to ensure consistency and repeatability.
For example, your DevOps team may be using Terraform to provision your infrastructure with a focus on repeatability. But the SRE team is resolving a production issue and “breaks glass” to make a config change in the AWS Console. So while your initial infrastructure was configured to your exact liking, your team is now relying on multiple systems to make changes. And when those changes aren’t captured across all systems, configuration drift starts to become an issue.
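To make the definition concrete, here is a minimal Python sketch that treats drift as the difference between a desired state and an actual state. The setting names and values are hypothetical stand-ins for what an infrastructure-as-code definition might declare and what a manual console change might leave behind; real Terraform state holds far more detail.

```python
# Hypothetical settings for illustration only.
desired_state = {  # what the infrastructure-as-code definition declares
    "instance_type": "t3.medium",
    "min_instances": 2,
    "ssh_open_to_world": False,
}

actual_state = {  # what is running after a manual "break glass" change
    "instance_type": "t3.large",  # resized by hand during an incident
    "min_instances": 2,
    "ssh_open_to_world": False,
}


def find_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired_value, actual_value)} for every setting that differs.

    A key missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


print(find_drift(desired_state, actual_state))
# {'instance_type': ('t3.medium', 't3.large')}
```

An empty result means the two states agree. Anything else is a change that one system knows about and the other does not.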
Configuration drift can pose significant risks for your organization when left unchecked. From a lackluster security posture to the inability to track changes in the event of an audit, configuration drift is something your organization needs to proactively address before it becomes a bigger issue.
As we previously mentioned, configuration drift often stems from manual changes being made to your applications or infrastructure. Within every step of the CI/CD pipeline, opportunities arise for drift to occur. While there are multiple potential causes of configuration drift, there are a few common culprits that your organization should take notice of.
Your team just encountered a major red flag with your code that can’t wait to be addressed. Leadership is asking for the issue to be resolved as soon as possible, so your engineers bypass standard procedures and perform a hotfix to quickly address the problem.
While the quick change may work in the short term, your team likely didn’t document their changes or make the same adjustments to other environments in your CI/CD pipeline. Ultimately, when a new deployment is kicked off, the hotfix will be overwritten and the initial issue will likely rear its head once again.
When your teams are pressed to make immediate code changes, configuration drift is likely to follow.
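The overwrite problem is easy to see in a toy simulation. The sketch below assumes a simplified model in which every deployment replaces the live config with a fresh copy of whatever is in version control; the `deploy` function and the setting names are hypothetical, not part of any real tool.

```python
import copy

# Hypothetical config kept in version control: the source every deploy reads from.
source_config = {"db_pool_size": 10, "request_timeout_s": 30}


def deploy(source: dict) -> dict:
    """Simulate a deployment: the live config becomes a fresh copy of the source."""
    return copy.deepcopy(source)


live_config = deploy(source_config)

# Hotfix applied directly to the running system, bypassing version control.
live_config["db_pool_size"] = 50

# The next routine deployment silently reverts the hotfix.
live_config = deploy(source_config)
print(live_config["db_pool_size"])  # 10: the original issue is back

# Capturing the fix in the source makes it survive future deployments.
source_config["db_pool_size"] = 50
live_config = deploy(source_config)
print(live_config["db_pool_size"])  # 50
```

The hotfix itself isn’t the problem. The drift comes from the fix living only in the running system and never reaching the source that future deployments are built from.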
Your developers are ready to roll out some long-awaited changes to an upstream system within your infrastructure. The DevOps team is all on the same page but hasn’t thoroughly informed other impacted parties about the changes. When the changes are made, downstream systems start to break, and those teams scramble to search for answers.
Especially within highly complex integrated systems, a lack of communication can often be the root cause of configuration drift. When a change is made in one system, it will likely have ramifications on another system. And when proactive communication isn’t carried out to ensure everyone is aware of the impending changes, configuration drift is inevitable.
Early on, your organization could make and track manual changes to systems and check for errors with relative ease. But as the needs of your infrastructure become more complex and additional platforms are folded into the mix, your organization suddenly has an intricate web of systems, and only a select few have an end-to-end understanding of how all the components are configured. The big picture is hazy to most.
Trying to manually maintain legacy systems and sophisticated cloud resources isn’t sustainable. Without automation, your organization is relying on a handful of engineers to keep your systems running smoothly. To sufficiently scale out your existing systems while also bringing in new ones, automation needs to be built into your CI/CD pipelines. Automation won’t be your silver bullet to configuration drift, but it can certainly help minimize it in the long run.
Taking a proactive approach to preventing configuration drift should be your organization’s goal. There are steps you can take to rectify configuration drift after the fact, but doing so will likely slow down build, deploy, and run times. Let’s take a closer look at some of the most effective ways to avoid configuration drift in the first place.
Aligning your teams on where changes should be made is a critical first step to addressing configuration drift head-on. When DevOps, Ops, leadership, and engineers are all using disparate platforms to make changes, it becomes difficult to ensure those changes are applied across the organization’s entire infrastructure.
Outline clear change management policies and procedures and communicate them to your teams to ensure everyone understands where and how changes should be made. There will be instances where a change will have to be made outside of your primary channel, but as long as you have clear standards in place around how that should occur, configuration drift shouldn’t be as much of a problem.
Before configuration drift starts to run amok at your organization, ensure all changes are being made and pushed through a single channel to minimize complexity and confusion across your teams.
If you don’t want to go the route of requiring teams to make changes through a single platform, your other option is to allow the use of multiple channels coupled with automated detection and change monitoring measures. On top of your automated tools, your infrastructure will require a single defined source of truth that teams can refer back to as needed.
When executed properly, your automated resources can inspect your configuration, compare it to your source of truth, and inform you when something is wrong or hasn’t been captured. This enables teams to use the systems they’re most comfortable with, while still ensuring that changes are being sufficiently tracked and checked for errors.
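As a rough illustration of that inspect, compare, and inform loop, the sketch below fingerprints each system’s configuration and checks it against a source of truth. The system names, settings, and alert wording are all hypothetical, and it assumes configs are plain JSON-serializable dictionaries. A real setup would read the observed values from your cloud provider or deployment tooling rather than from literals.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a config so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(source_of_truth: dict, observed: dict) -> list:
    """Compare observed configs to the source of truth and return alert messages.

    Both arguments map a system name to that system's config dictionary.
    """
    alerts = []
    for system in sorted(source_of_truth.keys() | observed.keys()):
        if system not in source_of_truth:
            alerts.append(f"{system}: not captured in source of truth")
        elif system not in observed:
            alerts.append(f"{system}: expected but not found")
        elif fingerprint(source_of_truth[system]) != fingerprint(observed[system]):
            alerts.append(f"{system}: configuration differs from source of truth")
    return alerts


# Hypothetical data for illustration.
source_of_truth = {
    "api": {"replicas": 3, "log_level": "info"},
    "worker": {"concurrency": 8},
}
observed = {
    "api": {"log_level": "info", "replicas": 3},  # same values, different key order
    "worker": {"concurrency": 16},                # changed outside the source of truth
    "cache": {"ttl_s": 60},                       # created but never recorded
}

for alert in detect_drift(source_of_truth, observed):
    print(alert)
# cache: not captured in source of truth
# worker: configuration differs from source of truth
```

Comparing fingerprints keeps the check cheap when there are many systems. Once a mismatch is flagged, a field-by-field comparison can show exactly what changed.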
Last but not least, ensuring only the necessary users have access to certain systems is a small but mighty step your organization needs to take. Reducing the number of actors in any given system minimizes the likelihood of someone making erroneous changes that take time and resources to rectify.
Access control needs to be taken seriously, especially with larger teams and sophisticated systems. When everyone has access to everything, it not only makes it easier for configuration drift to occur, but it also opens the door for security breaches resulting from human error. Take the time to outline which teams require access to which systems, and implement thorough security measures to ensure current team members stay in their lane, and former team members are no longer able to access your systems.
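One way to put that review into practice is a periodic access audit. The sketch below is a simplified, hypothetical model: the policy table, team names, and `Grant` record are assumptions made for illustration, and in a real environment the grants would come from your identity provider or cloud IAM tooling.

```python
from dataclasses import dataclass

# Hypothetical policy: which teams may access which systems.
ACCESS_POLICY = {
    "production-db": {"sre"},
    "ci-pipeline": {"sre", "devops"},
    "staging": {"sre", "devops", "developers"},
}


@dataclass(frozen=True)
class Grant:
    user: str
    team: str
    system: str
    is_current_member: bool = True


def grants_to_revoke(grants: list, policy: dict) -> list:
    """Return (grant, reason) pairs for access that should be removed."""
    findings = []
    for grant in grants:
        if not grant.is_current_member:
            findings.append((grant, "former team member"))
        elif grant.team not in policy.get(grant.system, set()):
            findings.append((grant, "team not authorized for this system"))
    return findings


# Hypothetical grants for illustration.
grants = [
    Grant("alice", "sre", "production-db"),
    Grant("bob", "developers", "production-db"),
    Grant("carol", "devops", "ci-pipeline", is_current_member=False),
]

for grant, reason in grants_to_revoke(grants, ACCESS_POLICY):
    print(f"{grant.user} on {grant.system}: {reason}")
# bob on production-db: team not authorized for this system
# carol on ci-pipeline: former team member
```

Systems that are missing from the policy are treated as off-limits to everyone, so an unlisted system shows up in the findings instead of being allowed through by default.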
Given the complexity of modern-day CI/CD pipelines, completely avoiding configuration drift is likely a fool’s errand. With the need for hotfixes, continuous delivery, and new innovative systems, your organization should instead focus its efforts on reducing the likelihood of configuration drift. Working from a centralized platform to handle config management is a great place to start.
At CloudTruth, we enable our users to simplify and streamline changes being made to their various systems and ensure everything is traceable and everyone is on the same page. CloudTruth works within your current secrets stores, Terraform, GitHub Actions, and more, meaning our platform builds on your existing solutions, instead of replacing them. Simply put, we give you the ability to centrally manage build, deploy, and run time config data with confidence and control.
Learn more about our centralized config management solutions and see how our platform can help you minimize configuration drift and maximize productivity.