Misconfigurations are literally 100% of the reason why things go wrong, whether you look at it from a security perspective, an infrastructure perspective, or a simple accidental button click.
When you try to deploy an application and it fails, what's the reason? Misconfigurations.
Whether it's a misconfiguration in a cloud environment, in the application itself, or in the pipeline, it's always a misconfiguration. Misconfigurations slow deployment velocity.
From a security perspective, Red Hat states in its State of Kubernetes and Container Security report that 70% of security vulnerabilities were due to misconfigurations.
Most of the time, these misconfigurations happen because someone clicked the wrong button or accidentally put the wrong value into a variable or a parameter. Although mistakes happen, a lot of them can be avoided.
In this blog post, you're going to learn about the top three reasons misconfigurations ruin team morale.
Reflecting on Google's SRE handbook, you can split an engineer's time into value-driven work and fires that constantly need to be put out. Engineers should not have to spend more than 50% of their time putting out fires, but the reality is that it's typically around 70-80% of their time.
Fires typically consist of fixing a production environment when it's down, fixing a bug or a code issue, and fixing a deployment. Of course, there are more than that, but these are the common scenarios.
The reality is that no engineering team can have zero fires to put out. A certain percentage of time always has to go to putting out fires. However, the two ways that engineering teams can cut down the time they spend putting out fires are by:
When engineers spend their time fighting fires for issues that could've been remediated, like misconfiguration, a few things happen. First, they are spending less time conducting value-driven work like product features, automation, and making environments work better. That means things like automating remediations pretty much never happen because engineers spend all of their time putting out fires.
The second biggest thing that happens, and arguably the most important, is that engineers start to lose interest in what they're doing if all they do is put out fires all day. Engineers love what they do, and they want to create value-driven work. They don't want to run around like a chicken with its head cut off 8 hours per day. They want to create things that drive value. This often leads to engineers leaving organizations.
Let's think about a common scenario when deploying on DevOps or software development teams - CICD. Whether you're deploying infrastructure, cloud services, or applications, CICD has become more or less the standard for automated deployments.
Here's the problem with CICD.
Although you can click a button to kick off a pipeline or have a pipeline kicked off when someone pushes code to the dev or even main branch, what happens if your configurations aren't set properly? For example, a parameter value that was misconfigured or a secret that wasn't set properly.
That means you have to comb through every place that the configuration data is stored for your CICD pipeline and for the environment that is being deployed from the CICD pipeline. This leads to putting out fires, major troubleshooting, and possibly even production environments being down. Instead, you should have one location where all of the configuration data exists, one location where you can update the configuration data, and one location where you can pull the values of the configuration data from.
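As a rough illustration of the "one location" idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `ConfigStore` class, the `preflight` helper, and the JSON file layout are hypothetical and are not CloudTruth's API. It only shows the shape of the approach: one place to update a value, one place to read it, and a check that runs before the pipeline deploys anything.

```python
import json
from pathlib import Path


class ConfigStore:
    """Hypothetical single source of truth, backed by one JSON file.

    Illustration only: a real setup would keep secrets encrypted,
    not in a plain-text file.
    """

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        # An absent file simply means nothing has been configured yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def set(self, environment, key, value):
        """The one place where a value is updated."""
        data = self._load()
        data.setdefault(environment, {})[key] = value
        self.path.write_text(json.dumps(data, indent=2))

    def get(self, environment, key):
        """The one place where a value is read from."""
        values = self._load().get(environment, {})
        if key not in values:
            raise KeyError(f"{key} is not set for environment {environment}")
        return values[key]

    def missing(self, environment, required_keys):
        """Return the required keys that are absent or empty."""
        values = self._load().get(environment, {})
        return [k for k in required_keys if values.get(k) in (None, "")]


def preflight(store, environment, required_keys):
    """Fail before the pipeline deploys anything, not halfway through."""
    absent = store.missing(environment, required_keys)
    if absent:
        raise RuntimeError(f"{environment} is missing: {', '.join(absent)}")
```

A pipeline step could call `preflight(store, "dev", ["DB_HOST", "API_TOKEN"])` as its first action, so a parameter or secret that was never set shows up as one clear error instead of a failed deployment that someone has to trace back by hand.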
Engineers who love what they do enjoy playing with technology and, in turn, enjoy testing out new solutions for fun. This is the bread and butter of modernization and of increasing velocity when it comes to incorporating new solutions.
However, a standard needs to be set. Lack of simple standards in distributed teams leads to config sprawl and more misconfigurations.
Most importantly, there must be a standard across DevOps and software development teams.
Without a standard, every engineer is sort of just deploying as they see fit. Maybe a developer is logging into a server and fixing code on the fly. Maybe a DevOps engineer is manually updating an nginx.conf so the frontend gets back up and running. Most of these problems that create the inability to have a standard inside of a team and across teams are due to misconfigurations.
Let's think of a common scenario - autoscaling issues. Let's say you're running an application on a Kubernetes cluster, and you have autoscaling set to two worker nodes. Perhaps workloads start picking up and you need to scale out to three worker nodes. If you make that change manually, then the next time the Kubernetes cluster is redeployed, or a new Kubernetes cluster is deployed, it's going to automatically go back to two worker nodes. Instead, that configuration should be set somewhere, and the value should be read from the location where it's set.
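To make the autoscaling scenario concrete, here is a small Python sketch. It assumes the node counts live in one stored set of values under the hypothetical key names `min_nodes`, `max_nodes`, and `desired_nodes`; the names and helper functions are invented for illustration and do not come from Kubernetes or any cloud provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    min_nodes: int
    max_nodes: int
    desired_nodes: int


def load_scaling_config(values):
    """Build a validated ScalingConfig from stored values (strings or ints)."""
    cfg = ScalingConfig(
        min_nodes=int(values["min_nodes"]),
        max_nodes=int(values["max_nodes"]),
        desired_nodes=int(values["desired_nodes"]),
    )
    # Catch a misconfigured value here instead of during a deployment.
    if not (1 <= cfg.min_nodes <= cfg.desired_nodes <= cfg.max_nodes):
        raise ValueError(f"inconsistent node counts: {cfg}")
    return cfg


def plan_node_change(current_nodes, cfg):
    """Return how many nodes to add (positive) or remove (negative)."""
    return cfg.desired_nodes - current_nodes


if __name__ == "__main__":
    # Hypothetical stored values after someone raised the count to three.
    stored = {"min_nodes": "2", "max_nodes": "5", "desired_nodes": "3"}
    cfg = load_scaling_config(stored)
    # A freshly redeployed cluster that came up with two nodes.
    print(plan_node_change(current_nodes=2, cfg=cfg))  # prints 1
```

Because the redeployed cluster reads `desired_nodes` from the stored values rather than from whatever someone last typed by hand, it ends up at three worker nodes again instead of quietly dropping back to two.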
Instead of living with no standard, engineering teams across the organization can have one location where all configuration data, secrets, and parameters are not only stored but also pulled by any system that needs them.
The reality is that CloudTruth cannot solve every single one of these problems. There will still be fires to put out and organizations that are moving slower to automate fixes. However, what CloudTruth can solve, and solve very well, is the breaking of environments due to misconfigurations.
With CloudTruth, you don't have to worry about your configuration data being in multiple places. You don't have to worry about pulling secrets from one system and parameters/variables from another. Instead, you have one central location for all of your configuration data.
So many misconfigurations happen for two primary reasons:
Having configurations in multiple places is a fire to put out in itself.
With CloudTruth, these problems are remediated.