As the modern cloud-based tech stack grows increasingly complex, there are a number of components of the typical software development lifecycle (SDLC) that can be particularly difficult to manage. While there are many different approaches when it comes to optimizing your company’s SDLC processes, a lot of these improvements boil down to better configuration management. Specifically, there are two areas that can benefit significantly from better configuration management tooling: testing and escalation management.
As any developer knows, there is no avoiding bugs. But if you can discover and fix your bugs before reaching production, it’s far less expensive for your organization and provides a much better experience for your customers.
To find bugs before they’re pushed to production, it’s essential to develop a well-defined test environment. This can be a local test setup, a QA system, or Staging/Pre-Prod. Unfortunately, these testing environments often don’t perfectly match production, and this is where effective tooling is essential: with a configuration management system, you can easily identify where your test environment differs from production and avoid any resulting testing issues.
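To make the idea of spotting drift concrete, here is a minimal sketch that compares two flat key/value configurations. The function name `diff_configs` and the sample keys are hypothetical, chosen only for illustration; a real setup would load these values from wherever your configuration actually lives.

```python
from __future__ import annotations

from typing import Any


def diff_configs(test: dict[str, Any], prod: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Report keys missing from either side and keys whose values differ."""
    shared_keys = test.keys() & prod.keys()
    return {
        # Present in production but absent from the test environment.
        "missing_in_test": {k: prod[k] for k in prod.keys() - test.keys()},
        # Present in the test environment but absent from production.
        "missing_in_prod": {k: test[k] for k in test.keys() - prod.keys()},
        # Present on both sides with different values.
        "changed": {
            k: {"test": test[k], "prod": prod[k]}
            for k in shared_keys
            if test[k] != prod[k]
        },
    }


# Hypothetical sample data for a staging and a production environment.
staging = {"DB_POOL_SIZE": 5, "FEATURE_NEW_CHECKOUT": True, "LOG_LEVEL": "debug"}
production = {"DB_POOL_SIZE": 20, "FEATURE_NEW_CHECKOUT": True, "CACHE_TTL_SECONDS": 300}

drift = diff_configs(staging, production)
```

Here `drift["changed"]` flags `DB_POOL_SIZE`, while the two "missing" buckets show keys that exist in only one environment. Some differences are intentional, so a report like this is a starting point for review rather than a list of bugs.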
In addition, many companies struggle to keep track of configuration changes as they are deployed to the test environment. For example, if you deploy a new version of a microservice, other internal teams could be left in the dark with no clear way to determine when the change is being made or who is making it.
In some cases, teams rely on ad hoc email or word-of-mouth notification for key stakeholders, but what’s really needed is a tool that provides automated alerts via Slack or other convenient channels. This is important because the more time that passes before a QA team member is aware that changes were made, the more testing will potentially have to be rerun.
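As a rough sketch of what an automated alert could look like, the snippet below builds a message for a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names, the message wording, and the sample values are assumptions for illustration, and the webhook URL is a placeholder you would supply yourself.

```python
import json
import urllib.request


def build_alert(environment: str, key: str, author: str, old: str, new: str) -> dict:
    """Build a Slack incoming-webhook payload describing one config change."""
    text = (
        f"Config change in {environment}: {key} changed "
        f"from {old!r} to {new!r} by {author}"
    )
    return {"text": text}


def send_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical change event; nothing is sent until send_alert is called.
payload = build_alert("staging", "PAYMENTS_API_URL", "alice", "v1", "v2")
# send_alert(WEBHOOK_URL, payload)  # WEBHOOK_URL is a placeholder, not a real endpoint
```

Keeping message construction separate from delivery makes it easy to swap Slack for another channel, or to test the message format without touching the network.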
Without an effective way to track these systems, it can be difficult for developers and QA teams to feel comfortable making config changes. After all, changes made to the test environment by one engineer could impact other active users in unexpected ways, leading teams to step on each other’s toes and waste valuable time and resources. It’s incredibly frustrating for everyone involved when a test that was working just fine is all of a sudden failing due to a change in environment configuration. There’s often limited recourse to determine whether something’s changed in the environment, what the change was, and how to resolve it. That’s why it’s essential to invest in a single source of truth: an easily accessible record of all configuration changes, where everyone involved in the testing process can monitor changes and adjust their tests accordingly.
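One way to picture such a record is an append-only log that captures who changed what, where, and when. The sketch below is an in-memory illustration only; the `ConfigChange` and `ChangeLog` names and their fields are hypothetical, not the data model of any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConfigChange:
    """One immutable entry in the change record."""
    timestamp: datetime
    author: str
    environment: str
    key: str
    old_value: object
    new_value: object


class ChangeLog:
    """Append-only record of configuration changes across environments."""

    def __init__(self) -> None:
        self._entries: list[ConfigChange] = []

    def record(self, change: ConfigChange) -> None:
        self._entries.append(change)

    def since(self, environment: str, cutoff: datetime) -> list[ConfigChange]:
        """Return changes to one environment at or after the cutoff time."""
        return [
            c for c in self._entries
            if c.environment == environment and c.timestamp >= cutoff
        ]


# Hypothetical sample entries.
log = ChangeLog()
log.record(ConfigChange(datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc),
                        "alice", "staging", "LOG_LEVEL", "info", "debug"))
log.record(ConfigChange(datetime(2024, 3, 1, 11, 30, tzinfo=timezone.utc),
                        "bob", "staging", "CACHE_TTL_SECONDS", 300, 60))
log.record(ConfigChange(datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc),
                        "carol", "production", "LOG_LEVEL", "info", "warn"))

# What changed in staging since the last passing test run?
last_passing_run = datetime(2024, 3, 1, 10, 0, tzinfo=timezone.utc)
recent = log.since("staging", last_passing_run)
```

When a previously passing test starts failing, a query like `since` answers the first triage question directly: what changed in this environment after the last good run, and who made the change.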
The second key area that benefits from configuration management is escalations. When issues are escalated into your engineering team, the first step in triaging the problem is often an attempt to reproduce it. This could be delegated to a Quality Engineering team member, a developer, or in some cases, a dedicated triage team. Regardless of who’s responsible, there are a number of challenges associated with successfully recreating and resolving these escalations.
First, it can be extremely time-consuming to ensure that the engineer’s environment is configured to match the production environment where the issue was reported. All too often, the person assigned to the issue will say, “it’s working fine for me,” necessitating additional time and resources to investigate potential configuration discrepancies that could be causing the difference in observed behavior.
Furthermore, in the time between when the issue is first reported and an engineer attempts to reproduce it locally, it’s possible for changes to be rolled out to production, potentially creating additional inconsistencies. Without a single source of truth regarding key configurations, it can be very difficult to successfully reproduce and troubleshoot the issue.
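If configuration changes are recorded with timestamps, the configuration that was live when the issue was reported can be rebuilt by replaying changes up to that moment. The following is a simplified sketch under that assumption; `config_as_of`, the tuple layout, and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def config_as_of(baseline, changes, when):
    """Replay (timestamp, key, value) changes onto a baseline up to `when`."""
    config = dict(baseline)  # copy so the baseline is left untouched
    # Sort on the timestamp only, so values never need to be comparable.
    for timestamp, key, value in sorted(changes, key=lambda change: change[0]):
        if timestamp > when:
            break
        config[key] = value
    return config


# Hypothetical baseline and change history for production.
baseline = {"TIMEOUT_SECONDS": 30, "RETRIES": 3}
changes = [
    (datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc), "TIMEOUT_SECONDS", 10),
    (datetime(2024, 1, 12, 15, 0, tzinfo=timezone.utc), "RETRIES", 5),
]

# The issue was reported between the two changes.
reported_at = datetime(2024, 1, 11, 12, 0, tzinfo=timezone.utc)
at_report = config_as_of(baseline, changes, reported_at)
```

In this example `at_report` includes the shortened timeout but not the later retry change, so an engineer reproducing the issue can match what production looked like at report time rather than what it looks like today.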
Finally, it’s not uncommon for production bugs to be related to configuration changes. These issues can be challenging to debug since it’s often hard to quickly determine where the configuration changes are coming from. Plus, as more and more microservices are deployed, the level of complexity surrounding these configurations can grow rapidly.
To address the challenges of complex distributed configuration, you need tools that allow key stakeholders across your organization, including those working in both testing and escalation management, to get visibility into the current configurations and recent changes made to all relevant environments. And this isn’t a small issue. Effective configuration management with a tool like CloudTruth’s configuration intelligence platform enables you to shorten lead time by reducing the testing time wasted on configuration issues, and to shorten Mean Time To Restore (MTTR) by reducing the time spent debugging escalations, which enables faster discovery and deployment of fixes. The lower your MTTR, the lower the risk associated with deployment, meaning that you can deploy fixes and new features more frequently, and with greater confidence.
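For teams that want to track MTTR themselves, the arithmetic is simply the average time between an incident starting and service being restored. This is a minimal sketch with made-up incident times; the function name and the (started, restored) tuple layout are assumptions for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Average the (restored - started) durations over all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (restored - started for started, restored in incidents),
        timedelta(),  # start value so sum() adds timedeltas, not ints
    )
    return total / len(incidents)


# Hypothetical incidents: one restored in 30 minutes, one in 90 minutes.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30)),
    (datetime(2024, 5, 7, 14, 0), datetime(2024, 5, 7, 15, 30)),
]

mttr = mean_time_to_restore(incidents)
```

With these two incidents the result is one hour. Tracking the same number before and after a tooling change gives a simple way to check whether configuration visibility is actually shortening restores.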
In the award-winning book Accelerate, the authors argue that Lead Time and MTTR are two of the four key metrics for measuring high-performing engineering organizations. Effective configuration management has the potential to significantly improve these vital performance metrics, so if your engineering organization is looking to improve its overall health, this is an essential area to invest in.
We’d love to hear your thoughts on this topic. What approaches have been successful for you and your team? Leave your thoughts in the comments below, or visit us at our website to learn more about CloudTruth and how we can help your company manage key configurations.