To meet the demands of our customers and clients, our teams had to learn to ship more frequently and with higher quality. We're sharing how we tackled this massive undertaking to help other teams become more continuous.
At this point it's no secret that continuous delivery is a central practice of high-performing tech organizations, and adoption keeps growing year over year. Accelerate, published in 2018, summarizes years of DORA research, and its findings are highly compelling: if you want to produce high-quality software quickly, then moving towards stable, on-demand deployments, potentially multiple times a day, is the right choice.
Several years ago at Synapse we started down the road of improving delivery speed for our clients by adopting continuous delivery. As we embarked on this journey, we faced challenges in two main areas: first, achieving speed and stability at all on legacy monolith systems; and second, building and maintaining our technical and, more importantly, our socio-technical deployment infrastructure. What follows explores how we solved key challenges along the way, in hopes of helping other teams who may be experiencing similar pains.
We had a couple of advantages that other organizations don't have when it came to adapting our systems and processes. We manage a handful of legacy systems, all of which were (and still are) built on a monolithic two-tier architecture (React frontends for mobile and web, and Node REST APIs), but they're completely independent of each other. There are no dependencies between these systems; they're separate products. They also aren't huge. Each legacy system is managed by a single team, and as a result that team owns the entire product, including its cloud infrastructure and testing.
We used Accelerate's research program as a rough roadmap that gave us areas to learn about and improve on. Accelerate and the DORA reports convincingly link specific technical and management practices to improved speed and stability and, ultimately, to improved organizational performance.
As a team we committed to continuously studying domain-driven design, continuous delivery, test automation, agility, and product development. In 2021 we used a greenfield opportunity as a pilot project, where we were able to architect, build, and test a system without manual quality gates or deployment reviews. We were deploying on demand multiple times a day, and we felt absolutely unencumbered.
Our pilot project was delivered successfully and ultimately handed off to another team to grow, maintain, and manage. Then we shifted gears and began applying what we had learned to our existing legacy systems.
For us, adoption begins with test automation to increase confidence and reduce our reliance on manual quality gates. Once a team has a good understanding of how to build and grow good automated test suites, they can modernize their deployment infrastructure and start using those tests as an automated gate for deployment. Teams are also encouraged to integrate more frequently, adopt TDD, and to "stop the line" when the build is red, all in an effort to improve speed and stability.
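To make the idea of tests as a deployment gate concrete, here is a minimal sketch in TypeScript. It assumes a Node project with an npm test script and a hypothetical npm run deploy script; a real pipeline would wire this into your CI system, but the principle is the same: the deploy step is unreachable while tests are failing.

```ts
// deploy-gate.ts: a minimal sketch of tests acting as a deployment gate.
// Assumes a Node project with an `npm test` script and a hypothetical
// `npm run deploy` script; real pipelines carry much more context.
import { execSync } from "node:child_process";

function run(command: string): void {
  // Inherit stdio so test and deploy output shows up in the build log.
  execSync(command, { stdio: "inherit" });
}

try {
  run("npm test"); // throws if the test suite exits non-zero
} catch {
  console.error("Tests failed: stopping the line, nothing will be deployed.");
  process.exit(1);
}

run("npm run deploy"); // only reached when every test passes
```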
In the second quarter of 2022 we achieved true continuous delivery on our first legacy system. The team had gone from deploying once every 2-4 weeks to deploying, on average, at least once a day!
For a lot of teams, delivering continuously is a brand-new way of thinking and working. There are new technical systems the team must own and manage, and beyond that there are new ways of working and disciplines to maintain and improve. Without care, our pipelines, test suites, and even the teams' discipline will succumb to entropy and gradually decline into disorder. And, importantly, the maintenance and improvement of the delivery systems must be driven by the teams themselves in order to be successful.
Building the discipline into a team to keep their delivery systems working and improving over time is non-trivial. We built a low-noise Slack integration to broadcast the operation of our pipelines into team communication channels, with the goal that these alerts could coexist in the same channel as normal team discussion. If we were a co-located team, an alternative would have been a flashing red indicator light or a bright display screen in the team room, alerting the whole team when a build was in progress and whether or not it had failed.
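As an illustration, a build notification like this can be as simple as a POST to a Slack incoming webhook. The sketch below is not our actual integration; the SLACK_WEBHOOK_URL variable, the BuildEvent shape, and the message format are assumptions for the example.

```ts
// notify-slack.ts: a minimal sketch of a low-noise build notification.
// The SLACK_WEBHOOK_URL environment variable and the BuildEvent shape
// are assumptions for illustration; our real integration differs.
interface BuildEvent {
  pipeline: string;
  status: "started" | "passed" | "failed";
  commit: string;
  url: string;
}

async function notifySlack(event: BuildEvent): Promise<void> {
  const icon = { started: "⏳", passed: "✅", failed: "🚨" }[event.status];
  const text = `${icon} ${event.pipeline} ${event.status} at ${event.commit} (<${event.url}|view build>)`;

  // Slack incoming webhooks accept a simple JSON payload with a `text` field.
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}

// Example: called from a CI step after the pipeline finishes.
notifySlack({
  pipeline: "legacy-api",
  status: "failed",
  commit: "abc1234",
  url: "https://ci.example.com/builds/42",
});
```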
Keeping the whole team constantly aware of builds and build failures accomplishes a few important goals. The first is that the team is alerted quickly about failures, so they can "stop the line" and get them fixed, and we can then set expectations around those failures: if a failure can't be fixed in under 15 minutes, for example, the offending commit should be rolled back. This speeds teams up dramatically.
The second important goal is that visibility into pipeline operations eliminates the fear of deployment. If you shine a light on the deployment process at the same time as you start deploying more frequently, you broadcast a sense of stability. If everybody, including non-technical stakeholders, can see how frequently code ships safely, then starting to push more often feels safer to everyone involved. It's a very different feeling to experience a failure when you also know that it came after 8 successful deployments that day.
The final important facet of visibility is performance over time. How frequently are you deploying, and what's your failure rate? These are part of DORA's four key metrics, but they are high-level numbers. We want to give teams tools to analyze their performance with measures like mainline throughput (the average time it takes for a commit to make it to mainline), build throughput, and build failure rate. These metrics are recommended by the book Measuring Continuous Delivery, and they give teams the flashlight they need to keep their delivery systems in check.
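These measurements don't require heavyweight tooling to get started. The sketch below computes an average build duration and a build failure rate from a list of build records; the BuildRecord shape and the simplified calculations are assumptions for illustration, not the book's formal definitions or how our own tooling works.

```ts
// delivery-metrics.ts: a simplified sketch of build-level delivery metrics.
// The BuildRecord shape and the calculations are assumptions for illustration.
interface BuildRecord {
  startedAt: Date;   // when the pipeline run began
  finishedAt: Date;  // when it completed
  passed: boolean;   // whether the run succeeded end to end
}

// Average build duration in minutes.
function averageBuildDuration(builds: BuildRecord[]): number {
  const totalMs = builds.reduce(
    (sum, b) => sum + (b.finishedAt.getTime() - b.startedAt.getTime()),
    0,
  );
  return totalMs / builds.length / 60_000;
}

// Fraction of runs that did not pass.
function buildFailureRate(builds: BuildRecord[]): number {
  return builds.filter((b) => !b.passed).length / builds.length;
}

const builds: BuildRecord[] = [
  { startedAt: new Date("2022-06-01T10:00Z"), finishedAt: new Date("2022-06-01T10:12Z"), passed: true },
  { startedAt: new Date("2022-06-01T14:30Z"), finishedAt: new Date("2022-06-01T14:47Z"), passed: false },
];

console.log(`Average build duration: ${averageBuildDuration(builds).toFixed(1)} min`);
console.log(`Build failure rate: ${(buildFailureRate(builds) * 100).toFixed(0)}%`);
```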
So, in short, we found we had to write custom tools to help with a handful of problems:
- broadcasting low-noise build and deployment alerts into the team's everyday communication channels
- giving everyone, including non-technical stakeholders, visibility into how often code ships safely
- tracking delivery performance over time with metrics like mainline throughput, build throughput, and build failure rate
We've found that we're not the only ones writing custom tooling to solve these problems; other teams are too. As a result, we're working on making our beta platform, Blip, available to other teams looking to adopt or improve continuous delivery practices. Let us know you're interested by joining our waitlist or scheduling a demo.