Welcome to my fourth and final post in this blog post series on introducing DevOps in a traditional enterprise.
In my previous post I explained how the implementation of a configuration management system gave us better visibility into the different versions of the components that were built and the new features they contained. In this post I will take it one step further and explain how we improved our release management efforts: more specifically, why we needed orchestrated releases, how they compare to continuous deployments and how we selected and implemented a tool to support these releases.
A need for orchestration
As a small reminder from my earlier posts, the applications in our company were highly integrated with each other and therefore most of the new features required modifications in multiple applications, each of which was managed by its own dedicated development team. Simple example: an application needs to display additional information about a financial instrument. This would not only require a modification to the application itself but also to the central data hub and to the application that receives this information from the outside and acts as the owner of this information.
This seemingly innocent fact had a rather big consequence: it resulted in the need for a release process that is capable of deploying the modified components of each of these impacted applications in the same go, in other words an orchestrated release process. If the components were deployed on different days, the applications would break or at least be disrupted.
In fact, even if a feature can be implemented by only modifying components within one application at least some degree of orchestration is necessary (a good example is a modification in the source code and one in the database schema). But as this would stay within the boundaries of one development team it would be relatively easy to organize that all components are deployed simultaneously.
Things become way more complicated when more than one application and development team is involved in implementing a feature and moreover when multiple of these application-overlapping features are being worked on in parallel and thereby modifying the same components. Such a situation really shouts for a company-wide release calendar that defines the exact dates when components can be deployed in each environment. If we deploy all components at the same moment we are sure that all the dependencies between them are taken into account.
Unfortunately we all know that creating strict deadlines in a context so complex and so difficult to estimate as software development can cause a lot of stress. And cause a lot of missed deadlines as well, resulting in half-finished features that must somehow be taken out of the ongoing release and postponed to the next release. And a lot of the dependent features that must be taken out as well. Or features that can stay in but were rushed so much to get them finished before the deadline that they lack the quality that is expected from them, causing system outages, urgent bug fixes that must be released shortly after the initial release, loss of reputation, and so on downstream in the process. And even if all goes according to plan, there is generally a lot of unfinished work waiting somewhere in the pipeline for one or another deployment (development-done but not release-done, also called WIP or work in progress).
What about continuous deployment?
Continuous deployment takes another approach to avoid all these problems that are caused by these dependencies: it requires the developers to support backward-compatibility, in other words their component should work with the original versions as well as with the modified versions (those that include the new feature) of the other components, simply because they don’t know in advance in which order the components will be deployed. In such a context the developers can do their work at their own pace and deliver their components whenever they are ready. As soon as all the new versions of the components are released a feature flag can be switched on to activate the feature.
There is unavoidably a huge drawback to this scenario: in order to support this backward compatibility the developers basically have to keep two versions of the logic in their code, and this applies to each individual feature that is being worked on. This requirement can be a big deal, especially if the code is not well-organized. Once the feature is activated they should also not forget to remove the old version, otherwise the code base becomes a total mess after a while. If there are changes to the database schema (or other stateful resources) and/or data migrations involved things will become even more complicated. Continuous deployment is also tightly coupled to hot deployment, which introduces quite some challenges of its own, and if the company doesn’t have a business need to be up all the time that’s a bit of a wasted effort. I found a nice explanation of all the intricacies of such practices in this webinar by Paul Biggar at MountainWest RubyConf 2013.
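To make the backward-compatibility requirement a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration (the `FeatureFlags` store, the `show_rating` flag and the `rating` field are assumptions, not something taken from our actual systems): the component keeps working with the old payload of the data hub, and only uses the new logic when the flag is on and the new field is actually there.

```python
from dataclasses import dataclass


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store; a real one would sit in config or a database."""
    enabled: set[str]

    def is_on(self, name: str) -> bool:
        return name in self.enabled


def describe_instrument(instrument: dict, flags: FeatureFlags) -> str:
    """Render an instrument, tolerating both the old and the new upstream payload."""
    base = f"{instrument['name']} ({instrument['isin']})"
    # New logic: only used when the flag is on AND the upstream component
    # already delivers the extra field. Otherwise fall back to the old output.
    if flags.is_on("show_rating") and "rating" in instrument:
        return f"{base} rating={instrument['rating']}"
    return base
```

Note that both code paths live side by side until the flag is permanently on, which is exactly the duplication that has to be cleaned up afterwards.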
But don’t get me wrong on this, continuous deployment is great and I really see it as the way to go but that doesn’t take away that it will take a long time for the “conventional” development community to switch to a mindset that is so extremely different from what everyone has been used to for so long. For this community (the 99%-ers ;-)) continuous deployment merely appears as a small dot far away on the horizon, practiced by whizkids and Einsteins living on a different planet. But hopefully, if we can gradually bring our conventional software delivery process under control, automating where possible and gradually increasing the release frequency, maybe one day the leap towards continuous deployment may be less daunting than it is today and instead just be the next incremental step in the process.
But until then I’m afraid we are stuck with our orchestrated release process so we better make sure that we bring the processes and dataflows under control so we can keep all the aforementioned problems it brings with it to a minimum.
Some words about the process of release coordination
Let us have a closer look at the orchestrated release process in our company, starting from the moment the components are delivered to the software repository (typically done by a continuous integration tool) to the moment they are released to production.
The first step was for the dev team to create a deployment request for their application. This request should contain all the components – including the correct version numbers – that implement the features that were planned for the ongoing release.
Each team would then send their deployment request to the release coordinator (a role on enterprise level) for the deployment of their application in the first “orchestrated” environment, in our case the UAT environment. Note that we also had an integration environment between development and UAT where cross-application testing – amongst other types of testing – happened but this environment was still in the hands of the development teams in terms of deciding when to install their components.
The release coordinator would review the contents of the deployment requests and verify that the associated features were signed-off in the previous testing stage, e.g. system testing. Finally he would assemble all deployment requests into a global release plan and include some global pre-steps (like taking a backup of all databases and bringing all services down) and post-steps (like restarting all services and adding an end-to-end smoke-test task). On the planned day of the UAT deployment, he would coordinate the deployments of the requests with the assigned ops teams and inform the dev teams as soon as the UAT environment was back up and running.
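The assembly of the global release plan can be pictured with a small Python sketch. It is only an illustration under my own assumptions: the function name, the plain-string steps and the alphabetical ordering of the applications are hypothetical, and the real plan obviously carried a lot more information.

```python
def build_release_plan(requests: dict[str, list[str]]) -> list[str]:
    """Wrap the per-application deployment steps in global pre- and post-steps."""
    pre_steps = ["backup all databases", "stop all services"]
    post_steps = ["restart all services", "run end-to-end smoke test"]
    plan = list(pre_steps)
    # Assumption: applications are handled in alphabetical order; a real plan
    # would order them according to their dependencies.
    for app in sorted(requests):
        plan.extend(f"{app}: {step}" for step in requests[app])
    return plan + post_steps
```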
When a bug was found during the UAT testing, the developer would fix it and send in an updated deployment request, one where the fixed components got an updated version number and were highlighted for redeployment.
The deployment request for production would then simply be a merge of the initial deployment request and all “bugfix” deployment requests, each time keeping the last deployed version of a component. The exception is a stateful component like a database, in which case deployments are typically incremental and as a result all correction deployments that happened in UAT must be replayed in production. Stateless components like the ones that are built from source code are typically deployed by completely overwriting the previous version.
Again, the release coordinator would review and coordinate the deployment requests for production, similar to how it was done for UAT, and that would finally deliver the features into production, after a long and stressful period.
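The merge rule for the production request can be sketched in a few lines of Python. The `Deployment` class and the choice to list the stateful deployments first are assumptions for the sake of the example; the point is only the difference between "keep the last version" and "replay everything in order".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    component: str
    version: str
    stateful: bool  # True for e.g. a database, which is deployed incrementally


def merge_for_production(uat_requests: list[list[Deployment]]) -> list[Deployment]:
    """Merge the UAT deployment requests (oldest first) into one production request."""
    latest_stateless: dict[str, Deployment] = {}
    stateful_history: list[Deployment] = []
    for request in uat_requests:
        for dep in request:
            if dep.stateful:
                # Incremental deployments: every one must be replayed, in order.
                stateful_history.append(dep)
            else:
                # Overwriting deployments: only the last version matters.
                latest_stateless[dep.component] = dep
    return stateful_history + list(latest_stateless.values())
```

With an initial request containing `db 1.0` and `web 1.0` and a bugfix request containing `db 1.1` and `web 1.1`, the merged request holds both database deployments but only the last version of the web component.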
Note that I only touched the positive path here. The complete process was a lot more complex and included scenarios like what to do in case a deployment failed, how to treat rollbacks, how to support hotfix releases that happen while a regular release is ongoing, etc.
For more information on this topic you should definitely check out Eric Minick’s webinar on introducing uRelease in which he does a great job explaining some of the common release orchestration patterns that exist in traditional enterprises.
As long as there were relatively few dev and ops teams and all were mostly co-located this process could still be managed by a combination of Excel, Word, e-mail, and a lot of plain simple human communication. However, as the IT department grew over time and became more spread out over different locations, this “artisanal” approach hit its limits and a more “industrial” solution was needed.
A big hole in the market
Initially we thought about developing a small release management tool in-house to solve these scaling problems but as soon as we sat down to gather the high-level requirements we realized that it would take our small team too much time and effort to develop it and to maintain it later on.
So we decided to go on the market and look for a commercial tool, only to be surprised by how little choice there was. Yes, you have the ITIL-based ITSM tools like BMC Remedy and ServiceNow but these tools only touch the surface of release management. At the other end of the spectrum you have the CI or push button triggered automated deployment tools that focus on the technical aspects but have little or no notion of orchestrated releases, let alone all of the requirements I mentioned above. Nonetheless, plenty of choice here, depending on your specific needs: XebiaLabs Deployit, ZeroTurnaround LiveRebel, Nolio, Ansible, glu, Octopus Deploy (.NET only), Rundeck, Thoughtworks Go, Inedo BuildMaster (the last two include their own CI server).
Instead, what we needed was a tool that would map closely to the existing manual processes and that could simply take over the administrative, boring, error prone and repetitive work of the release coordinator and provide an integrated view of the situation – linking together features, components, version numbers, releases, deployments, bug reports, environments, servers, etc – rather than a tool that would completely replace the human release coordination work in one step.
We were therefore very happy to find SmartRelease, a tool that nicely fit this gap. It was created by Streamstep, a small US-based start-up that was co-founded by Clyde Logue, a former release manager and a great person I had the opportunity to work with throughout the implementation of the tool. During this period the company was sold to BMC and the tool was renamed to Release Process Manager (RPM). But other than this tool nothing else came close to our requirements, something I found quite surprising considering the struggles that companies with many integrated applications typically have with their releases.
This was about three years ago. Looking at the market today, I see that some interesting tools have been introduced (or reached maturity) since then. Not only has SmartRelease/RPM been steadily extended with new features, also UrbanCode has come up with a new tool uRelease, Serena has pushed its Release Manager and UC4 has Release Orchestrator. Although none of them comes close yet to the ideal tool I have in mind (and I’m especially thinking about managing the dependencies between the features and components here) I don’t think it will take too long before one of them will close the gap.
On the other hand, where I do see a big hole in the market today is on the low-budget do-it-yourself side. Looking at how these four companies present their tools, I noticed that none of them make a trial version available that can be downloaded and installed without too much ceremony. This leads me to believe that they are mainly focusing on the “high-profile” companies where the general sales strategy is to send out a battery of sales people, account managers and technical consultants to try to convince higher management of the need for their tools so they can send another battery of consultants to come in and do the implementation. If this manager is well aware of the situation on the work floor this may actually be a successful strategy.
For the devs or ops people from the work floor, the ability to play around with a tool, to set it up for a first project and to see if it really brings any value to the company somewhere in a dark corner outside of the spotlights is invaluable. And once they decide to make their move and ask for the budget they already have some real facts to prove their business case. I believe that there is a big opening here for tools that are similar to what Jenkins and TeamCity are to continuous integration: being easily accessible, creating a community around them and as such making the concept – in this case release coordination – mainstream.
Putting the tool to work
Back to our chosen tool Release Process Manager, let us have a look at how exactly it helped us industrialize our release management process.
First of all, it allows the release manager (typically a more senior role than the release coordinator) to configure her releases, and this includes specifying the applicable deployment dates for each environment. It also allows the developers to create a deployment request for a particular application and release which contains a set of deployment steps, one for each component that must be deployed. These steps can be configured to run sequentially or in parallel, and manually (as in our case, thanks for not allowing access to the production servers, Security team ;-)) or in an automated fashion.
The deployment requests for a particular release can then be grouped into a release plan – typically once all deployment requests have been received – which makes it possible to add release-specific handling such as pre- and post-deployment steps.
And finally, on the day of the deployment, the release plan is executed, going over each deployment step of each deployment request either sequentially or in parallel. For each manual step, the ops team responsible for the deployment of the associated component will receive a notification and will be able to indicate success or failure. For each automated step an associated script is run that will take care of the deployment.
See here a mockup of what a deployment request looks like:
The release management tool was notified by the change management tool whenever a change request (or feature) was updated. This allowed the tool to show the change requests that applied to that particular application and release directly on the deployment request, which made it possible for the release coordinator to easily track the statuses of the change requests and e.g. reject the deployment requests that contain not yet signed-off change requests.
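The execution logic described above can be approximated with the following Python sketch. It says nothing about how RPM works internally; the `Step` class, the `ask_ops` callback (standing in for the notification to the ops team) and the use of a thread pool for parallel steps are all assumptions of mine.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    component: str
    # Automated steps carry a script; manual steps leave it as None.
    script: Optional[Callable[[], bool]] = None


def run_step(step: Step, ask_ops: Callable[[Step], bool]) -> bool:
    """Run one deployment step and return True on success."""
    if step.script is not None:
        return step.script()
    # Manual step: notify the ops team and wait for their success/failure verdict.
    return ask_ops(step)


def run_request(steps: list[Step], ask_ops: Callable[[Step], bool],
                parallel: bool = False) -> bool:
    """Execute the steps of one deployment request."""
    if parallel:
        with ThreadPoolExecutor() as pool:
            # All steps are started; the request succeeds only if every step does.
            return all(list(pool.map(lambda s: run_step(s, ask_ops), steps)))
    # Sequential: stop at the first failing step.
    return all(run_step(s, ask_ops) for s in steps)
```

In sequential mode a failing step prevents the following steps from running, which matches the idea that someone has to decide what happens next before the plan continues.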
It was notified by the software repository whenever a new version of a component was built which allowed the tool to restrict the choices of the components and their version number to only those that actually exist in the software repository.
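The kind of validation these two notifications enable can be sketched as follows. The function and its data structures are hypothetical (plain dictionaries and sets instead of the real tool's data model), but they show the two checks: every requested component version must exist in the software repository, and every associated change request must be signed off.

```python
def validate_request(components: dict[str, str],
                     change_requests: list[str],
                     repository: dict[str, set[str]],
                     signed_off: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    for name, version in components.items():
        # Only versions that were actually built and stored can be deployed.
        if version not in repository.get(name, set()):
            problems.append(f"{name} {version} not found in software repository")
    for cr in change_requests:
        # Change requests must be signed off in the previous testing stage.
        if cr not in signed_off:
            problems.append(f"change request {cr} is not signed off")
    return problems
```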
See here an overview of the integration of the release management tool with the other tools:
More generally, by implementing a tool for release management rather than relying on manual efforts it became possible to increase the quality of the information – this was done by either enforcing that correct input data is introduced or by validating it a posteriori through reporting – and to provide better visibility on the progress of the releases.
As an added bonus all actions are logged, which allows for more informed post-mortem meetings etc. It also opens the way to create any kind of reporting on statuses, metrics, trends, KPIs etc. that may become necessary in the future.
In a later stage a third integration was created, from the configuration management tool to the release management tool this time, which allowed the configuration management tool to retrieve the versions of the deployed components in each environment and include them in the view of the component (see my previous post for more info on this topic).
Another feature that I found very interesting but which was not yet supported by the release management tool was the possibility to automatically create a first draft of the deployment request from the configuration management tool. I can easily imagine the developer opening up the configuration management view of his application, selecting the versions of the components he wants to include in his deployment request and pressing the button “Generate deployment request”. And if the latest versions that include change requests for the next release are selected by default, most of the time the effort could be reduced to just pressing the button.
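Since this feature did not exist, the following Python sketch is pure speculation about how the default selection could work. The `Build` class and its fields are invented for the example: for each component it picks the latest build that contains change requests for the given release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    component: str
    version: tuple[int, ...]   # e.g. (1, 4, 2), so versions compare numerically
    releases: frozenset[str]   # releases whose change requests this build includes


def draft_request(builds: list[Build], release: str) -> dict[str, str]:
    """Pick, per component, the latest build that contains work for `release`."""
    latest: dict[str, Build] = {}
    for build in builds:
        if release not in build.releases:
            continue
        current = latest.get(build.component)
        if current is None or build.version > current.version:
            latest[build.component] = build
    return {name: ".".join(map(str, b.version)) for name, b in latest.items()}
```

Storing the version as a tuple of integers avoids the classic trap of comparing version strings, where "1.10.0" would sort before "1.9.0".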
To conclude this blog post series, let us now take a step back and see which problems that were identified initially (more details in this earlier post) got solved by implementing a configuration management tool and a release management tool:
- The exponentially increasing need for release coordination, which was characterized by its manual nature, and the pressure it exerted on the deployment window => SOLVED
- The inconsistent, vague, incomplete and/or erroneous deployment instructions => SOLVED
- The problems surrounding configuration management: not being enough under control and allowing for a too permissive structure => SOLVED
- The manual nature of testing happening at the end of a long process meaning that it must absorb all upstream planning issues – this eventually causes the removal of any not signed-off change requests => NOT SOLVED YET
- The desire by the developers to sneak in late features into the release and thereby bypassing the validations => SOLVED
Looks good doesn’t it? Of course this doesn’t mean that what was implemented is perfect and doesn’t need further improvement. But for a first step it solved quite a number of urgent and important problems. It is time now for these tools to settle down and to put them under the scrutiny of continuous improvement before heading to the next level.
Wow, it has taken me a lot more time than I initially estimated to write down my experiences in a blog post series but I’m happy that I have finally done it. Ever since publishing my first post in the series I have received lots of feedback from people with all different kinds of backgrounds, heard a lot of similar stories also and got many new insights on the way.
I hope you have enjoyed reading the blog post series, hopefully as much as I have enjoyed writing it.
Previous posts in the series:
- Second post: A closer look at introducing DevOps in a traditional enterprise – explaining the team structure and the problems that existed
- Third post: First steps in DevOps-ifying a traditional enterprise – taking control of configuration management