Welcome to my fourth and final post in this blog post series on introducing DevOps in a traditional enterprise.
In my previous post I explained how the implementation of a configuration management system gave us better visibility into the different versions of the components that were built and into the new features they contained. In this post I will take it one step further and explain how we improved our release management efforts: more specifically, why we needed orchestrated releases, how they compare to continuous deployment, and how we selected and implemented a tool to support these releases.
(For the sake of completeness, here are the links to the introduction and an analysis of the problems)
A need for orchestration
As a small reminder from my earlier posts, the applications in our company were highly integrated with each other, and therefore most new features required modifications in multiple applications, each of which was managed by its own dedicated development team. A simple example: an application needs to display additional information about a financial instrument. This would not only require a modification to the application itself but also to the central data hub and to the application that receives this information from the outside and acts as the owner of this information.
This seemingly innocent fact had a rather big consequence: it resulted in the need for a release process capable of deploying the modified components of each of these impacted applications in the same go, in other words an orchestrated release process. If the components were deployed on different days, the applications would break, or at least be disrupted.
In fact, even if a feature can be implemented by modifying components within only one application, at least some degree of orchestration is necessary (a good example is a modification in the source code combined with one in the database schema). But as this stays within the boundaries of one development team, it is relatively easy to arrange for all components to be deployed simultaneously.
Things become far more complicated when more than one application and development team is involved in implementing a feature, and even more so when several of these cross-application features are being worked on in parallel, modifying the same components. Such a situation really calls for a company-wide release calendar that defines the exact dates when components can be deployed in each environment. If we deploy all components at the same moment, we can be sure that all the dependencies between them are taken into account.
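To make this a bit more concrete, here is a minimal sketch of how a release calendar and a cross-application feature could be modeled, including a check that every component a feature touches is actually scheduled for the release. All names, dates and data structures below are invented for this example and are not taken from our actual setup:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Release:
    name: str                                            # e.g. "2013-R4" (invented)
    dates: dict[str, date]                               # environment -> planned deployment date
    components: set[str] = field(default_factory=set)    # components scheduled for this release

@dataclass
class Feature:
    key: str                                             # change request id
    touches: set[str]                                    # components this feature modifies

def unscheduled_components(feature: Feature, release: Release) -> set[str]:
    """Components the feature needs that are not scheduled in the release."""
    return feature.touches - release.components

release = Release(
    name="2013-R4",
    dates={"UAT": date(2013, 9, 2), "PROD": date(2013, 9, 16)},
    components={"trading-ui/web", "data-hub/loader"},
)
feature = Feature(key="CR-1042",
                  touches={"trading-ui/web", "data-hub/loader", "market-data/importer"})

print(unscheduled_components(feature, release))   # -> {'market-data/importer'}
```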
Unfortunately, we all know that setting strict deadlines in a context as complex and as difficult to estimate as software development can cause a lot of stress. And a lot of missed deadlines as well, resulting in half-finished features that must somehow be taken out of the ongoing release and postponed to the next one, and in dependent features that must be taken out with them. Or features that stay in but were rushed so much to get them finished before the deadline that they lack the expected quality, causing system outages, urgent bug fixes that must be released shortly after the initial release, loss of reputation, and so on downstream in the process. And even if all goes according to plan, there is generally a lot of unfinished work waiting somewhere in the pipeline for one deployment or another (development-done but not release-done, also called WIP or work in progress).
What about continuous deployment?
Continuous deployment takes a different approach to avoiding all the problems caused by these dependencies: it requires the developers to maintain backward compatibility, in other words their component should work with the original versions as well as with the modified versions (those that include the new feature) of the other components, simply because they don’t know in advance in which order the components will be deployed. In such a context the developers can work at their own pace and deliver their components whenever they are ready. As soon as all the new versions of the components are released, a feature flag can be switched on to activate the feature.
There is unavoidably a big drawback to this scenario: in order to support this backward compatibility the developers basically have to keep two versions of the logic in their code, and this applies to each individual feature being worked on. This can be a big deal, especially if the code is not well organized. Once the feature is activated they should also not forget to remove the old version, to avoid the code base becoming a total mess after a while. If there are changes to the database schema (or other stateful resources) and/or data migrations involved, things become even more complicated. Continuous deployment is also tightly coupled to hot deployment, which introduces quite a few challenges of its own, and if the company doesn’t have a business need to be up all the time that’s a bit of a wasted effort. I found a nice explanation of all the intricacies of such practices in this webinar by Paul Biggar at MountainWest RubyConf 2013.
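To illustrate what keeping two versions of the logic looks like in practice, here is a minimal, hypothetical sketch of a feature-flagged code path; the flag name, functions and data are all invented for this example:

```python
# Illustrative sketch only: every name below is made up for this example.
FLAGS = {"extended-instrument-info": False}   # switched on once every component is live

def lookup_name(instrument_id: str) -> str:
    return f"Instrument {instrument_id}"                  # stand-in for the real lookup

def fetch_extended_details(instrument_id: str) -> dict:
    return {"currency": "EUR", "exchange": "XBRU"}        # served by the upgraded data hub

def instrument_summary(instrument_id: str) -> dict:
    summary = {"id": instrument_id, "name": lookup_name(instrument_id)}
    if FLAGS["extended-instrument-info"]:
        # new behaviour: only correct once the upgraded data hub is deployed
        summary["details"] = fetch_extended_details(instrument_id)
    # the old behaviour stays in place until the flag is on everywhere;
    # afterwards the flag and the old path should be removed to keep the code clean
    return summary

print(instrument_summary("ISIN-0001"))
```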
But don’t get me wrong: continuous deployment is great and I really see it as the way to go, but that doesn’t take away that it will take a long time for the “conventional” development community to switch to a mindset that is so radically different from what everyone has been used to for so long. For this community (the 99%-ers ;-)) continuous deployment merely appears as a small dot far away on the horizon, practiced by whizkids and Einsteins living on a different planet. But if we can gradually bring our conventional software delivery process under control, automating where possible and steadily increasing the release frequency, then one day the leap towards continuous deployment may be less daunting than it is today and instead just be the next incremental step in the process.
But until then I’m afraid we are stuck with our orchestrated release process, so we had better bring the processes and data flows under control and keep the aforementioned problems to a minimum.
Some words about the process of release coordination
Let us have a closer look at the orchestrated release process in our company, starting from the moment the components are delivered to the software repository (typically done by a continuous integration tool) to the moment they are released to production.
The first step was for the dev team to create a deployment request for their application. This request had to contain all the components – including the correct version numbers – that implemented the features planned for the ongoing release.
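To make this concrete, here is a hypothetical example of what such a deployment request could contain, expressed as a simple data structure; in our case these were documents, and all names and version numbers below are made up:

```python
# Invented example of a deployment request, just to show its shape.
deployment_request = {
    "application": "trading-ui",
    "release": "2013-R4",
    "environment": "UAT",
    "components": [
        {"name": "trading-ui-web",   "version": "2.14.0"},
        {"name": "trading-ui-batch", "version": "2.14.1"},
    ],
    "features": ["CR-1042", "CR-1057"],   # change requests implemented by these versions
}
```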
Each team would then send their deployment request to the release coordinator (a role on enterprise level) for the deployment of their application in the first “orchestrated” environment, in our case the UAT environment. Note that we also had an integration environment between development and UAT where cross-application testing – amongst other types of testing – happened but this environment was still in the hands of the development teams in terms of deciding when to install their components.
The release coordinator would review the contents of the deployment requests and verify that the associated features were signed off in the previous testing stage, e.g. system testing. Finally, they would assemble all deployment requests into a global release plan and add some global pre-steps (like taking a backup of all databases and bringing all services down) and post-steps (like restarting all services and running an end-to-end smoke test). On the planned day of the UAT deployment, the release coordinator would coordinate the deployments of the requests with the assigned ops teams and inform the dev teams as soon as the UAT environment was back up and running.
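Conceptually, assembling the global release plan boils down to wrapping all application-specific deployment steps in the shared pre- and post-steps. Here is a rough sketch of that assembly, reusing the invented request shape from the example above and invented step names:

```python
def build_release_plan(deployment_requests: list[dict]) -> list[dict]:
    """deployment_requests: dicts with an 'application' name and a list of 'components'."""
    pre_steps = [
        {"step": "take a backup of all databases", "type": "manual"},
        {"step": "bring all services down", "type": "manual"},
    ]
    post_steps = [
        {"step": "restart all services", "type": "manual"},
        {"step": "run the end-to-end smoke test", "type": "manual"},
    ]
    deploy_steps = [
        {"step": f"deploy {c['name']} {c['version']} for {dr['application']}", "type": "manual"}
        for dr in deployment_requests
        for c in dr["components"]
    ]
    return pre_steps + deploy_steps + post_steps
```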
When a bug was found during UAT testing, the developers would fix it and send in an updated deployment request, one in which the fixed components got an updated version number and were highlighted for redeployment.
The deployment request for production would then simply be a merge of the initial deployment request and all “bugfix” deployment requests, each time keeping the last deployed version of a component. The exception is stateful components like databases, where deployments are typically incremental and therefore all correction deployments that happened in UAT must be replayed in production. Stateless components, like the ones built from source code, are typically deployed by completely overwriting the previous version.
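The merge rule described above can be summarized in a few lines of code. This is only an illustration of the logic, with invented component names and version numbers:

```python
def merge_for_production(requests: list[dict]) -> dict:
    """requests: the UAT deployment requests, in the order they were deployed."""
    overwrite: dict[str, str] = {}        # stateless component -> last deployed version
    replay: dict[str, list[str]] = {}     # stateful component -> every increment, in order
    for request in requests:
        for comp in request["components"]:
            if comp.get("stateful"):
                replay.setdefault(comp["name"], []).append(comp["version"])
            else:
                overwrite[comp["name"]] = comp["version"]
    return {"overwrite": overwrite, "replay": replay}

uat_requests = [
    {"components": [{"name": "trading-ui-web", "version": "2.14.0"},
                    {"name": "trading-db", "version": "inc-031", "stateful": True}]},
    {"components": [{"name": "trading-ui-web", "version": "2.14.2"},   # bug fix
                    {"name": "trading-db", "version": "inc-032", "stateful": True}]},
]
print(merge_for_production(uat_requests))
```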
Again, the release coordinator would review and coordinate the deployment requests for production, similar to how it was done for UAT, and that would finally deliver the features into production, after a long and stressful period.
Note that I only covered the happy path here. The complete process was a lot more complex and included scenarios like what to do when a deployment fails, how to treat rollbacks, how to support hotfix releases that happen while a regular release is ongoing, etc.
For more information on this topic you should definitely check out Eric Minick’s webinar on introducing uRelease, in which he does a great job explaining some of the common release orchestration patterns that exist in traditional enterprises.
As long as there were relatively few dev and ops teams and all were mostly co-located this process could still be managed by a combination of Excel, Word, e-mail, and a lot of plain simple human communication. However, as the IT department grew over time and became more spread out over different locations, this “artisanal” approach hit its limits and a more “industrial” solution was needed.
A big hole in the market
Initially we thought about developing a small release management tool in-house to solve these scaling problems but as soon as we sat down to gather the high-level requirements we realized that it would take our small team too much time and effort to develop it and to maintain it later on.
So we decided to go to the market and look for a commercial tool, only to be surprised by how little choice there was. Yes, you have the ITIL-based ITSM tools like BMC Remedy and ServiceNow, but these only touch the surface of release management. On the other side of the spectrum you have the CI-triggered or push-button automated deployment tools that focus on the technical aspects but have little or no notion of orchestrated releases, let alone all of the requirements I mentioned above. Nonetheless, plenty of choice here, depending on your specific needs: XebiaLabs Deployit, ZeroTurnaround LiveRebel, Nolio, Ansible, glu, Octopus Deploy (.NET only), Rundeck, Thoughtworks Go, Inedo BuildMaster (the last two include their own CI server).
Instead, what we needed was a tool that would map closely to the existing manual processes and that could simply take over the administrative, boring, error-prone and repetitive work of the release coordinator and provide an integrated view of the situation – linking together features, components, version numbers, releases, deployments, bug reports, environments, servers, etc. – rather than a tool that would completely replace the human release coordination work in one step.
We were therefore very happy to find SmartRelease, a tool that nicely filled this gap. It was created by Streamstep, a small US-based start-up co-founded by Clyde Logue, a former release manager and a great person I had the opportunity to work with throughout the implementation of the tool. During this period the company was sold to BMC and the tool was renamed Release Process Manager (RPM). But other than this tool nothing else came close to our requirements, something I found quite surprising considering the struggles that companies with many integrated applications typically have with their releases.
This was about three years ago. Looking at the market today, I see that some interesting tools have been introduced (or have reached maturity) since then. Not only has SmartRelease/RPM been steadily extended with new features, UrbanCode has also come up with a new tool, uRelease, Serena has pushed its Release Manager, and UC4 has Release Orchestrator. Although none of them yet comes close to the ideal tool I have in mind (and I’m especially thinking about managing the dependencies between features and components here), I don’t think it will take long before one of them closes the gap.
On the other hand, where I do see a big hole in the market today is on the low-budget, do-it-yourself side. Looking at how these four companies present their tools, I noticed that none of them makes a trial version available that can be downloaded and installed without too much ceremony. This leads me to believe that they are mainly focusing on the “high-profile” companies, where the general sales strategy is to send out a battery of sales people, account managers and technical consultants to convince higher management of the need for their tools, so they can send another battery of consultants to come in and do the implementation. If this manager is well aware of the situation on the work floor, this may actually be a successful strategy.
For the devs or ops people on the work floor, the ability to play around with a tool, to set it up for a first project somewhere in a dark corner outside of the spotlights, and to see if it really brings any value to the company is invaluable. And once they decide to make their move and ask for budget, they already have some real facts to back up their business case. I believe there is a big opening here for tools that are to release coordination what Jenkins and TeamCity are to continuous integration: easily accessible, building a community around them and as such making the concept mainstream.
Putting the tool to work
Back to our chosen tool Release Process Manager, let us have a look at how exactly it helped us industrialize our release management process.
First of all, it allows the release manager (typically a more senior role than the release coordinator) to configure her releases, which includes specifying the applicable deployment dates for each environment. It also allows the developers to create a deployment request for a particular application and release, containing a set of deployment steps, one for each component that must be deployed. These steps can be configured to run sequentially or in parallel, and either manually (as in our case – thanks for not allowing access to the production servers, Security team ;-)) or in an automated fashion.
The deployment requests for a particular release can then be grouped into a release plan – typically once all deployment requests have been received – which allows for release-specific handling like adding pre- and post-deployment steps.
And finally, on the day of the deployment, the release plan is executed, going over each deployment step of each deployment request, either sequentially or in parallel. For each manual step, the ops team responsible for the deployment of the associated component receives a notification and can indicate success or failure. For each automated step, an associated script is run that takes care of the deployment.
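In code-like terms, executing the release plan conceptually amounts to the loop below. This is not how RPM works internally, just a sketch of the manual-versus-automated handling of each step, with invented helper functions:

```python
import subprocess

def execute_plan(steps: list[dict]) -> None:
    for step in steps:
        if step["type"] == "manual":
            notify_ops_team(step)         # e.g. an e-mail or task for the assigned ops team
            wait_for_confirmation(step)   # the team reports back success or failure
        else:
            subprocess.run(step["script"], check=True)   # run the deployment script

def notify_ops_team(step: dict) -> None:
    print(f"notify {step.get('team', 'ops')}: please execute '{step['step']}'")

def wait_for_confirmation(step: dict) -> None:
    print(f"waiting for confirmation of '{step['step']}'")
```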
Here is a mockup of what a deployment request looks like:
As part of the implementation project in our company, the tool was integrated with two existing tools, the change management tool and the software repository:
It was notified by the change management tool whenever a change request (or feature) was updated. This allowed the tool to show the change requests that applied to a particular application and release directly on the deployment request, which made it possible for the release coordinator to easily track the statuses of the change requests and, for example, reject deployment requests containing change requests that were not yet signed off.
It was notified by the software repository whenever a new version of a component was built, which allowed the tool to restrict the choice of components and version numbers to only those that actually exist in the software repository (see the sketch below for the idea behind these two hooks).
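The effect of those two notifications could be captured with something like the following sketch; how the tools were actually wired together (polling, webhooks, a message bus) is not covered here, and all names are invented:

```python
known_builds: dict[str, set[str]] = {}   # component -> versions that exist in the repository
change_requests: dict[str, str] = {}     # change request id -> status (e.g. "signed-off")

def on_component_built(component: str, version: str) -> None:
    """Software repository notification: only built versions may be selected."""
    known_builds.setdefault(component, set()).add(version)

def on_change_request_updated(cr_id: str, status: str) -> None:
    """Change management notification: lets the coordinator reject requests
    that still contain change requests that are not signed off."""
    change_requests[cr_id] = status

def selectable_versions(component: str) -> set[str]:
    return known_builds.get(component, set())
```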
Here is an overview of how the release management tool integrates with the other tools:
More generally, by implementing a tool for release management rather than relying on manual efforts, it became possible to increase the quality of the information – either by enforcing that correct input data is entered or by validating it a posteriori through reporting – and to provide better visibility into the progress of the releases.
As an added bonus, all actions are logged, which allows for more informed post-mortem meetings and the like. It also opens the way to create any kind of reporting on statuses, metrics, trends, KPIs, etc. that may become necessary in the future.
In a later stage a third integration was created, this time from the configuration management tool to the release management tool, which allowed the configuration management tool to retrieve the versions of the components deployed in each environment and include them in the component view (see my previous post for more info on this topic).
Another feature that I found very interesting, but which was not yet supported by the release management tool, was the possibility to automatically create a first draft of the deployment request from the configuration management tool. I can easily imagine the developers opening up the configuration management view of their application, selecting the versions of the components they want to include in the deployment request and pressing a “Generate deployment request” button. And if the latest versions that include change requests for the next release were selected by default, most of the time the effort could be reduced to just pressing the button.
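As a thought experiment, the logic behind such a “Generate deployment request” button could look like the sketch below, which picks for each component the newest built version that contains a change request planned for the release; the data shapes are my own assumptions, not the actual tool’s API:

```python
def draft_deployment_request(application: str, release: str,
                             builds_per_component: dict[str, list[dict]],
                             planned_crs: set[str]) -> dict:
    """builds_per_component: component name -> builds (oldest to newest), each build
    being {"version": ..., "change_requests": [...]}, roughly as a configuration
    management tool might expose them; planned_crs: change requests planned for the release."""
    selected = []
    for name, builds in builds_per_component.items():
        for build in reversed(builds):                        # newest build first
            if planned_crs & set(build["change_requests"]):   # contains planned work
                selected.append({"name": name, "version": build["version"]})
                break
    return {"application": application, "release": release, "components": selected}

builds = {"trading-ui-web": [{"version": "2.13.0", "change_requests": []},
                             {"version": "2.14.0", "change_requests": ["CR-1042"]}]}
print(draft_deployment_request("trading-ui", "2013-R4", builds, {"CR-1042"}))
```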
Conclusion
To conclude this blog post series, let us take a step back and see which of the problems identified initially (more details in this earlier post) were solved by implementing a configuration management tool and a release management tool:
- The exponentially increasing need for release coordination, characterized by its manual nature, and the pressure it exerted on the deployment window => SOLVED
- The inconsistent, vague, incomplete and/or erroneous deployment instructions => SOLVED
- The problems surrounding configuration management: not being sufficiently under control and allowing too permissive a structure => SOLVED
- The manual nature of testing, happening at the end of a long process and therefore having to absorb all upstream planning issues, which eventually causes the removal of any change requests that are not signed off => NOT SOLVED YET
- The tendency of developers to sneak late features into the release, thereby bypassing the validations => SOLVED
Looks good, doesn’t it? Of course this doesn’t mean that what was implemented is perfect and needs no further improvement. But for a first step it solved quite a number of urgent and important problems. It is now time for these tools to settle down and to be put under the scrutiny of continuous improvement before heading to the next level.
Wow, it has taken me a lot more time than I initially estimated to write down my experiences in this blog post series, but I’m happy that I have finally done it. Ever since publishing the first post in the series I have received lots of feedback from people with all kinds of backgrounds, heard a lot of similar stories and gained many new insights along the way.
I hope you have enjoyed reading the blog post series, hopefully as much as I have enjoyed writing it 🙂
Previous posts in the series:
- Second post: A closer look at introducing DevOps in a traditional enterprise – explaining the team structure and the problems that existed
- Third post: First steps in DevOps-ifying a traditional enterprise – taking control of configuration management
Tried to enjoy the post (relevant to my job; I use/have looked at some of these tools), but had to stop reading halfway through because I found the incessant male=default language so disheartening. Words that to some people are just a style choice or not even thought about at all, can be very you’re-not-welcome-here to those who are made invisible by those words.
Hi Sarah,
I’m really sorry my post felt woman-unfriendly because of my bad gender choices.
I have edited the post (used the more neutral plural form where possible and converted the release manager to a woman – which was the case during the final 6 months of the project :-)) in an attempt to make it more woman-friendly and I would be delighted if you would give my post another chance and let me know your thoughts – both on the gender topic and on the release-related stuff.
Kind regards,
Niek.
Hi Niek,
Thank you very much for your gracious response. It did bother me when all the examples you used were male, but your edits (both types you mention above) made your post much more engaging to me, and I’ve now read through it twice. 🙂
At a previous job I inherited the release manager hat but didn’t have access to sophisticated relevant tools, more the “managed by a combination of Excel, Word, e-mail, and a lot of plain simple human communication” approach (I had been doing CM with a combination of SVN, scripts, and some issue-tracking tools). At my current job, we’re using an older UrbanCode tool, AnthillPro, for the build and deploy automation scheduling/management, and a combination of other tools for change management, code reviews, etc. … integrated to a degree. Other folks deal more with release management / product management.
Your release orchestration perspective is a fresh one to me — I’ve read a lot of CM/DevOps posts about Chef and Puppet over the last year, and Git/Subversion/Ant/Maven/Nexus-level discussions, but not so much about enterprises that have multiple moving pieces to deploy that *have* to be coordinated together in the way you talk about here. The Release Process Manager tool that lets you map pretty closely to the old manual steps (not such a wrenching change) but still improve the information and visibility/transparency of the processes sounds good — as you say, there are more tools now that support the release coordination efforts, but there is an opening for tools that would let folks download and play around with them more without a lot of ceremony (sales/marketing). I like the idea of the release management tool being able to automatically create a first draft of the deployment request from the configuration management tool (less chance of errors that way, too).
Hi Sarah,
Thanks for sharing your thoughts, and I’m happy that you like my new writing style better now. If I can make a contribution to increasing the female / male ratio in IT I’m all in 😉
Niek.
Niek, A well-done series with an excellent amount of details and thoughts that went into the posts. Thanks for sharing your experience.
Hi David,
Glad you liked it!
Niek.
Excellent article Niek.
After a long time I came across an article which kind of fits with reality, and you have shared detailed information on how the release management role is structured across many large-scale enterprises.
Working as a Release Manager over the last number of years, I have struggled to find answers to many of the scenarios you have shared.
We have implemented a customized (in-house) release automation tool that is well orchestrated and fits well across numerous areas. Two things that were important among others were –
- Communication – sharing information about what releases have been done, associated features, dates, etc.
- Single Tool – the requirement for a single automated tool that can be used for deploying across test and live infrastructures.
Something else that you have not mentioned in the post is the relation between environments and releases. Software environments are quite complicated and the transition of releases across environments is often error-prone. Management of environments is a hell of a lot of trouble when there is a lot of integration.
Anyway... thanks for sharing your thoughts. I enjoyed reading the article.
Cheers,
Hi Subhendu,
Thanks for sharing your experiences from the field. I would be very interested to find out more about how you have implemented this release management tool in your company, maybe an interesting subject for a blog post 😉
Regarding your comment about environment management: did you already have a look at http://www.plutora.com? I know they have a tool that focuses on that aspect.
Regards,
Niek.
Niek,
Thanks so much for this very detailed series of posts. I had not been able to find any details on a specific Release Management tool (as opposed to deployment automation tools or Change Management tracking tools). From the last response, is Plutora a better tool than RPM because it can both manage releases and statuses of the Test Environments?
Regards,
Nitin
Hi Nitin,
It has been a while since I looked at Plutora but if I remember well it focused mainly on the release and environment planning aspects and less on the release engineering side so difficult to compare the two. I think both are great tools and which tool is best largely depends on your specific situation.
Niek.
Niek,
Thanks so much for your series of blog posts on release management in an enterprise. Do you think that Plutora is a better bet than RPM because it offers Environment Management as well?
Regards,
Nitin
This is a well-rounded article, Niek, and your comments around the “Release Deployment Plan” are spot on.
We spent 4 months doing PoCs and tool comparisons for an Enterprise Release Management toolset which needed to include deployment coordination features, and we found Plutora was ahead of the pack. We compared Service-now, Serena and IBM but Plutora was the only toolset which actually addressed our requirements in the release and deployment coordination space. We had been on XLS and SharePoint for so long until management decided to embark on this project, and we’re now waiting for the exco to give us the green light to roll out Plutora.
Thanks for taking the time to write this excellent blog post series. A lot of your problems sound very familiar and I LOLed several times.
As you say, coordination grows super-linearly with the number of components unless you decouple. We always stress that tools and process can only get you halfway to mature, well-functioning continuous delivery – the high-level architecture must change to get further.