Last week I finished reading the book The Phoenix Project by Gene Kim and I really enjoyed it. It describes the adventures of a guy named Bill who almost single-handedly (although he had the help of demigod Erik) saves the company from bankruptcy by “industrializing” the IT department and in particular the delivery of new software from development to operations (in short DevOps).
Having worked in a similar context in recent years, I could easily identify with Bill, and the book therefore gave me great insight into the various solution patterns that exist around this topic.
I have to agree, however, with the IT Skeptic on some of the criticisms he raised in his review of the book:
Fictionalising allows you to paint an idealised picture, and yet make it seem real, plausible. Skeptics know that humans give disproportionate weight to anecdote as evidence.
It is pure Hollywood, all American feel-good happy endings.
The Phoenix Project wants you to believe that within weeks the staff are complying with the new change process, that business managers are taking no for an answer, that there are simple fixes to complex problems.
So basically he doesn’t believe that the book does a good job of representing reality.
That’s when I figured it might be interesting for others if I published my own experiences in this area, in an attempt to bridge the gap between Gene’s fairy tale and the IT Skeptic’s reality.
Let me start by giving you my view on this whole subject of DevOps (highly influenced by the excellent presentation of John Allspaw and Paul Hammond of Flickr):
- The business – whether that be the sponsor or the end-user – wants the software to change, to adapt it to the changing world it represents, and it wants this to happen fast (most of the time as fast as possible)
- At the same time, the business wants the existing IT services to remain stable, or at least not be disrupted by the introduction of changes
The problem with the traditional software delivery process (or the lack thereof) is that it is not well adapted to support these two requirements simultaneously. So companies have to choose between either delivering changes fast and ending up with a messy production environment or keeping a stable but outdated environment.
This doesn’t work very well. Most of the time they will still want both, and therefore put pressure on the developers to deliver fast and on the ops guys to keep their infrastructure stable. It is no wonder that dev and ops start fighting with each other to protect their objectives, and as a result they gradually drift away from each other, leaving the Head of IT stuck somewhere in the gap that opens up between the two departments.
This picture pretty much summarizes the position of the Head of IT:
You can already guess what will happen to the poor man when the business sends both horses in different directions.
The solution is to somehow redefine the software delivery process in order to enable it to support these two requirements – fast and stable – simultaneously.
But how exactly should we redefine it? Let us have a look at this question from the point of view of the three layers that make up the software delivery process: the process itself, the tooling to support it and the culture (i.e. the people who use it):
First of all, the process should be logical and efficient. It’s funny how sometimes big improvements can be made just by drawing the high-level process on a piece of paper and removing the obvious inconsistencies.
The process should initially take into account the complexities of the particular context (like the company’s application landscape, its technologies, team structure, etc.), but at a later stage it should be possible to adapt the context in order to improve the process wherever the effort is worth it (e.g. switching from a difficult-to-automate development technology to an easy-to-automate one).
Secondly, especially if the emphasis is on delivering the changes fast, the process should be automated where possible. This has the added benefit that the produced data has a higher level of confidence (and therefore will more easily be used by whoever has an interest) and that the executed workflows are consistent with one another and not dependent on the mood or fallibility of a human being.
Automation should also be non-intrusive with regard to human intervention. By this I mean that whenever a situation occurs that is not supported by the automation (e.g. because it is too new, too complex to automate and/or happens only rarely), it should be possible for humans to take over from there and, when the job is done, give control back to the automation. This ability doesn’t come for free but must be explicitly designed for.
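To make this hand-over idea concrete, here is a minimal sketch of a pipeline runner that falls back to a human for unsupported steps and then resumes automation. All names (`Step`, `run_pipeline`, the step names) are made up for illustration; real tooling would of course persist state and notify the operator rather than call a lambda.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    automated: bool = True  # flag steps the automation cannot (yet) handle

def run_pipeline(steps, automate, manual):
    """Run each step automatically when possible; otherwise hand
    control to a human and resume automation afterwards."""
    log = []
    for step in steps:
        if step.automated:
            log.append(automate(step))
        else:
            # Hand-over point: a human performs the step and records
            # the outcome so the automation can pick up from here.
            log.append(manual(step))
    return log

# Example: a rarely-changed legacy component is flagged for manual
# handling; everything around it stays automated.
steps = [
    Step("build"),
    Step("deploy-legacy-component", automated=False),
    Step("smoke-test"),
]
log = run_pipeline(
    steps,
    automate=lambda s: f"auto: {s.name}",
    manual=lambda s: f"manual: {s.name}",
)
print(log)  # → ['auto: build', 'manual: deploy-legacy-component', 'auto: smoke-test']
```

The point of the sketch is the explicit hand-over and hand-back boundary: the manual step produces the same kind of record as the automated ones, so the rest of the flow doesn’t care who executed it.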
And then there is the cultural part: everyone involved in the process should get at least a high-level understanding of the end-to-end flow and in particular a sufficient knowledge of their own box and all interfacing boxes.
It is also well known that people resist change. It should therefore not come as a surprise that they will resist changes to the software delivery process, a process that has a huge impact on the way they work day-to-day. There can also be a trade-off: some changes are good for the whole (the IT department or the company) but bad for a unit (a developer, a team, …). An example would be a developer who has to switch to Oracle because that’s what the other teams have experience with, while the developer wants to stick with MySQL because that’s what he knows best. Such changes are particularly difficult to introduce, so they should be treated with care. But most people are reasonable by nature and will adapt as soon as they understand the bigger picture.
OK so now we know how we should upgrade the process.
But are we sure that it will bring us the expected results? Is there any evidence? At least the major tech companies like Flickr (see the link above), Facebook (here), Etsy (here), Amazon (here) etc. show us that it works in their particular case. But what about the traditional enterprises, with their legacy technologies that lack support for automation and testability, a workforce that doesn’t necessarily know or embrace these modern concepts, and their heavy use of frameworks that are known for bureaucratic top-down implementations (CMMI, ITIL, TOGAF, …)? Can we apply the same patterns in such an extremely different context and hope for the same result? Or are these traditional companies doomed, waiting to be replaced by modern companies, those that were built with agility and automation in mind? In other words, is there a trail from Mount Traditional to Mount Modern?
I don’t know the answer but I don’t see why it would not work and there is not really the option of not at least trying it, is there? See also Jez Humble’s talk and Charles Betz’s post on this subject.
OK let us leave the theory behind us and switch our focus to my own experience in introducing DevOps in a traditional enterprise in the financial sector.
First a bit about myself: in this traditional enterprise I was responsible for the development community. My role was to take care of all cross-functional matters like the administration of the version control tools, build and deployment automation, defining the configuration management process, etc. I was also the representative of the developers towards the infrastructure, operations and release management teams.
Configuration and release management had always remained mostly manual, labor-intensive activities, known to be among the most challenging steps in the whole process of building and delivering software. Due to the steady growth of the development community and the increasing complexity of technologies, the problems surrounding these two activities grew exponentially over time, causing a major bottleneck in the delivery of software. At one point it became clear to me that we had to solve this problem by doing what we are good at: automation. Not for the business this time, but for ourselves: the IT department. So I set out on a great adventure to get this issue fixed …
It became clear very soon that without bringing configuration management under control, it would not be possible to bring release management under control, because the latter relies so heavily on the former. And thanks to this we would also finally be able to solve a number of other long-standing issues that all required proper configuration management but didn’t seem important enough on their own to initiate it.
So finally the project consisted of the following actions:
- getting a mutual agreement between all teams on the global software delivery process to be used within the company (including the definition of the configuration management structure), using ITIL as a guide
- the implementation of a release management tool
- the implementation of a configuration management system (CMS) and a software repository (which is called a definitive media library or DML in ITIL terms)
- the integration with the existing build and deployment automations and the change management tool
Note that I use the term configuration management in the broader sense used by Wikipedia (here) and ITIL, and not in the sense of server provisioning or version control of source code: the definition of the configuration items, their versions and their dependencies. This information is typically kept in a CMS.
By release management I mean the review, planning and coordination of deployment requests as they are delivered from the developers to the operations teams.
This is my mental model of these concepts:
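As a rough illustration of how these two concepts relate in data terms, here is a minimal sketch of versioned configuration items and a deployment request that references them. All names (`billing-app`, `DeploymentRequest`, etc.) are hypothetical; a real CMS would be a proper database, not a dict.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigurationItem:
    """A versioned configuration item as it might be tracked in a CMS."""
    name: str
    version: str
    depends_on: tuple = ()  # (name, version) pairs of other CIs

@dataclass
class DeploymentRequest:
    """A release management request: which CI versions go to which environment."""
    items: list
    target_env: str
    status: str = "pending-review"

# Toy CMS: configuration items keyed by (name, version).
cms = {
    ("billing-app", "2.3.0"): ConfigurationItem(
        "billing-app", "2.3.0", depends_on=(("billing-db-schema", "1.1"),)
    ),
    ("billing-db-schema", "1.1"): ConfigurationItem("billing-db-schema", "1.1"),
}

def missing_dependencies(request, cms):
    """One possible review step: flag dependencies not registered in the CMS."""
    missing = []
    for item in request.items:
        for dep in cms[(item.name, item.version)].depends_on:
            if dep not in cms:
                missing.append(dep)
    return missing

req = DeploymentRequest([cms[("billing-app", "2.3.0")]], target_env="production")
print(missing_dependencies(req, cms))  # → []
```

The sketch shows why release management leans so heavily on configuration management: the review of a deployment request is only meaningful if the CMS can answer which versions depend on which.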
Fiddling with the software delivery process in the big and heterogeneous environments found in traditional enterprises is a huge undertaking that has impacts in many areas, within many departments and on many levels. It was therefore out of the question to transition from a mostly manual process to a fully automated one in one step. Instead we needed a tool that could support the release management team in their work, but that would also allow them to gradually take on more responsibility as time goes by, helping the transition to a more end-to-end automated approach.
During my research I was quite surprised that so few tools were available that aim to support release management teams, and I now believe the market was just not yet ready for it. Also, in traditional enterprises development has generally become more complex, more diverse and more service-oriented in recent years, and the supporting tools now need some time to adapt to this new situation.
(For those interested: after a proof of concept with the few vendors that claimed to have a solution we chose Streamstep’s SmartRelease, which was acquired by BMC and renamed to RPM during the project.)
The culture also had to be adapted to the new process and tooling. This part involved changing people’s habits and proved to be the most challenging and time-consuming part of the job. But these temporary frictions were dwarfed by the tremendous gains from automation: less manual work, simplified processes, better visibility, trustworthy metrics, etc.
By implementing a clear process that was unavoidably more restrictive – and, above all, enforceable by the tooling – I noticed that in general developers resisted it (because they lost their freedom to do things their own way, and believe me, there were some exotic habits in the wild) while ops guys embraced it (because their job became easier).
As I already mentioned, I don’t know whether it is possible to DevOps-ify traditional enterprises the way web 2.0 companies do it (namely by almost completely automating the software delivery process and installing a culture of continuous delivery from the very beginning). But it may make sense not to look too far ahead, but instead to start by tackling the biggest and simplest problems first, optimizing and automating where appropriate, and thus iterating towards a better situation. And hope that in the meantime a web 2.0 company has not run away with the business.
Can traditional enterprises adapt to the modern world, or will they become the equivalent of the dinosaurs and go extinct?
In my next post I will zoom one level deeper into the reasoning behind these decisions. I will first give an overview of the situation that existed before the project was initiated, looking at the specifics of each team. Then I will focus on the problems that derived from it and the solutions that were followed.
Next posts in the series:
- Second post: A closer look at introducing DevOps in a traditional enterprise – explaining the team structure and the problems that existed
- Third post: First steps in DevOps-ifying a traditional enterprise – taking control of configuration management
- Fourth and final post: Implementing a release management solution in a traditional enterprise
Nice post. I especially agree with “there is not really the option of not trying it.”
However, this section raises questions:
“… whenever a situation occurs that is not supported by the automation (e.g. because it is too new, too complex to automate and/or happens only rarely) it should be possible for humans to take over from there and when the job is done give control back to the automation.”
It’s of course what we’re going to do. But cutting edge discussions in human factors are focusing on this problematic “human in the loop” dynamic. When automation (like an autopilot) is humming along, the human gets more and more disengaged from situational awareness. Then, when control is handed back because there is a crisis, the human completely fails to perform.
I don’t have any links for you on this but I’m sure a little searching will lead you to these discussions. We’ll be dealing with these issues in IT ops soon enough, I think.
Thanks for the feedback. I see what you mean: automation may eventually become a victim of its own success, in the sense that after a while nobody will know anymore how the process works, which creates a dangerous dependency.
The automation needs in the project I described were limited to the simple and repetitive ones though. We were still miles away from a tool that would also be capable of automating more complex things like risk assessment, impact analysis, political decision making etc and thereby removing the need for humans to know the whole process.
But I think you are right in that at some point this “human in the loop” issue will pop up in our IT department.
Great review and blog. I’m 2/3 through the book and enjoying the story and the overlaps with real life. Fairy tale maybe but a lot of situations I can relate to with many different companies.
Indeed, I had the same feeling while reading the book. Happy continuation with the final 1/3!
Reblogged this on Snowballs in winter and commented:
DevOps and The Phoenix Project – a good review and write-up.
I completely agree with you when you say that the developers are the ones who are the most uncomfortable with change. They like to live in their own silos (check in and forget). I have been through that phase. Production was unknown to me, and human beings generally fear the unknown. What helped me wake up was being put in teams like production support and operations for some time. It made me understand the production environment better and empathise with the sysadmins. I have seen the pattern of dev rotation get a few clients closer to devops principles and continuous delivery in general.
I agree. I have noticed quite a lack of interest – or even an allergy – from developers towards the operations side. I believe these problems are rooted in the historically manual/administrative/messy way of communication between both sides. So hopefully the streamlining of the processes and the introduction of some tooling will smooth their relationship.
Nice article. Our difficulty is trying to manage the interface between our modular development system and how we sell the software as separate products. There is a lot of functional overlap. Things are also complicated by the fact that the product is bundled with hardware and now we looking to deliver in virtualized formats as well. This creates a lot of flexibility at the expense of efficiency and automation. Continuous delivery (CD) could be the answer.
Continuous delivery is something any software company can try. The goal is to get feedback from the customer as quickly as possible. Efficiency is the key, and some companies will be able to completely automate their delivery process to achieve the fastest feedback possible. This is true of companies such as Facebook, Salesforce and probably any SaaS company. For the rest of us, it is still possible to achieve the same goal using automation, regardless of whether our release cycles are hours or months. The point is that it is a continuous improvement process.
CD also makes use of Value Stream Mapping to identify inefficiencies and this correlates well with “drawing the high-level process on a piece of paper and removing the obvious inconsistencies”.
Regarding tool support, I think initially a lot can be achieved with just pen and paper. Value Stream Mapping and Kanban are two examples. Later on software tools like Puppet might be appropriate.
A good introduction to CD:
Thanks for the fantastic write-up. I’m enjoying your blog post series so far.
I thought I’d mention that David Farley and I wrote the “Continuous Delivery” book to address more or less the situation you describe – it was our frustration at encountering these kind of problems that led us to write it. You may find it useful or (at least) cathartic to read. There’s a chapter available for free here: http://www.informit.com/articles/article.aspx?p=1621865
All the best and I look forward to reading the rest of your blog series.
Great that you like the series!
Your book is fantastic. I bought it around the time the project started and read it cover to cover in no time. My ideas were greatly influenced by what you wrote. It feels a bit like I’ve been standing on your shoulders ever since, and the views have been amazing up there 😉
Hope to meet you in person one day,
A great series of articles on the additional challenges Continuous Delivery adoption faces in brownfield organisations.
It’s not uncommon to find configuration management a sticking point prior to CD – like CD it touches all corners of an organisation, like CD everyone has a different view on it, like CD everyone has a different opinion. It’s great to hear such honest views on how to affect change across an organisation.
Thanks Steve, I highly appreciate your feedback! And looking forward to further discuss this (and more) with you in person some day.