In this post I will talk about the first steps we took in “DevOps-ifying” the large traditional company I worked for. But before kicking off, let me first briefly summarize the previous two posts (intro and problems) of the series:
I started off with a theoretical approach towards DevOps, explaining the ~~two~~ three (*) business benefits:
- The business wants the software to change as fast as possible
- The business wants the existing IT services to remain stable
- The business wants to get these two benefits as cheaply and efficiently as possible
(*) I initially had only the first two, but I added a third one to be aligned with chapter 7 of Gene Kim’s “The Top 11 Things You Need To Know About DevOps” whitepaper (available on itrevolution.com after signing up)
The solution was to introduce DevOps on three levels: process (making it as rational and consistent as possible), tools (automating the process, especially where things must happen fast or in high volume) and culture (changing how people think, which is the most difficult and time-consuming part).
I also added two remarks that I would like to keep in mind in the remainder of this post:
First of all, this territory of applying DevOps to a traditional enterprise is relatively new, so we should be very cautious with each step we take and check whether what we are trying to do really makes sense. What has worked for others (the modern companies) may not necessarily work for us, because we have an existing structure to take into account.
And secondly, as fiddling with the software delivery process in a large enterprise is a huge undertaking that impacts many people in many departments, we should iterate in small steps towards a better situation, tackling the biggest and simplest problems first and optimizing and automating where appropriate. This way we give people time to get used to the new process.
In my second post I talked about these “biggest and simplest” problems that existed in a traditional company:
- The exponentially increasing need for release coordination, which was characterized by its manual nature and by the pressure it exerted on the deployment window
- The inconsistent, vague, incomplete and/or erroneous deployment instructions
- The problems surrounding configuration management: being too vague and not reliable due to the manual nature of keeping track of them
- The manual nature of testing, which happened at the end of a long process and therefore had to absorb all upstream planning issues – this eventually caused the removal of any change requests that were not signed off
- The desire of the developers to sneak late features into the release, thereby bypassing the validations
These problems were caused (or at least permitted) by the way the IT department was organized, more precisely:
- The heterogeneity of the application landscape, the development teams and the ops teams
- The strong focus on the integration of the business applications and the high degree of coupling between them
- The manual nature of configuration management, release management, acceptance testing, server provisioning and environment creation
- The low frequency of the releases
Going back to the theoretical benefits of DevOps, in this case there was definitely a more urgent need for improving the quality and efficiency (the second and third benefit) than there was one for improving the speed of delivery (the first one).
Let us have a look now at the first steps we took in getting these problems solved: bringing configuration management under control.
It was quite obvious to me that we should start by bringing configuration management under control, as it is the core of the whole ecosystem. A lot of building blocks were already present; it was only a matter of gluing them together and filling in the remaining gaps.
This first step consisted of three sub-steps:
- Getting a mutual agreement on the structure of configuration management
- The implementation of a configuration management system (CMS) and a software repository – or definitive media library (DML) in ITIL terms
- The integration with the existing change management tool, build/continuous integration tools and deployment scripts
We decided to create three levels of configuration items (CIs):
- On top, the business application. Business applications can be seen as the “units of service” that the IT department provides to the business.
- Below it, the software component. A software component refers to a deployable software package – like a third-party application, a web application, a logical database or a set of related ETL flows – and consists of all files that are needed to deploy the component.
- At the bottom, the source code, which consists of the files that are needed to build the component.
Here is a high-level overview of the three levels of CIs:

And here is an overview of the three levels of configuration management for a sample application, CloudStore:
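To keep the picture in mind in text form as well, here is a minimal sketch of how such a breakdown could look, assuming hypothetical component and file names for CloudStore (the real decomposition was of course specific to the company):

```python
# A hypothetical breakdown of the CloudStore business application into the
# three CI levels. All component and file names below are made up.
cloudstore = {
    "business_application": "CloudStore",
    "software_components": {
        "cloudstore-web": {  # a web application
            "technology": "java-web",
            "source_code": ["src/main/java/OrderPage.java", "pom.xml"],
        },
        "cloudstore-db": {  # a logical database
            "technology": "oracle-db",
            "source_code": ["migrations/001_create_orders.sql"],
        },
        "cloudstore-etl": {  # a set of related ETL flows
            "technology": "etl",
            "source_code": ["flows/load_orders.xml"],
        },
    },
}
```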
I believe that the decision on defining the applications and components is a very important one that has the potential to make the rest of the software delivery process a lot harder if not done correctly. This is a whole subject on its own, but let me summarize it briefly: an application should contain closely related source code (a good test of relatedness is whether two files must regularly be changed together to implement a feature), should not be “too big” and not be “too small”, where what counts as small or big depends completely on the context. A component can be seen as a decomposition of the application by technology, although it can be split up further, e.g. to reflect the structure of the development team.
CIs are not static. They must change over time as part of the implementation of new features, and these changes must be tracked by versioning the CIs. You could therefore see a version number as a way of uniquely identifying a particular set of features that are contained within the CI.
Here is an example of how the application and its components are versioned following the implementation of a change request:

Once this structure was defined, the complete list of business applications was created, and each application was statically (or at least “slowly changing”-ly (*)) linked to the components it was made up of, as well as to the development team that maintained them. Change requests that were created for a particular application could no longer be used to change components that belonged to a different application. And deployment requests that were created for a particular application could no longer be used to deploy components of a different application. In these cases a new change request or deployment request had to be created for this second application.
(*) It is fine to split up components that have become too big over time, or to link them to a different application or development team, but this should have an architectural or organizational foundation and not be driven by a desire to limit the number of deployment requests for the coming release.
So the resulting conceptual model looked as follows:

Following the arrows: an application contains one or more software components. There can be multiple change requests per application and release, but only one deployment request, and it is used to deploy one, some or all of the software components.
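For the code-minded reader, here is a minimal sketch of this conceptual model in Python; the class and field names are my own and not the schema of any actual tool:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str     # e.g. "cloudstore-web"
    version: str  # frozen once uploaded to the DML

@dataclass
class Application:
    name: str  # the "unit of service" the IT department provides to the business
    components: list[Component] = field(default_factory=list)

@dataclass
class ChangeRequest:
    identifier: str           # e.g. "CR-1234" (hypothetical format)
    application: Application  # a change request applies to exactly one application
    release: str

@dataclass
class DeploymentRequest:
    application: Application  # one deployment request per application and release
    release: str
    # It deploys one, some or all of the application's components.
    components: list[Component] = field(default_factory=list)
```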
Time for a little side note on dependencies and impact analysis now.
Most of the time the features that were requested by the business required modifications to multiple applications (this had to do with the high degree of integration and coupling that existed). For example, if an application had to show an additional field, then not only the application itself but possibly also the central data hub and the provider of the data had to be modified to send the underlying data to the application. One of the big consequences of limiting change requests to a single application was that such feature requests had to be split over multiple change requests, one for each impacted application. These change requests then had to be associated with one another, and such an association was called a functional dependency.
Technical dependencies also exist: a change request that requires a particular component to be modified always depends on all preceding change requests for which this same component had to be modified. So even though two change requests could be functionally completely independent, as soon as they touch the same component a technical dependency is created.
Note that these dependencies are most of the time uni-directional: one change request depends on the other, but not necessarily the other way around – if that other change request is implemented in a backward-compatible way, then it doesn’t depend on the first change request.
I already mentioned in my previous post that it happened regularly that change requests had to be removed from a release because the testers were not able to sign them off. In these cases all functionally and technically dependent change requests also had to be removed from the release (unless, of course, the dev team still had time to come up with a new version of the component with the code changes for that not-signed-off change request reverted), so it was important that this information was well tracked throughout the development process.
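To illustrate what such impact analysis boils down to, here is a rough sketch – my own simplification, not the company’s actual tooling – that derives the technical dependencies from which change requests touched which components, merges them with the functional dependencies, and returns every change request that transitively depends on the removed one:

```python
from collections import defaultdict

def dependents_of(removed_cr, functional_deps, touched_components):
    """Return all change requests that (transitively) depend on removed_cr.

    functional_deps:    dict mapping a CR to the set of CRs it depends on
    touched_components: dict mapping a CR to the set of components it modified,
                        with the CRs listed in implementation order
    """
    depends_on = defaultdict(set)
    for cr, deps in functional_deps.items():
        depends_on[cr] |= set(deps)

    # Technical dependencies: a CR depends on every earlier CR
    # that modified one of the same components.
    crs = list(touched_components)
    for i, later in enumerate(crs):
        for earlier in crs[:i]:
            if touched_components[later] & touched_components[earlier]:
                depends_on[later].add(earlier)

    # Walk the graph in reverse: everything that depends on removed_cr,
    # directly or indirectly, must also leave the release.
    to_remove, frontier = set(), {removed_cr}
    while frontier:
        frontier = {cr for cr in depends_on
                    if depends_on[cr] & frontier} - to_remove - {removed_cr}
        to_remove |= frontier
    return to_remove
```

For example, with `functional_deps = {"CR-2": {"CR-1"}}` and `touched_components = {"CR-1": {"cloudstore-web"}, "CR-3": {"cloudstore-web"}}`, removing “CR-1” would also flag “CR-2” (functionally dependent) and “CR-3” (technically dependent).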
OK let’s leave the world of dependencies and impact analysis behind us for now and switch our focus to the implementation of the configuration management process.
In terms of storage and management, each of these three levels of CIs has its own solution.
Source code is stored and managed by version control systems. This level is well documented and there is good support by tooling (git, svn, TFS, …) so I will not further discuss it here.
On the level of the components, the files that represent the component (executables, libraries, config files, scripts, setup packages, …) are typically created from their associated source code by build tools and physically stored in the DML. All relevant information about the component (the application it belongs to, the context in which it was developed and built, …) is stored in the CMS.
The business application has no physical presence; it exists only as a piece of information in the CMS and is linked to the components it is made up of.
Here is an overview of the implementation view:

Once these rules were agreed, a lightweight tool was built to serve as both the CMS and the DML. It was integrated with the build tools in such a way that it automatically received and stored the built files after each successful build of a particular component. By restricting the upload of files exclusively to the build tools (which in turn assured that the component had successfully passed the unit tests and the deployment test to a development server), at least a minimum level of quality was assured. Additionally, once a particular version number of a component was uploaded, it was frozen: attempts to upload a newer “version” of the component with a version number that was already present would fail.
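The freeze rule is easy to picture in code. Below is a minimal sketch, assuming a simple file-system-backed DML; the real tool’s internals were of course more involved:

```python
from pathlib import Path
import shutil

DML_ROOT = Path("/var/dml")  # hypothetical storage location

def upload_component(component: str, version: str, build_artifact: Path) -> None:
    """Store a successfully built component in the DML, refusing overwrites."""
    target = DML_ROOT / component / version
    if target.exists():
        # Versions are frozen: re-uploading an existing version must fail.
        raise FileExistsError(f"{component} {version} is already in the DML")
    target.mkdir(parents=True)
    shutil.copy(build_artifact, target)
```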
The build tools not only sent the physical files, but also all the information that was relevant to the downstream processes: the person who built it, when it was built, the commit messages (a.k.a. check-in comments) since the previous version, the file diffs since the previous version, …
Note that a version typically contained multiple commits (continuous “build” was optional for the development teams).
With this information the CMS was able to calculate some interesting pieces of information that used to be hard to keep track of manually: the team responsible for doing the deployment and the logical server group (e.g. “Java web DEV” or “.NET Citrix UAT”) to deploy to. Both were functions of the technology, the environment and sometimes other parameters that were part of the received information. As these calculation rules were quite volatile, they were implemented in some simple scripts that could be modified on-the-fly by the administrator of the CMS whenever the rules changed.
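The calculation rules themselves could be as simple as the following sketch; the routing below is hypothetical, the real rules lived in scripts that the CMS administrator edited whenever they changed:

```python
def deployment_team(technology: str, environment: str) -> str:
    # Hypothetical routing rules, modifiable on-the-fly by the CMS administrator.
    if technology == "java-web":
        return "middleware-ops"
    if technology == "etl":
        return "data-ops"
    return "generic-ops"

def server_group(technology: str, environment: str) -> str:
    # Produces names like "Java web DEV" or ".NET Citrix UAT".
    display = {"java-web": "Java web", "dotnet-citrix": ".NET Citrix"}
    return f"{display.get(technology, technology)} {environment.upper()}"
```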
The CMS also parsed the change request identifiers from the commit messages and retrieved the relevant details about them from the change management tool (another integration we implemented): the summary, type, status and the release for which each was planned.
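Extracting the identifiers could look roughly like this; the “CR-1234” format is an assumption on my part, as the post does not spell out the company’s naming convention:

```python
import re

# Assumed identifier convention, e.g. "CR-1234: add discount field to order page"
CR_PATTERN = re.compile(r"\bCR-\d+\b")

def change_requests_in(commit_messages: list[str]) -> set[str]:
    """Collect every change request identifier mentioned since the previous version."""
    return {cr for message in commit_messages for cr in CR_PATTERN.findall(message)}
```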
The presence of the core data that came from the build tools, combined with the calculated data and especially the data retrieved from the change management tool, transformed the initially quite boring CMS into a small intelligence center. It now became possible to see the components (and their versions) that implemented a particular change request, or even those that implemented any change request for a particular release. (All of this assumes of course that the commit messages correctly contain the identifiers of the change requests.)
In the following mock up you can see how the core data of a component is extended with information from change management and operations:

It’s also interesting to look at configuration management from a higher level, from the level of the application for example:
By default all the components of the selected application were shown, and for each component all versions since the version currently deployed in production were listed in descending order. In addition, the implemented change requests were also mentioned. (Note that the mock up also shows which versions of the component were deployed in which environment; more on this in my next post about the implementation of a release management tool.)
In addition to this default view it was also possible to filter by release or just by an individual change request. In both cases only the relevant components and version numbers were shown.
The CMS also contained logic to detect a couple of basic inconsistencies (a sketch of these checks follows the list):
- A particular version of a component didn’t implement any change requests (in that case it was still possible to manually associate a change request with the version)
- A component version implemented change requests that were planned for different releases (would be quite difficult to decide when to deploy it right?)
- A change request that was planned for a particular release was not implemented by any components
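Here is a minimal sketch of these three checks, under the same hypothetical data shapes as before (the dict layouts are mine, not the tool’s):

```python
def consistency_warnings(component_versions, planned_crs):
    """Flag the three basic inconsistencies described above.

    component_versions: dict mapping (component, version) -> set of implemented CR ids
    planned_crs:        dict mapping CR id -> the release it is planned for
    """
    warnings = []
    implemented = set()
    for (component, version), crs in component_versions.items():
        implemented |= crs
        if not crs:
            warnings.append(f"{component} {version} implements no change request")
        releases = {planned_crs[cr] for cr in crs if cr in planned_crs}
        if len(releases) > 1:
            warnings.append(f"{component} {version} mixes releases {sorted(releases)}")
    for cr, release in planned_crs.items():
        if cr not in implemented:
            warnings.append(f"{cr} (planned for {release}) is implemented by no component")
    return warnings
```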
All of this information facilitated the work of the developer in determining the components he had to include in his deployment request for the next release. But it was also interesting during impact analysis (e.g. after a change request was removed from a release) to find out more information about an individual component or about the dependencies between components.
This ability to do impact analysis when a change request had to be removed from a release was a big deal for the company. One of the dreams of the people involved in this difficult task was the ability to get a complete list of all change requests that (functionally or technically) depend on a given change request. Although this was not actually developed initially, it would not be very difficult to build now that all the necessary information was available in the CMS. The same could be said about adding more intelligent consistency checks: a quite small development effort for sometimes important insights that could save a lot of time.
Finally, the deployment scripts of all technologies were adapted so that they always retrieved the deployable files from the DML, thereby completing the last step of a fully controlled software delivery pipeline from version control tool to production.
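In essence the adaptation boiled down to one rule: deployable files come from the DML and nowhere else. A minimal sketch, reusing the hypothetical file-system DML from above:

```python
from pathlib import Path
import shutil

DML_ROOT = Path("/var/dml")  # the same hypothetical DML as above

def fetch_for_deployment(component: str, version: str, staging_dir: Path) -> Path:
    """Retrieve the frozen files from the DML, never from a build machine."""
    source = DML_ROOT / component / version
    if not source.exists():
        raise FileNotFoundError(f"{component} {version} was never uploaded to the DML")
    destination = staging_dir / component / version
    shutil.copytree(source, destination)
    return destination
```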
Here is an overview of how the CMS and DML were integrated with the existing tools:
Conclusion
All in all, a small development effort resulted in a big step forward in bringing control to the software delivery process. With this core building block – configuration management – now in place, we can continue to build the rest of our castle on top of it.
You may have noticed that throughout this post I have emphasized the importance of keeping track of the dependencies between the change requests and the components, so that we are able to do proper impact analysis when the need arises. These dependencies are the cause of many problems within the software delivery process, and therefore a lot of effort has to be put into finding ways to remove them, or at least to limit their negative impact. The approach I described here was all about visualizing the dependencies, which is only the first step of a solution.
A much better strategy would be to avoid these dependencies in the first place. And the best way to do this is by simply decreasing the size of the releases, which comes down to increasing the frequency of the releases. When releases happen infrequently, the change requests typically stack up into a large pile, and in the end all change requests depend on one another due to the technical dependencies that exist through their components. You remove one and the whole house of cards collapses. Increasing the release frequency, however, requires decent automation – and that is exactly what we’re working on! Until the whole flow is automated and the release frequency can be increased, we have to live with this problem.
If we take this strategy of increasing the release frequency to its extreme, we end up with continuous delivery, where each commit to each component triggers a new mini-release that deploys a single component: the one that was committed. No more dependencies, no more impact analysis, no more problems! Nice and easy! Nice? Yes! Easy? Maybe not so much, because this approach doesn’t come for free.
Let me explain.
With the batch-style releases you could at least assume that whenever the new version of your component was deployed into an environment, it would find the new versions of all the components it depends on (remember that most of our features required changes to multiple components). It did not have to take into account that old versions of these components might still hang around. With continuous delivery, this assumption no longer holds in my opinion. It is now up to the developer to make sure that his component supports both the old and the new functionality, and that he includes a feature flag to activate the feature only once all components are released. In some organizations (and I’m thinking about the large traditional ones with lots of integrated applications) this may be a high price to pay.
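A feature flag in this context can be as simple as the sketch below; the flag store and field names are made up, the point is only that the old and new behaviour coexist in one deployed version until every dependent component has shipped:

```python
# Hypothetical flag store; in practice this would be a config file or service.
FLAGS = {"show-discount-field": False}  # off until all dependent components are released

def order_page_fields() -> list[str]:
    fields = ["id", "customer", "total"]
    if FLAGS["show-discount-field"]:
        # New behaviour, activated only once the data hub and the provider also ship.
        fields.append("discount")
    return fields
```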
In my next post I will explain how we implemented a solution for release management.
I recently had the opportunity to talk about my experience in introducing DevOps in a traditional enterprise at the DevOpsDays in London, so let me conclude by giving you a link to the video and the slideshow of my talk.
Previous posts in the series:
- Second post: A closer look at introducing DevOps in a traditional enterprise – explaining the team structure and the problems that existed
Next posts in the series:
- Fourth and final post: Implementing a release management solution in a traditional enterprise