It looks as if software development is one of the riskiest activities on this planet. I can hardly think of another profession in which risk management is so embedded in the daily practice of making things. The reason is rather simple: unlike in any other (engineering) profession, there is no real hard science that gives you comfort that the project will ever succeed. In the construction business, projects might go over budget or be late, but you have hardly ever heard of a bridge collapsing because of a “minor glitch”.
Obviously, software developers are not any better or worse than other engineers, but our ability to mess up the code, applications, deployments and projects in such a variety of ways is something so unique to software projects that it makes people think and write about the risks in our craft all the time. TDD, code analysis tools, pair programming, code reviews, continuous integration, automated testing, stand-up meetings, iterative development … all these techniques are there to give us fast feedback about the project status in time – before the crisis happens.
Today I would like to add one more technique to this toolset, one that I borrowed from belief-network propagation models, my other passion next to software development. This is how I came to think about applying this model in the first place:
One thing that troubled me for quite some time was how we evaluate people in larger organizations. It turns out that the formula is rather simple, and it is based on the following question: “What would be the damage to the organization if that person left?”. At first, it sounds like a reasonable question, until you ask yourself another one: “What would have been the damage to humanity if Einstein had died as a child?”. That is what I call possible-negative-impact vs. possible-positive-impact evaluation thinking. You may wonder what this has to do with risk analysis in software development? Well, everything, actually.
But before I continue, let me draw you a graph of a software team using belief networks: one application is developed by 6 developers, who work on 3 modules, and if one of the modules breaks, the application goes down. In this model, person 6 is the only experienced developer (modeled by a prior of doing the correct thing 99.9% of the time). The other developers have priors of doing things right “only” 99% of the time. Without any observation, you can see that the probability of the application working while people are coding is less than 70%. Since person 2 works on two modules at any moment in time, he is the most “risky” person (noted by the 0 in the circle).
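This first model can be sketched in a few lines of code. The priors come from the post, but the concrete module assignment is an assumption of mine (chosen so that person 2 touches two modules), since the original diagram is not reproduced here; with these numbers a single round of check-ins comes out around 95%, and the post’s sub-70% figure presumably reflects its richer network or repeated rounds of coding:

```python
# Priors from the post: person 6 is the experienced developer (99.9%),
# the rest do the right thing 99% of the time. The module assignment
# below is an assumption -- chosen so person 2 works on two modules.
PRIORS = {1: 0.99, 2: 0.99, 3: 0.99, 4: 0.99, 5: 0.99, 6: 0.999}
MODULES = {"A": [1, 2], "B": [2, 3, 4, 5], "C": [6]}

def p_app_works():
    # First model: the application works only if every module works,
    # and a module works only if every developer on it did the right
    # thing -- so effectively every person must be correct at once.
    people = {p for devs in MODULES.values() for p in devs}
    prob = 1.0
    for p in people:
        prob *= PRIORS[p]
    return prob

def break_risk(person):
    # Chance this person breaks at least one of the modules they touch
    # in a single round of check-ins.
    touched = sum(person in devs for devs in MODULES.values())
    return 1 - PRIORS[person] ** touched

riskiest = max(PRIORS, key=break_risk)
print(f"P(app works) = {p_app_works():.4f}")
print(f"riskiest person = {riskiest}")  # person 2: he touches two modules
```

Because everything is an AND in this model, the riskiest node is simply the person with the most module edges, which is exactly why person 2 stands out in the diagram.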
Let’s consider the same project from another angle. We are going to keep the same model and the same priors, with only one important difference. This time, I will consider a module healthy if at least one of its developers is doing the right thing. All the rest stays the same. This time, I am not interested in whether the build gets broken, but in how many people I need to keep the project alive. Suddenly, the weakest link is person 6, although his performance is 10 times better than that of the others (have you heard this number before?).
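The flip from AND to OR is a one-line change in the model. Using the same priors and the same assumed assignment as before (where person 6 is the sole developer on his module), the redundancy picture inverts:

```python
# Second model: same priors and (assumed) assignment, but now a module
# stays healthy if *at least one* of its developers does the right thing.
PRIORS = {1: 0.99, 2: 0.99, 3: 0.99, 4: 0.99, 5: 0.99, 6: 0.999}
MODULES = {"A": [1, 2], "B": [2, 3, 4, 5], "C": [6]}

def p_module_alive(devs):
    # 1 - P(every developer on the module fails at once)
    p_all_fail = 1.0
    for d in devs:
        p_all_fail *= 1 - PRIORS[d]
    return 1 - p_all_fail

def single_points_of_failure():
    # Sole developers on a module: if they leave, the module dies,
    # no matter how good their individual prior is.
    return sorted(devs[0] for devs in MODULES.values() if len(devs) == 1)

for name, devs in MODULES.items():
    print(f"module {name}: P(alive) = {p_module_alive(devs):.8f}")
print("single points of failure:", single_points_of_failure())
```

Module “C”, staffed only by the best developer, ends up with the lowest survival probability of the three, because redundancy beats individual skill once you ask the “keep the project alive” question instead of the “who breaks the build” question.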
So, here is the thing. The way you choose development strategies in your project is not just a question of being an optimist or a pessimist (although it helps to walk around with a smile on your face).
For instance, if you have a relatively inexperienced team and your assumption is that anyone can potentially mess up the code, you probably think in terms of possible negative impact exposure, and adopt the first model. In that case, only a few people are allowed to “touch” the core of the product – by introducing architects and team leads who verify all check-ins, or something similar. The penalty you pay is a less flexible development model, modular ownership that bottlenecks development, less engaged team members, and the fact that if the key people leave, the project is doomed. On the upside, such projects are easier to “control”. This is a typical model that you find in large multi-site (off-shore) projects. That is also the reason why, in larger organizations, the “possible negative impact” HR policy is the dominant one, since you care a lot about the “checkers” – the people who own the features or modules.
On the other hand, if developers work on different modules while implementing features, people are more engaged and you have fewer bottlenecks. The risk of the project failing because one person leaves is much lower. You can also witness quite different development dynamics after a while – what Michael Feathers calls “emergent design”. This model of project management requires professionalism from all team members, and unfortunately, sometimes this is too much to ask. In companies and teams that strive toward this model, “possible positive impact” thinking matters much more than the negative kind. These are the organizations that truly believe people are “allowed to fail” – as long as they tried their best. Some people call this the agile development model.
In reality, you have a mixture of both. It is hard to imagine that in bigger projects people do not develop a preference for, or expertise in, some particular part of the product. Therefore, it is always fun to create such diagrams for your project and try to figure out where you might expect the most trouble – whether in development/application dependencies or in HR dependencies.
Finally, I would imagine that it would not be that hard to build this graph from SVN or Git history, where you would analyze check-ins per user per component, along with the number of broken builds, to construct the graph with some priors. I hope to do it one day and publish my findings in a future blog post.
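As a rough sketch of that mining step, the person-to-component edges can be pulled straight out of `git log`. Treating the top-level directory of each changed file as the “component” is my own simplifying assumption, and linking commits to broken builds would need CI data on top of this:

```python
# Count which author touched which component, from `git log` output.
# "Component" here is just the top-level directory of each changed
# file -- an assumption; real projects may need a better mapping.
import subprocess
from collections import defaultdict

def parse_log(text):
    # Expects `git log --pretty=format:@%ae --name-only` output:
    # an @author-email line followed by the files that commit touched.
    counts = defaultdict(lambda: defaultdict(int))
    author = None
    for line in text.splitlines():
        if line.startswith("@"):
            author = line[1:]
        elif line.strip():
            counts[author][line.split("/", 1)[0]] += 1
    return counts

def touches_by_author(repo="."):
    out = subprocess.run(
        ["git", "log", "--pretty=format:@%ae", "--name-only"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

The resulting counts give you exactly the person-to-module edges of the diagrams above, and the relative commit volumes could serve as a crude starting point for the priors.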