Risk Management
Good XP teams do achieve a stable velocity. Unfortunately, velocity only reflects the issues the team normally faces. Life always has some additional curve balls to throw. Team members get sick and take vacations; hard drives crash, and although the backups worked, the restore doesn’t; stakeholders suddenly realize that the software you’ve been showing them for the last two months needs some major tweaks before it’s ready to use.
Despite the uncertainties, your stakeholders need schedule commitments that they can rely upon. Risk management allows you to make and meet these commitments.
Risk Management Plan
Every project faces a set of common risks: turnover, new requirements, work disruption, and so forth. These risks act as a multiplier on your estimates, doubling or tripling the amount of time it takes to finish your work.
How much of a multiplier do these risks entail? It depends on your organization. Because most organizations don’t have this kind of information available, we’ll use some generic multipliers based on DeMarco & Lister’s RISKOLOGY simulator.
| Percent chance | Rigorous | Risky | Description |
| --- | --- | --- | --- |
| 10% | x1 | x1 | Almost impossible ("ignore") |
| 50% | x1.4 | x2 | 50-50 chance ("stretch goal") |
| 90% | x1.8 | x4 | Virtually certain ("commit") |
These multipliers show your chances of meeting various schedules. For example, in a “Risky” approach, you have a 10 percent chance of finishing according to your estimated schedule. Doubling your estimates gives you a 50 percent chance of on-time completion, and to be virtually certain of meeting your schedule, you have to quadruple your estimates.
If you use the XP practices – in particular, if you’re strict about being “done done” (agile principle number 7) every iteration, your velocity is stable, and you fix all your bugs each iteration – then your risk is lowered. Use the risk multiplier in the “Rigorous” column. On the other hand, if you’re not strict about being “done done” every iteration, if your velocity is unstable, or if you postpone bugs and other work for future iterations, then use the risk multiplier in the “Risky” column.
Although these numbers come from studies of hundreds of industry projects, those projects didn’t use XP. As a result, how accurately they apply to XP is somewhat of a guess. However, unless your company has a database of prior projects to turn to, they are your best starting point.
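The multipliers in the table above can be applied mechanically to a baseline estimate. The following is a minimal sketch, not a definitive tool: the function names, the one-week iteration length, and the sample start date are assumptions made for illustration, and only the multiplier values come from the table.

```python
from datetime import date, timedelta

# Generic risk multipliers from the table above, keyed by approach and by
# the percent chance of finishing on or before the resulting schedule.
RISK_MULTIPLIERS = {
    "rigorous": {10: 1.0, 50: 1.4, 90: 1.8},
    "risky": {10: 1.0, 50: 2.0, 90: 4.0},
}


def risk_adjusted_iterations(estimated_iterations, approach):
    """Return {percent chance: iterations needed} for a baseline estimate."""
    multipliers = RISK_MULTIPLIERS[approach]
    return {pct: estimated_iterations * m for pct, m in multipliers.items()}


def projected_dates(start, estimated_iterations, approach, iteration_days=7):
    """Convert risk-adjusted iteration counts into calendar dates.

    iteration_days=7 is an assumption (one-week iterations), not from the text.
    """
    adjusted = risk_adjusted_iterations(estimated_iterations, approach)
    return {
        pct: start + timedelta(days=round(iterations * iteration_days))
        for pct, iterations in adjusted.items()
    }


if __name__ == "__main__":
    # Hypothetical project: 10 iterations of estimated work.
    for approach in ("rigorous", "risky"):
        dates = projected_dates(date(2024, 1, 1), 10, approach)
        for pct, finish in sorted(dates.items()):
            print(f"{approach:>8}: {pct}% chance of finishing by {finish}")
```

In this sketch the 90 percent date is the one you would commit to, and the 50 percent date is the stretch goal.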
Project-Specific Risks
Using the XP practices and applying risk multipliers will help contain the risks that are common to all projects. The generic risk multipliers include the normal risks of a flawed release plan, ordinary requirements growth, and employee turnover. In addition to these risks, you probably face some that are specific to your project. To manage these, create a risk census – that is, a list of the risks your project faces that focuses on your project’s unique risks.
[DeMarco & Lister 2003] suggest starting work on your census by brainstorming catastrophes. Gather the whole team and hand out index cards. Remind team members that during this exercise, negative thinking is not only OK, it’s necessary. Ask them to consider ways in which the project could fail. Write several questions on the board:
- What about the project keeps you up at night?
- Imagine it’s a year after the project’s disastrous failure and you’re being interviewed about what went wrong. What happened?
- Imagine your best dreams for the project, then write down the opposite.
- How could the project fail without anyone being at fault?
- How could the project fail if it were the stakeholders’ fault? The customers’ fault? Testers’? Programmers’? Management’s? Your fault? Etc.
Write answers on the cards, then read them aloud to inspire further thoughts.
Once you have your list of catastrophes, brainstorm scenarios that could lead to those catastrophes. From those scenarios, imagine possible root causes. These root causes are your risks: the causes of scenarios that will lead to catastrophic results.
For example, if you’re creating an online application, one catastrophe might be “extended downtime”. A scenario leading to that catastrophe would be “excessively high demand”, and root causes include “denial of service attack” and “more popular than expected”.
After you’ve finished brainstorming risks, let the rest of the team return to their iteration while you consider the risks within a smaller group. For each risk, determine:
- Estimated probability (High, Medium, Low)
- Specific impact to project if it occurs – dollars lost, days delayed, and project cancellation are common possibilities.
You may be able to discard some risks as unimportant immediately. Ignore unlikely risks with low impact and all risks with negligible impact. Your generic risk multipliers account for those already.
For the remainder, decide whether you will avoid the risk by not taking the risky action; contain it by reserving extra time or money, as with the risk multiplier; or mitigate it by taking steps to reduce its impact. You can combine these actions.
For the risks you decide to handle, determine transition indicators, mitigation and contingency activities, and your risk exposure:
- Transition indicators tell you when the risk will come true. It’s human nature to downplay upcoming risks, so choose indicators that are objective rather than subjective. For example, if your risk is “unexpected popularity causes extended downtime”, then your transition indicator might be “server utilization trend shows upcoming utilization over 80 percent”.
- Mitigation activities reduce the impact of the risk. Mitigation happens in advance, regardless of whether the risk comes to pass. Create stories for them and add them to your release plan. To continue the example, possible stories include “support horizontal scalability” and “prepare load balancer”.
- Contingency activities also reduce the impact of the risk, but they are only necessary if the risk occurs. They often depend on mitigation activities that you perform in advance.
- Risk exposure reflects how much time or money you should set aside to contain the risk. To calculate this, first estimate the numerical probability of the risk and then multiply that by the impact. When considering your impact, remember that you will have already paid for mitigation activities, but contingency activities are part of the impact.
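As a worked illustration of the exposure calculation, with made-up numbers that are not from any study: suppose you estimate a 20 percent probability for a risk, and its impact if it occurs is 15 days of delay, including the contingency work.

```latex
\[
\text{exposure} = p \times \text{impact} = 0.20 \times 15\ \text{days} = 3\ \text{days}
\]
```

Under these assumptions you would reserve three days in the schedule for this risk. The cost of the mitigation stories is left out of the impact because you pay it whether or not the risk occurs.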
Some risks have a 100 percent chance of occurring. These are no longer risks – they are reality. Update your release plan to deal with them.
For the remaining risks, update your release plan to address them. You will need stories for mitigation activities, and you may need stories to help you monitor transition indicators. For example, if your risk is “unexpected popularity overloads server capacity”, you might schedule the story “prepare additional servers in case of high demand” to mitigate the risk, and “server load trend report” to help you monitor the risk.
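One way a “server load trend report” could check an objective transition indicator is to fit a straight line to recent utilization samples and project it forward. This is a rough sketch under stated assumptions: the function names, the evenly spaced samples, the linear trend, and the look-ahead horizon are all hypothetical choices; only the 80 percent threshold comes from the example above.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for evenly spaced samples.

    Sample i is treated as having x-coordinate i (0, 1, 2, ...).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def indicator_tripped(samples, horizon, threshold=0.80):
    """Return True if the fitted trend exceeds `threshold` at the point
    `horizon` periods after the last sample.

    `horizon` is an assumed look-ahead window, e.g. the lead time needed
    to bring additional servers online.
    """
    slope, intercept = linear_trend(samples)
    projected = intercept + slope * (len(samples) - 1 + horizon)
    return projected > threshold


if __name__ == "__main__":
    # Hypothetical weekly peak utilization, as fractions of capacity.
    weekly_peaks = [0.50, 0.55, 0.60, 0.65]
    print(indicator_tripped(weekly_peaks, horizon=4))
```

Because the check is a calculation rather than a judgment call, it is harder to downplay than a subjective indicator. A real report would likely need to handle noisy or seasonal load, which a straight-line fit does not.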
You also need to set aside time, and possibly money, for contingency activities. Don’t schedule any contingency stories yet – you don’t know if you’ll need them. Instead, add up your risk exposure and apply dollar exposure to the budget and day exposure to the schedule. Some risks will occur and others won’t, but on average, the impact will be equal to your risk exposure.
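Adding up exposure across a risk census can be sketched as follows. The `Risk` class, its field names, and the sample risks and figures are hypothetical, introduced only to illustrate the bookkeeping; the rule itself (probability times impact, dollars applied to the budget and days to the schedule) is the one described above.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float            # estimated chance of occurring, 0.0 to < 1.0
    impact_dollars: float = 0.0   # cost if it occurs, including contingency work
    impact_days: float = 0.0      # delay if it occurs, including contingency work

    def __post_init__(self):
        # A risk with a 100 percent chance is reality and belongs in the
        # release plan, not in the census.
        if not 0.0 <= self.probability < 1.0:
            raise ValueError(f"{self.name}: probability must be in [0, 1)")


def total_exposure(risks):
    """Return (dollar exposure, day exposure) summed over all risks."""
    dollars = sum(r.probability * r.impact_dollars for r in risks)
    days = sum(r.probability * r.impact_days for r in risks)
    return dollars, days


if __name__ == "__main__":
    # Hypothetical census entries with made-up estimates.
    census = [
        Risk("unexpected popularity overloads server capacity",
             probability=0.25, impact_dollars=40_000, impact_days=20),
        Risk("key vendor API changes", probability=0.5, impact_days=10),
    ]
    budget_reserve, schedule_reserve = total_exposure(census)
    print(f"Add ${budget_reserve:,.0f} to the budget "
          f"and {schedule_reserve:g} days to the schedule")
```

The reserve is an average, so a single risk that does occur will cost more than its share; the total is meant to cover the census as a whole.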