Managing Technical Debt

Technical debt is simply:

The amount of time, money, or effort it takes to work around, manage, and fix bad design/implementation decisions.

Or as my friend and former colleague Duane Gran puts it (summarized, not verbatim):

Technical debt is like leaving dirty dishes in the sink.  You're going to have to wash them anyway, but in the meantime they just stack up.

If you look through the issue tracker of any Open Source project or even your own issue tracker, you'll notice that every project has a bit of it.  It could be little things that a small focused effort could effectively resolve or it could be "OMG we need to re-write the app from the ground up" sorts of issues.  Either way, just like real financial debt, you have to determine when it is necessary to take on or pay off some of the debt, die under the sheer weight of it all, or figure out how to manage it effectively.

Pay It or Take It:

I'd love to be able to say "you should always fix issues as soon as you find them" or even "All technical debt is bad", but the world doesn't work that way.

There are numerous times where issues will have to be overlooked, whether due to a lack of time, effort, or understanding… or just because something more important has yet to be done.  Whenever you look at your software with a critical eye, you will find things large and small that could be faster, easier, better, lighter on memory, more secure, and so on.  The list goes on… you'll never have enough time to fix everything, so you have to prioritize.

While you might think it's a great idea to say "All technical debt is bad!", it simply doesn't work that way.  One of my geek mentors Martin Fowler – author of two of my favorite technology books "Refactoring" and "Patterns of Enterprise Application Architecture" – points out that there are times when business needs/timelines might outweigh "doing it the right way".  Regardless, you need to note/log this debt to manage it.

Whenever I look at my full todo list for web2project, I want to hide under the bed.  There are simple things that just require a lot of time (formatting the code to follow the PEAR coding standard, for example), but then there are things so wildly complex that even listing them here would force me under the bed again.  But every single one of the items is logged.

Die Under the Weight of It All:

In the Open Source world, one of two things tends to happen at this point… the project grinds to a halt or someone decides to re-write it from the ground up.

When a project dies, it's quiet and subtle.  The team tends to only work on the "fun" things and new functionality that doesn't touch the old messes.  The releases come farther and farther apart with smaller and smaller improvements.  The commits slow to a trickle.  Eventually the team itself stops interacting with the community as they've moved on to other priorities.  I've seen it more times than I can count and it's hard to blame the team in the slightest… after all, this is starting to look like real work and they're not getting paid for this effort.

On the flip side, quite often the team will decide to re-write from the ground up.  They create a grand scheme for replacing the whole system with something clean, cool, and amazing.  While the plan looks good on paper, it normally doesn't happen that way.  The timeline is much longer than initially imagined, the work is more than they thought, and often the team drifts away and the project quietly dies (as above).  That said, if a strong vision can keep the team together, a re-write can be successful.

In the commercial world, the quiet death seems to be rare.  After all, if a product is generating revenue, it's unlikely to be shelved without a replacement.  Therefore, I believe complete re-writes are much more common… Windows Vista, Netscape/Mozilla, etc.

Manage It Effectively:

We've all had that one library, class, or method that irritates everyone on the project.  It's probably a performance bottleneck, it's prone to breaking, and it haunts you in your sleep.  I'd wager that you have something even worse in your code… something that has worked well enough so far that no one has noticed how nasty and hacked together it is.  And this is where technical debt is the nastiest:  When you don't know you have it.

Sometime soon you'll have a major release.  Customers will be screaming at you, your boss will be on his twelfth cup of coffee (and 28th time past your cube), and your email will be blowing up.  That is the one time this monstrosity is going to break.  It's not going to break and fail safe like a good little piece of software; it will fail spectacularly.  It's going to spew garbage across your database tables, take down your server, and even kick your dog… twice.

The problem is that if you don't know where your potential problems are – even if you can't fix them immediately – you don't know how to mitigate risks appropriately.  When you find problems in the system, you have to do one of two things:  Fix them or log them.

And here's the key: Every time you find debt, log it and prioritize it.

Your priorities can be as simple as: "Now!", "Soon", "Later", and "Wouldn't it be nice?"

With every debt "on the books" and prioritized, you have a better idea of what needs to be done and what could be done whenever new feature requests come in or you have a bit of downtime.  In addition, by keeping this information along with the existing issue list, you can relate it to other issues, and quite often you'll see patterns emerge where the areas with the most technical debt are also the areas with the most bugs.  This will allow you to track down and eliminate the debts where your payment is the highest and most painful.
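
To make that concrete, here is a rough sketch in Python of what "log it and prioritize it" could look like if your issue list were just a list of records.  Everything in it is made up for illustration: the Issue fields, the area names, and the sample entries are assumptions of mine, not how web2project or any particular tracker stores things.  It sorts the logged debt by the four priorities above and then lists the areas that carry both debt and bugs.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """The four buckets from the post; lower value means more urgent."""
    NOW = 0
    SOON = 1
    LATER = 2
    NICE_TO_HAVE = 3  # "Wouldn't it be nice?"


@dataclass
class Issue:
    # Hypothetical record layout, not tied to any real issue tracker.
    title: str
    area: str   # the part of the system the issue touches
    kind: str   # "debt" or "bug"
    priority: Priority = Priority.LATER


def debt_by_priority(issues):
    """Return only the debt items, most urgent first."""
    debts = [i for i in issues if i.kind == "debt"]
    return sorted(debts, key=lambda i: i.priority)


def hotspots(issues):
    """Return areas that have both debt and bugs, busiest first."""
    debt = Counter(i.area for i in issues if i.kind == "debt")
    bugs = Counter(i.area for i in issues if i.kind == "bug")
    shared = set(debt) & set(bugs)
    # Sort by combined count (descending), then by name for a stable order.
    return sorted(shared, key=lambda a: (-(debt[a] + bugs[a]), a))


# Made-up sample data, purely for illustration.
issues = [
    Issue("Reformat code to the coding standard", "core", "debt", Priority.LATER),
    Issue("Hand-rolled date parsing", "reports", "debt", Priority.SOON),
    Issue("Report totals off by one day", "reports", "bug"),
    Issue("Export crashes on empty project", "reports", "bug"),
    Issue("No input validation on upload", "files", "debt", Priority.NOW),
    Issue("Upload accepts zero-byte file", "files", "bug"),
]

for item in debt_by_priority(issues):
    print(f"{item.priority.name:<12} {item.area:<8} {item.title}")

print("Hotspots:", hotspots(issues))
```

With this sample data the upload validation debt comes out on top, and "reports" shows up as the worst hotspot because it has one logged debt and two bugs.  A real tracker would give you the same thing with labels or tags and a saved search; the point is only that the debt has to be logged next to the bugs before any pattern can show up.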