On Reporting Problems

Chris Shiflett recently wrote about the inherent problems that go along with disclosing bugs in web applications (specifically security holes). I believe he took the responsible route of reporting the issue privately, waiting an appropriate time, and then releasing the details publicly. In his case, the “appropriate time” was a year, Amazon appears to have effectively reduced the potential damage of the issue, and everyone is sleeping soundly at night… but what if it hadn't gone so smoothly?

In the past five years or so, I've found and diagnosed a number of serious issues in various customers' applications. A few have been security issues, a few have been data corruption or validation problems, but the vast majority have been performance related. In every case, I have documented the issue, notified the responsible decision makers, and proposed a solution. In some cases the solution (like a redesign of the database or removing a system from production) was cost-prohibitive, but many times the solution was relatively small and cost-effective. Of course, in almost every single case, the solution was not adopted and the problem persisted.

For example, there was a dating website which compared user profiles and preferences whenever a new user registered. This was implemented as an n^3 operation which would run as soon as the new user verified their account. Unfortunately, at the one time they'd be most likely to start using the system, they had to sit through this delay. When there were 1000 users, it was tolerable… barely. The solution had two phases: the first was to move the comparison to a scheduled task at 2am, and the second was to reimplement it.

Fast forward a few months/years and what happens? Unfortunately, I am proven right when 6000 users cause a delay so large that the server begins to time out connections and users are left staring at an error screen.
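To put a rough number on that jump, here is a minimal sketch. It assumes the matching cost grows strictly with the cube of the user count and that nothing else changed (hardware, data shape, and so on), which is a simplification. The function name `cubic_growth_factor` is made up for illustration; only the 1000 and 6000 user counts come from the story above.

```python
def cubic_growth_factor(old_users: int, new_users: int) -> float:
    """Return how many times more work an O(n^3) job does at new_users
    than at old_users, assuming cost scales exactly with n^3."""
    return (new_users / old_users) ** 3


if __name__ == "__main__":
    # User counts taken from the dating website example.
    factor = cubic_growth_factor(1000, 6000)
    print(f"6x the users means roughly {factor:.0f}x the work")
```

Under that assumption, six times the users means about 216 times the work, so a delay that was barely tolerable at 1000 users has no chance of finishing before a connection times out.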

So Keith, what's your point?

Thanks, I'm glad you asked. Before you notify your customer of a problem like this, you must document everything about it. Figure out when the problem began. Figure out how much data is corrupted, and which data. Figure out a solution and a price estimate. Figure out what you will say and do if (when) they either a) do nothing or b) decline your solution. The most important thing is to document every single step and piece of information… because it can come back to haunt you someday. Ignoring a fundamental problem also says a great deal about the decision maker, but that's another discussion.

If you are correct, the problem will arise for real. The system will break, data will be corrupted, or hackers will do nefarious things. If they're the type of organization which ignored the problem in the first place, they will be looking for a scapegoat and your name will be at the top of their list. Hopefully, your scrupulous record keeping will help protect you and make your life a bit less stressful.

But whatever you do… don't say “I told you so”.