Policy Overview

In a nutshell, this policy is about moving the tolerance level for OOPSes to zero. This means that any user-visible error happening in production is a stop-the-line event and should be fixed ASAP. This includes JavaScript errors, even though we do not currently record OOPSes for them: 741991.

timeout tagged bugs.

Why this policy?

For three interlocking reasons:

This policy is about making sure that the Exceptions and Timeouts sections of the report are empty.

What should be done about OOPSes

Once we achieve Zero-OOPS status:

But not all OOPSes are the same

All OOPSes in the "Exceptions" and "Time outs" sections should be eliminated. If an OOPS isn't important, because it's only triggered by robots or for whatever other reason, then the system should be changed to not record an OOPS.

One way to prevent an OOPS being visible is to change the exception type so that it doesn't trigger the OOPS code.
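As a minimal sketch of that idea, assuming a simplified error recorder rather than the real OOPS machinery: the names OopsRecorder, ExpectedClientError, UserVisibleError, parse_batch_start and serve_request are all hypothetical and exist only for illustration. An expected failure, such as a robot sending a malformed parameter, is re-raised as an exception type that the recorder is configured to ignore, so it never shows up in the report.

```python
class UserVisibleError(Exception):
    """A genuine failure that should be recorded (hypothetical name)."""


class ExpectedClientError(Exception):
    """An expected, unimportant failure, e.g. bad input from a robot
    (hypothetical name)."""


class OopsRecorder:
    """Toy stand-in for an error-reporting utility.

    This is an assumption for illustration, not the real OOPS code.
    """

    def __init__(self, ignored=()):
        # Exception types that must never produce a report.
        self.ignored = tuple(ignored)
        self.reports = []

    def handle(self, exc):
        """Record exc unless its type is ignored; return True if recorded."""
        if isinstance(exc, self.ignored):
            return False
        self.reports.append(f"{type(exc).__name__}: {exc}")
        return True


def parse_batch_start(raw):
    """Parse a query parameter, converting bad input to an ignored type."""
    try:
        return int(raw)
    except ValueError:
        # Changing the exception type keeps this out of the report.
        raise ExpectedClientError(f"invalid batch start: {raw!r}") from None


def serve_request(raw, recorder):
    """Handle one request, passing any failure to the recorder."""
    try:
        return parse_batch_start(raw)
    except Exception as exc:
        recorder.handle(exc)
        return None


recorder = OopsRecorder(ignored=[ExpectedClientError])
serve_request("abc", recorder)  # malformed input: nothing is recorded
recorder.handle(UserVisibleError("database unavailable"))  # recorded
```

The design choice in this sketch is that the decision about what counts as unimportant lives in one place (the ignored list), while the code that detects the condition only has to raise the right type.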

The end goals are:

The expected result of achieving these goals is that the system will generally be in good shape, and if an OOPS is recorded it's something important that we should work on immediately, with no sifting through many false positives.

When

We are starting this policy now.

Coming Soon

Burn-down chart of the bugs with the "oops" tags.

PolicyAndProcess/ZeroOOPSPolicy (last edited 2011-09-14 20:39:02 by lifeless)