Patterns, Not Categories: Learning Across Incidents

Date:

Outage pattern analysis is hard! There have been many attempts to learn across multiple incidents. Folks look for categories, tags, causes, etc. to identify what’s brittle or risky in their system, sometimes even using statistical models to help make sense of the data. However, the results often prove unsatisfying or non-actionable, or tell you nothing you didn’t already know from other sources.

An alternative approach is to find patterns via Christopher Alexander’s “Pattern-Centered Inquiry”. Complex systems fail according to certain patterns or fundamental laws. We can identify and learn from these patterns, then see how each one develops and manifests in diverse ways in our own systems. Understanding these patterns, and how to spot them, underpins better-informed reliability decision-making.