Tracking Outages

The idea of tracking data outages grew out of an early discussion on the outages forum, including feedback from an outages survey, about having a status page for (un)planned outages as a central resource. The purpose of such an effort is to offer a single, wider view, as opposed to having to check dozens of provider status pages.

Many ideas were put forth, but nothing really panned out and things fell onto the back burner.

Some of the concerns raised during these discussions were:

  • This effort will require community support. We can’t imagine having insight into every planned and unplanned outage.
  • Initially we see this effort carrying an administrative and scalability burden, since we’ll need (trusted or vetted) folks who can keep it current and meaningful.
  • Providers (carriers, colo, hosting, etc.) keep their (un)planned outages close to their chest, so it is not clear that a customer bound by the provider’s policy can release that information to a public calendar.

For further reading, click here for the results of this survey.

So how do we move forward? Basically we had to start somewhere.

Twitter is the first place people go when a major event takes place, especially major downtime events. External monitoring is limited by the transactions you define, and passive monitoring doesn’t tell you when people can’t access your site or API.

Crowdsourcing your monitoring may end up being the only way for major online services to know when something is wrong.

After months of tinkering, we came up with tracker, an initiative to crowdsource information about data outages critical to Internet infrastructure. The project aims to crowdsource primarily from Twitter, along with other channels such as the web, email, and smartphone apps. Its primary aim is to collect data from people, make it accessible in various formats, and give it back for public use. Our focus is on large-scale, network-savvy content providers, access networks, the global Internet peering ecosystem, DNS root servers, major carrier failures, major data centers, carrier hotels, COs, etc.
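To make "accessible in various formats" concrete, here is a minimal sketch of how one crowdsourced report could be represented and exported as JSON and CSV. It is only an illustration: the `OutageReport` record, its field names, the helper functions, and the sample values (including the provider "ExampleNet") are all hypothetical and do not describe tracker's actual schema or code.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class OutageReport:
    """Hypothetical record for one crowdsourced outage report."""
    source: str       # channel the report came in on, e.g. "twitter", "web", "email"
    provider: str     # affected provider as named by the reporter
    category: str     # e.g. "carrier", "data center", "dns"
    summary: str      # free-text description from the reporter
    reported_at: str  # ISO 8601 timestamp (UTC)


def to_json(reports):
    """Serialize a list of reports to a JSON array."""
    return json.dumps([asdict(r) for r in reports], indent=2)


def to_csv(reports):
    """Serialize a list of reports to CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(OutageReport)])
    writer.writeheader()
    for r in reports:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Made-up sample data, for illustration only.
reports = [
    OutageReport(
        source="twitter",
        provider="ExampleNet",
        category="carrier",
        summary="Packet loss on transit links",
        reported_at="2011-01-01T12:00:00Z",
    )
]

print(to_json(reports))
print(to_csv(reports))
```

Keeping each report as a flat record is a deliberate choice in this sketch: the same data can then be written out in whichever format a consumer prefers without any per-format logic beyond the serializer.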

Tracker data is crowdsourced and licensed under the Public Domain Dedication and License, which means anyone is free:

  1. To Share: To copy, distribute and use the database.
  2. To Create: To produce works from the database.
  3. To Adapt: To modify, transform and build upon the database.

All of this comes without any restriction: because the data is generated by the crowd (people), it belongs to them.

What’s the point?

This has many potential uses in developing a better understanding of demand for network availability; users can hopefully use the data to ask their providers pointed questions.

Why?

Well, that’s because (IMO) the end user is the final determiner of the status of the Internet. Since it is the end user who will be affected, it seems reasonable to gather information from a user perspective. The key to all this is making sure that whatever information is collected is relevant to the condition of the Critical Internet Resources.

The 64 bit question is: how can we engage and/or encourage providers to be more forthcoming and report outages without being concerned about the bottom line, and instead put their customers’ interests first? I will even go out on a limb and say this: if closed-door outage reporting continues, it’s only a matter of time before the heavy hand of government, aka “regulation”, forces companies into a corner, and this will further diminish the “free market”.

Providers are reluctant to publicly report their service as “bad”, especially if not everyone has to report on the same basis and/or the measurement is not universally recognized. Even with a protective agreement in place, no one wants to report.

I really hope that network service providers, carriers and network operators around the globe will see the benefit of tracker as an unbiased central source and take the lead by posting events so everyone can benefit from it, including themselves. It seems reasonable that providers themselves should report the outages that “impact the end-user community”, as opposed to having external sources report them.

These aren’t issues we will solve immediately. They take time to work through, and they will ebb and flow. But if we diligently stay on top of them, we will be building the kind of legacy that encourages others to participate.

As I like to say, “we engineers shape networks, and afterwards outages shape us”.

Grateful thanks!
