By Alex Henthorn-Iwane, VP Product Marketing, ThousandEyes
You might have heard about a topic that’s gaining attention in industry
discussions: an event that could cause significant disruptions across the
internet, the so-called ‘768k Day’. This is the point, sometime in the near
future (some speculate within the coming month), when the size of the global
IPv4 BGP routing table is expected to exceed 768,000 entries. Why is this a big
deal?
In 2014, on what we now know as ‘512k Day’, the IPv4 internet routing table
exceeded 512,000 BGP routes when Verizon advertised thousands of additional
routes to the internet. Many ISPs and other organisations had provisioned the
TCAM memory in their routers for a limit of 512K route entries, and some older
routers suffered memory overflows that caused their CPUs to crash. Those
crashes, in turn, created significant packet loss and traffic outages across the
internet, even in some large provider networks. Engineers and network
administrators scrambled to apply emergency firmware patches that raised the
TCAM route limit. In many cases, the new upper limit was 768k entries.
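To make the arithmetic concrete, here is a minimal Python sketch of the check an
operator might run against a router’s full-table IPv4 route count. It assumes the
TCAM limit is expressed in binary units, so ‘512K’ means 512 × 1024 = 524,288
entries and ‘768K’ means 786,432; that convention is common on router platforms,
but confirm it for your own hardware. The function names, route counts and growth
rate below are hypothetical, and the linear projection is a simplification, not a
forecast.

```python
from dataclasses import dataclass

# Assumption: router TCAM limits are quoted in binary "K" (1K = 1024 entries).
K = 1024


@dataclass(frozen=True)
class TcamCheck:
    """Snapshot comparing an IPv4 route count with a TCAM route limit."""
    limit_entries: int
    routes: int

    @property
    def headroom(self) -> int:
        """Entries left before the limit is hit (negative means overflow)."""
        return self.limit_entries - self.routes

    @property
    def overflowed(self) -> bool:
        """True once the table no longer fits in the allocated TCAM region."""
        return self.routes > self.limit_entries


def check_tcam(ipv4_routes: int, limit_k: int) -> TcamCheck:
    """Compare a full-table IPv4 route count against a limit given in K."""
    return TcamCheck(limit_entries=limit_k * K, routes=ipv4_routes)


def days_until_full(ipv4_routes: int, limit_k: int, routes_per_day: float) -> float:
    """Naive linear projection of when the table outgrows the limit.

    routes_per_day is a caller-supplied (hypothetical) growth rate; real
    table growth can jump suddenly, as it did on 512k Day.
    """
    if routes_per_day <= 0:
        raise ValueError("routes_per_day must be positive")
    remaining = limit_k * K - ipv4_routes
    return max(remaining, 0) / routes_per_day


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(check_tcam(524_289, 512).overflowed)  # True: one route past 512K
    check = check_tcam(780_000, 768)
    print(check.headroom)                       # 6432
    print(days_until_full(780_000, 768, 100))   # 64.32
```

The first line of output shows why a hard limit is unforgiving: a single route
beyond 524,288 is enough to overflow a 512K allocation.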
512k Day was an internet milestone like the so-called ‘Y2K’ crisis, and a
wake-up call for many ISPs and internet organisations. The global BGP routing
table’s approach to the 512K threshold wasn’t a secret, yet enough providers
were caught flat-footed with outdated network equipment that the result was a
ripple effect of internet outages.
Five years later, the upcoming 768k Day is an echo of 512k Day, just with a
higher threshold, so some are worried that the internet could see similar
problems. One thing we have going for us this time is that most large providers
that felt the impact of 512k Day have learned their lessons and have probably
prepared and maintained their infrastructure reasonably well. As a result, we
would hope not to see the same level of outages.
That said, while nobody’s exactly hyperventilating about 768k Day, there are
still many smaller ISPs, data centres and other providers that are part of the
fabric of the internet. When you look at internet paths, a good amount of service
traffic transits these ‘soft spots’ of internet infrastructure, if you will,
where maintenance on legacy routers and network equipment is more easily
neglected or missed. Given the sheer size and unregulated nature of the
internet, it’s fair to say that some things will be missed.
This means it’s still entirely possible that we’ll see some issues or outages
due to 768k Day in the next month or so. Of course, myriad outage events happen
every day, especially on the fringes of the internet, and 768k-related issues
could amplify the number of these garden-variety outages.
What would a 768k Day issue look like in a particular provider network? Perhaps something like a recent outage we shared, where we saw total packet loss for monitoring tests crossing various router interfaces, as seen below in figure 2. In this case, we saw packet loss on several interfaces in the Cogent (AS 174) network in the San Francisco Bay Area that affected peer ISPs such as Comcast and Qwest, as well as services such as Amazon, Verisign and 8×8, and even enterprises such as Athena Health.
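One simple way to reason about the pattern described above (total loss on every
test whose path crosses a given interface) is to tally results per interface. The
sketch below assumes a hypothetical data model in which each monitoring test
reports the interfaces its path traversed and its end-to-end loss; it is not
ThousandEyes’ API or its path-analysis method, and the interface names are
invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ProbeResult(NamedTuple):
    """Hypothetical test result: interfaces on the path and end-to-end loss."""
    interfaces: Tuple[str, ...]
    loss: float  # fraction of packets lost, 0.0 to 1.0


def suspect_interfaces(results: Iterable[ProbeResult], min_tests: int = 2) -> List[str]:
    """Return interfaces where every test crossing them saw 100% packet loss.

    min_tests avoids flagging an interface seen by only one failing test.
    """
    crossed: Dict[str, int] = defaultdict(int)
    total_loss: Dict[str, int] = defaultdict(int)
    for result in results:
        for iface in set(result.interfaces):  # count each interface once per test
            crossed[iface] += 1
            if result.loss >= 1.0:
                total_loss[iface] += 1
    return sorted(
        iface
        for iface, n in crossed.items()
        if n >= min_tests and total_loss[iface] == n
    )


if __name__ == "__main__":
    # Invented interface names and results, for illustration only.
    results = [
        ProbeResult(("edge-a", "core-1", "peer-x"), 1.0),
        ProbeResult(("edge-b", "core-1", "peer-y"), 1.0),
        ProbeResult(("edge-a", "core-2", "peer-x"), 0.0),
    ]
    print(suspect_interfaces(results))  # ['core-1']
```

Because edge-a and peer-x also carry a loss-free test, only core-1 is flagged. A
shared hop with total loss is a clue to where a fault lies, not proof of it.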
So, it should go
without saying that if you maintain routers that receive full internet routes,
then before 768k Day arrives, make sure you’ve performed some preventative