By Alex Henthorn-Iwane, VP Product Marketing, ThousandEyes
You might have heard about a topic that’s gaining attention in industry discussions: an event that could potentially cause significant disruptions across the internet, the so-called ‘768k Day’. This is the point in the near future (some speculate in the coming month) when the size of the global BGP routing table is expected to exceed 768,000 entries. Why is this a big deal?
In 2014, on what we now know as ‘512k Day’, the IPv4 internet routing table exceeded 512,000 BGP routes when Verizon advertised thousands more routes to the internet. Many ISPs and other organisations had provisioned the memory in their router TCAMs for a limit of 512k route entries, and some older routers suffered memory overflows that caused their CPUs to crash. The crashes on old routers, in turn, created significant packet loss and traffic outages across the internet, even in some large provider networks. Engineers and network administrators scrambled to apply emergency firmware patches that raised the route limit to a new upper bound. In many cases, that upper limit was 768k entries.
512k Day was an internet milestone like the so-called ‘Y2K’ crisis, and a wake-up call for a lot of ISPs and internet organisations. The looming breach of the global BGP routing table threshold wasn’t a secret, yet enough providers were caught flat-footed by outdated network equipment that the resulting failures rippled into a large number of internet outages.
Fast forward five years, and the upcoming 768k Day is an echo of 512k Day, just with a higher threshold. So, some are worried that the internet could have similar problems. This time around, one thing we have going for us is that most large providers who felt the impact of 512k Day have learned their lessons, and have probably prepped and maintained their infrastructures reasonably well. As a result, we would hope not to see the same level of outages.
That said, while nobody’s exactly hyperventilating about 768k Day, there are still a lot of smaller ISPs, data centres and other providers who are part of the fabric of the internet. When you look at internet paths, a good amount of service traffic transits these ‘soft spots’ of internet infrastructure, if you will, where maintenance on legacy routers and network equipment can be neglected or missed more easily. Given the sheer size and unregulated nature of the internet, it’s fair to say that things will be missed.
This means it’s still entirely possible that we’ll see some issues or outages due to 768k Day in the next month or so. Of course, a myriad of outage events happen every day, especially on the fringes of the internet. 768k-related issues could add to the usual number of garden-variety outages.
What would a 768k Day issue look like in a particular provider network? Perhaps something like a recent outage we reported, in which we saw total packet loss for monitoring tests crossing various router interfaces, as seen below in figure 2. In this case, we saw packet loss on several interfaces in the Cogent (AS 174) network in the San Francisco Bay Area that affected peer ISPs like Comcast and Qwest, as well as services like Amazon, Verisign, 8×8 and even enterprises like Athena Health.
So, it should go without saying: if you maintain routers that receive full internet routes, make sure you’ve performed some preventative maintenance before 768k Day arrives.
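As a rough illustration of what such a check might involve, the sketch below compares each router's current IPv4 route count with its configured route limit and flags devices that are running close to it. Everything here is hypothetical: the `Router` record, the `at_risk` helper, the 95% warning fraction and the sample figures are assumptions for illustration, and the 768,000 limit is simply the figure quoted above. The real limit and the way you read the route count depend on your platform and its configuration.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Hypothetical inventory record; fill it in from your own tooling."""
    name: str
    ipv4_route_limit: int      # configured limit for IPv4 route entries
    current_ipv4_routes: int   # routes currently installed


def headroom(router: Router) -> int:
    """Number of additional routes the router can hold before hitting its limit."""
    return router.ipv4_route_limit - router.current_ipv4_routes


def at_risk(router: Router, warn_fraction: float = 0.95) -> bool:
    """True when the route count has reached warn_fraction of the limit."""
    return router.current_ipv4_routes >= warn_fraction * router.ipv4_route_limit


# Illustrative values only, not measurements from any real network.
inventory = [
    Router("edge-1", ipv4_route_limit=768_000, current_ipv4_routes=760_000),
    Router("core-1", ipv4_route_limit=1_000_000, current_ipv4_routes=760_000),
]

for r in inventory:
    status = "AT RISK" if at_risk(r) else "ok"
    print(f"{r.name}: headroom={headroom(r)} routes, status={status}")
```

A check like this only tells you where to look; the remedy (re-allocating TCAM, filtering routes or replacing hardware) is specific to each device and vendor.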