‘External threats a rising cause of outages for data centres’

Author: Joe Peck

External infrastructure failures and outages linked to fibre and connectivity issues are becoming more prominent for data centres, according to new research from the Uptime Institute, a US-based independent data centre standards and certification body.

Despite that, on-site outages at data centres have declined for the fifth consecutive year, with approximately one in 10 respondents reporting that their most recent outage had a serious or severe impact.

The cost of major outages continued to rise, with 57% of respondents stating that their most recent major outage cost over $100,000 (£74,800) and one in five reporting a cost of over $1 million (£748,000).

Richard Petrie, CTO of the London Internet Exchange (LINX), comments, “Networking and connectivity continue to sit at the top of the most common causes of IT outages, reinforcing the importance of resilience in this area.

“As organisations face growing pressure from network congestion, external threats, and increasing reliance on third-party providers, resilience across both network and data centre infrastructure is becoming critical.

“While it’s encouraging to see on-site outages declining as infrastructure providers continue to prioritise resilience, the risks posed by external failures mean organisations still need robust redundancy policies in place for when outages do occur.

“The backbone of a strong redundancy strategy is a secondary fabric that allows data to be rerouted during periods of disruption or risk, helping organisations remain operational even when the primary network is compromised.

“By providing multiple options to route traffic, organisations can strengthen resilience and help networks stay online.”

Power failures a contributing factor

The leading cause of impactful outages was power, with failures involving UPS systems, transfer switches, and generators remaining prominent.

Worsening grid constraints and high-density workloads were also identified as newer contributors to outages.

In response, the research found that operators are shifting investment towards automation and control systems to manage complexity, while acknowledging that more automation can introduce different classes of problems.

In line with the causes of outages, resilience assessments were found to focus more on internal systems than on external and systemic risks.

Andy Lawrence, founding member and Executive Director of Uptime Intelligence, says, “Outages have slowed down and, overall, digital infrastructure is remarkably resilient. But further resiliency gains are becoming harder to achieve.

“We believe that over time, failures will increasingly not be the result of a single point of failure, but instead be linked to complex interactions between systems, including software, networks, and external dependencies.

“While site-based electrical and mechanical infrastructure remain a critical building block that needs to be resilient, digital infrastructure is becoming more distributed with outages originating outside the data centre, including those tied to power availability, network connectivity, or the reliance on external cloud services playing a larger role.”


