A new report demonstrating how the July 2022 heatwave led to unplanned NHS data centre outages underlines the need to review existing continuity plans with weather in mind, according to global moisture and temperature specialists at Aggreko.
The NHS report found that the hotter weather compromised cooling systems at two facilities that provided backup to one another, disrupting digital services at London hospitals to the cost of £1.4 million. It also found that the incident could have been prevented if cooling systems had been adequately prepared for spiking temperatures.
Additionally, although concerns about the facility’s utility provision had been raised as early as 2018, funding requests to replace the systems were not approved. With this in mind, Billy Durie, Global Sector Head – Data Centres at Aggreko, is highlighting how different approaches to equipment procurement during crises could prevent further incidents.
“The fact that issues with these cooling systems were pointed out in advance demonstrates that identifying potential concerns is not the problem here,” Billy says. “Instead, the challenge lies in deciding what needs to be done, and what course of action can be taken within existing continuity plans.
“Namely, in an environment where there are delays to signing off capital expenditure on new, permanent equipment, there must be other contingencies available to guard against potential downtime. These backup plans need to be independent of existing infrastructure, which is coming under increasing pressure as extreme weather events such as last July’s heatwave become more common.”
Billy believes that issues caused by rising temperatures require a rapid response to minimise the problems caused by unplanned downtime. Specifically, he is urging facility stakeholders to explore rapid-response temporary hire options that can be deployed in emergencies such as last July’s.
“When operating data centres, any unplanned downtime is rightly seen as unacceptable,” Billy concludes. “However, the worst can always occur, and when it does, every minute spent offline can lead to snowballing costs and reputational damage. Consequently, getting emergency utility provision to site as quickly as possible is vital, as this incident demonstrates: full operations were not restored for six weeks, causing considerable disruption for patients and staff. Data centre managers should therefore ensure they are fully aware of the options available, including pre-emptive loadbank testing, and work with equipment suppliers that can provide rapid-response remedial services as an ongoing OpEx cost. By doing so, they can reduce facility downtime while also avoiding the CapEx constraints that indirectly led to this incident.”