How AI is changing data centre infrastructure

Author: Isha Jain

By Geoff Dear, Technical Manager UK & Ireland, and Fatholah Zaki, LAN & Data Centre Sales Manager, Reichle & De-Massari UK

AI is significantly influencing how data centres are designed and operated. It can also be used within the data centre itself to drive efficiencies, enhance performance, and introduce new capabilities. Meanwhile, infrastructure and customer requirements are changing as operators work to support the vast global uptake of AI. Making well-informed decisions about data centre infrastructure to accommodate this is vital, as choices made now will affect performance and business for years to come.

According to Forbes Advisor, over 60% of business owners believe AI will increase productivity and improve customer relationships. Europe’s AI software market is anticipated to grow by an impressive 40% between 2021 and 2028. According to a report from Accenture, AI could add US$814bn (£630bn) to the UK economy by 2035. Research and Markets forecasts that the AI as a Service (AIaaS) market will expand from $9.86bn in 2023 to $14.27bn in 2024 (a remarkable compound annual growth rate of 44.7%) and reach $63.2bn by 2028.
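The AIaaS figures above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The minimal sketch below uses only the Research and Markets figures quoted in the text; the function name `cagr` is our own illustration.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values, returned as a fraction."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# AIaaS market sizes quoted in the text, in US$ billions
market = {2023: 9.86, 2024: 14.27, 2028: 63.2}

one_year = cagr(market[2023], market[2024], 2024 - 2023)
four_year = cagr(market[2024], market[2028], 2028 - 2024)

print(f"2023-2024 growth: {one_year:.1%}")
print(f"2024-2028 CAGR:   {four_year:.1%}")
```

Over a single year the CAGR is simply the year-on-year growth, which reproduces the 44.7% quoted; the implied rate from 2024 to 2028 comes out at roughly 45%, so the forecast assumes growth continues at about the same pace.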

Enabling migration from CPUs to GPUs

To support AI workloads, data centres require specialised hardware that can efficiently process large volumes of complex computations: GPUs (graphics processing units). A GPU consists of hundreds or thousands of smaller cores optimised for handling many tasks simultaneously.

As the name suggests, GPUs were originally designed to render graphics and carry out image processing tasks. Over the years, however, their capabilities have expanded, and GPUs are now used for general-purpose computing tasks that require significant parallel processing power, such as data analysis and machine learning. Compared with ‘traditional’ CPU-based installations, GPU racks consume more power, emit more heat, and occupy more space, and they pack in significantly more cores: a 10-fold increase in the number of processors within the same footprint is anticipated.

As a result, solutions are needed to provide connectivity for the rapidly growing number of GPUs. Integrated very small form factor (VSFF) connectors can play a significant part in meeting this challenge. VSFF SN and CS connectors combine a small size with a high capacity for fibre connections. They are designed to allow port breakout at high speeds, such as 400G, which is becoming increasingly common in data centre environments as networks expand to carry larger volumes of data.

Using VSFF connectors can increase the fibre count in a 1U space to 432, three times the 144 fibres accommodated by LC duplex connectors. Deploying these connectors allows data centres to support a significantly larger number of GPUs by making the most of the available space and improving the efficiency of connections.
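The density figures translate directly into panel space. The minimal sketch below uses only the per-1U capacities quoted above (144 fibres for LC duplex, 432 for VSFF); the 3,456-fibre requirement and the helper name `rack_units_needed` are hypothetical, and real panel capacities vary by product.

```python
import math

# Fibres per 1U panel, as quoted in the text
FIBRES_PER_U = {"LC duplex": 144, "VSFF (SN/CS)": 432}


def rack_units_needed(total_fibres: int, fibres_per_u: int) -> int:
    """Smallest whole number of 1U panels that can terminate total_fibres."""
    return math.ceil(total_fibres / fibres_per_u)


# Hypothetical requirement: 3,456 fibres to terminate in one rack
required = 3456
for connector, capacity in FIBRES_PER_U.items():
    units = rack_units_needed(required, capacity)
    print(f"{connector}: {units}U of panel space")

# Ratio of VSFF to LC duplex density per rack unit
density_ratio = FIBRES_PER_U["VSFF (SN/CS)"] / FIBRES_PER_U["LC duplex"]
print("Density ratio:", density_ratio)
```

Rounding up with `math.ceil` reflects that a partly filled panel still occupies a full rack unit; the space freed by the higher density is what can be given over to additional GPU servers.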

Power and cooling

Integrating AI accelerators also requires changes to rack design and power distribution. Data centre designers need to work out the kilowatt requirements per rack and where fluctuations might occur. Additional power and cooling systems are needed to handle higher power densities and thermal loads. It will also be important to determine where air cooling will suffice and where liquid cooling can bring the greatest benefit. Liquid conducts heat more effectively than air, making it well suited to growing equipment densities.
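At its simplest, working out the kilowatt requirement of a rack means summing the peak draw of each device and adding headroom for fluctuations. Because nearly all the electrical power drawn by IT equipment ends up as heat, the same figure sizes the cooling load (1 W is about 3.412 BTU/h). The minimal sketch below illustrates this; the device list, wattages and 20% headroom are hypothetical placeholders, not figures from the article.

```python
from dataclasses import dataclass

WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of IT load dissipates about 3.412 BTU/h of heat


@dataclass
class Device:
    name: str
    watts: float  # hypothetical per-device peak draw
    count: int


def rack_load_kw(devices: list[Device], headroom: float = 0.2) -> float:
    """Peak rack load in kW, including a fractional headroom for fluctuations."""
    total_watts = sum(d.watts * d.count for d in devices)
    return total_watts * (1.0 + headroom) / 1000.0


# Hypothetical GPU rack: all numbers are placeholders for illustration only
rack = [
    Device("GPU server", watts=10_000, count=4),
    Device("Top-of-rack switch", watts=1_500, count=2),
]

load_kw = rack_load_kw(rack)
heat_btu_h = load_kw * 1000 * WATTS_TO_BTU_PER_HOUR
print(f"Design load: {load_kw:.1f} kW, heat to remove: {heat_btu_h:,.0f} BTU/h")
```

Comparing a figure like this against the practical limit of air cooling for a given room is one way to decide, rack by rack, where liquid cooling is worth the investment.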

There are several ways of applying liquid cooling, from heat exchangers in rack doors to immersion cooling systems that completely submerge rack servers and other components in a dielectric liquid, which does not conduct electricity but does conduct heat. Immersion is the most energy-efficient and sustainable form of liquid cooling, making optimum use of liquid’s thermal transfer properties, and the captured heat can also be reused. However, this approach affects connectivity, and its costs, requirements and the skillsets it demands also need to be considered.

The increased power and cooling requirements driven by AI can raise carbon emissions, just as data centres everywhere are trying to get a firmer grip on their Power Usage Effectiveness (PUE). That makes monitoring, mitigation and process optimisation more important than ever. When it comes to power and cooling, AI introduces a challenge, but it also provides part of the solution.
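PUE is the ratio of total facility energy to the energy delivered to IT equipment, so a value of 1.0 would mean every kilowatt-hour goes to computing. The minimal sketch below computes it; the monthly meter readings and the cooling-upgrade scenario are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("facility total cannot be less than the IT load")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly meter readings (kWh), before and after a cooling upgrade
it_load_kwh = 1_000_000
before = pue(total_facility_kwh=1_500_000, it_equipment_kwh=it_load_kwh)
after = pue(total_facility_kwh=1_250_000, it_equipment_kwh=it_load_kwh)

overhead_saved_kwh = (before - after) * it_load_kwh
print(f"PUE before: {before:.2f}, after: {after:.2f}")
print(f"Non-IT energy saved: {overhead_saved_kwh:,.0f} kWh per month")
```

PUE measures overhead only: a denser AI deployment can improve PUE while still increasing total energy consumption and emissions, which is why it should be tracked alongside absolute energy figures.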

Incorporating AI into data centre asset management can improve resource utilisation and decision-making. Data centre design and construction can be optimised using ‘digital twins’ and simulation modelling. AI algorithms, combined with overlays displaying real-time energy and efficiency metrics, can optimise cooling, power and resource allocation. AI can also analyse usage patterns and historical data to predict future asset needs, enabling better planning of procurement and replacement. However, successful implementation requires careful analysis, planning and integration with existing systems.
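The text refers to AI models predicting future asset needs from historical data. As a deliberately simpler stand-in for such models, the sketch below fits an ordinary least-squares trend to hypothetical monthly peak load and projects when a capacity limit would be reached. Production tools would use richer models and far more data; the function names, load history and 700 kW limit are our own illustrations.

```python
from typing import Optional


def fit_linear_trend(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_capacity(history_kw: list[float], capacity_kw: float) -> Optional[float]:
    """Months after the last observation until the trend line reaches capacity."""
    months = list(range(len(history_kw)))
    slope, intercept = fit_linear_trend(months, history_kw)
    if slope <= 0:
        return None  # no growth trend, so no projected exhaustion
    crossing_month = (capacity_kw - intercept) / slope
    return max(0.0, crossing_month - months[-1])


# Hypothetical monthly peak IT load for one data hall (kW)
history = [400, 420, 445, 460, 480, 505]
remaining = months_until_capacity(history, capacity_kw=700)
print(f"Projected months until 700 kW is reached: {remaining:.1f}")
```

A projection like this gives procurement a lead time to plan against; replacing the linear trend with a model that captures seasonality or step changes from new AI deployments is where more sophisticated techniques come in.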

Holistic approach

It’s important that redesign is treated as a cross-departmental effort that avoids technology silos. A holistic approach, looking at every part of the data centre and its unique requirements, is vital. Flexibility is also key: a modular approach will help accommodate future hardware and capacity requirements that cannot yet be predicted.

A well-designed structured cabling system increases uptime, scalability, and return on investment while reducing the technology footprint and operating expenses. It ensures all connections are standardised and organised, which is vital for maintaining the high performance and reliability required by GPU-intensive operations. Structured cabling, characterised by adherence to predefined standards with pre-set connection points and pathways, supports efficient growth and change management, which is crucial as GPU deployments evolve and their throughput demands increase.

R&M will present its data centre offering at Data Centre World in London.