Artificial Intelligence in Data Centre Operations


Pure Storage accelerates enterprise AI adoption with NVIDIA
Pure Storage has announced new validated reference architectures for running generative AI use cases, including a new NVIDIA OVX-ready validated reference architecture. In collaboration with NVIDIA, the company is arming global customers with a proven framework to manage the high-performance data and compute requirements they need to drive successful AI deployments.
Building on this collaboration with NVIDIA, Pure Storage claims that it delivers the latest technologies to meet the rapidly growing demand for AI across today’s enterprises. New validated designs and proofs of concept include:
- Retrieval-Augmented Generation (RAG) pipeline for AI inference
- Certified NVIDIA OVX server storage reference architecture
- Vertical RAG development
- Expanded investment in AI partner ecosystem
Industry significance: Today, the majority of AI deployments are dispersed across fragmented data environments, from the cloud to ill-suited (often legacy) storage solutions. Yet these fragmented environments cannot support the performance and networking requirements to fuel AI data pipelines and unlock the full potential of enterprise data.
As enterprises further embrace AI to drive innovation, streamline operations, and gain a competitive edge, the demand for robust, high-performance, and efficient AI infrastructure has never been stronger. Pioneering enterprise AI deployments, particularly among a rapidly growing set of Fortune 500 enterprise customers, Pure Storage claims to provide a simple, reliable, and efficient storage platform for enterprises to fully leverage the potential of AI, while reducing the associated risk, cost, and energy consumption.

Vultr revolutionises global AI deployment with inference
Vultr has announced the launch of Vultr Cloud Inference. This new serverless platform revolutionises AI scalability and reach by offering global AI model deployment and AI inference capabilities. Today's rapidly evolving digital landscape has challenged businesses across sectors to deploy and manage AI models efficiently and effectively. This has created a growing need for more inference-optimised cloud infrastructure platforms with both global reach and scalability, to ensure consistent high performance. This is driving a shift in priorities as organisations increasingly focus on inference spending as they move their models into production. But with bigger models comes increased complexity. Developers are being challenged to optimise AI models for different regions, manage distributed server infrastructure, and ensure high availability and low latency. With that in mind, Vultr created Vultr Cloud Inference. The platform will accelerate the time-to-market of AI-driven features, such as predictive and real-time decision-making, while delivering a compelling user experience across diverse regions. Users can simply bring their own model, trained on any platform, cloud, or on-premises, and it can be seamlessly integrated and deployed on Vultr’s global NVIDIA GPU-powered infrastructure. With dedicated compute clusters available on six continents, it ensures businesses can comply with local data sovereignty, data residency, and privacy regulations by deploying their AI applications in regions that align with legal requirements and business objectives. “Training provides the foundation for AI to be effective, but it's inference that converts AI’s potential into impact. As an increasing number of AI models move from training into production, the volume of inference workloads is exploding, but the majority of AI infrastructure is not optimised to meet the world’s inference needs,” says J.J. Kardwell, CEO of Vultr’s parent company, Constant.
“The launch of Vultr Cloud Inference enables AI innovations to have maximum impact by simplifying AI deployment and delivering low-latency inference around the world through a platform designed for scalability, efficiency, and global reach.”
With the capability to self-optimise and auto-scale globally in real-time, it ensures AI applications provide consistent, cost-effective, low-latency experiences to users worldwide. Moreover, its serverless architecture eliminates the complexities of managing and scaling infrastructure, delivering unparalleled impact, including:
- Flexibility in AI model integration and migration
- Reduced AI infrastructure complexity
- Automated scaling of inference-optimised infrastructure
- Private, dedicated compute resources
“Demand is rapidly increasing for cutting-edge AI technologies that can power AI workloads worldwide,” says Matt McGrigg, Director of Global Business Development, Cloud Partners at NVIDIA. “The introduction of Vultr Cloud Inference will empower businesses to seamlessly integrate and deploy AI models trained on NVIDIA GPU infrastructure, helping them scale their AI applications globally.”
As AI continues to push the limits of what’s possible and change the way organisations think about cloud and edge computing, the scale of infrastructure needed to train large AI models and to support globally distributed inference needs has never been greater. Vultr Cloud Inference is now available for early access via registration.

Treske calls experts to bring technology skills to Australia
Australian data centre infrastructure partner, Treske, has called on AI and data centre industry experts to bring skills and education to Australia’s regional areas to help businesses across Australia capitalise on the AI era. “In non-metro areas, Australia’s technology skills gap is more than a local issue – it's a national challenge, which needs greater attention,” says Daniel Sargent, Managing Director and Founder of Treske. “By delivering education right to where it's needed, we’re aiming to help these regional communities thrive in the fast-paced digital economy, and ultimately, make sure the whole country benefits from the AI revolution – not just the major cities.” For its part, Treske will launch a first-of-its-kind event series where experts on AI critical infrastructure will come together to discuss regional preparedness for AI. The series will commence in Newcastle this March and explore the dynamic terrain of critical infrastructure and how it can be designed and built to bolster AI’s return on investment (ROI) in often under-resourced areas. “Regions typically feel like they need to go to the city to be part of the cloud or AI. And although it’s promising to see more and more colocation data centres taking shape in rural areas, many are still missing the hybrid cloud approach, which requires on-premises infrastructure and on-the-ground skills availability,” continues Daniel. “Regional businesses want to adopt AI-powered tech – think internet-of-things (IoT) sensors in local council car parks or autonomous vehicles operating in mine sites or on farms – but they don’t have the data centre infrastructure available, nor close enough to the action.
This means the efficiency and financial ROI these technologies are intended to yield aren’t being demonstrated, which is holding back businesses’ responsiveness to market demands.” According to BCG, 70% of Australian businesses are still yet to deliver their digital transformation efforts – a critical first step in effectively implementing AI. In fact, globally, 95% of organisations have an AI strategy in place, but only 14% are ready to fully integrate it into their businesses. Although there are several factors slowing adoption, AI ultimately requires secure, fast access to data to deliver results, which poses a complex challenge for critical infrastructure resilience and scalability. Joining Daniel on the panel are critical infrastructure experts Robert Linsdell, General Manager A/NZ and APAC at Ekkosense; Rob Steel, Channels and Projects Manager at Powershield; Mark Roberts, Asia Pacific IT Business Leader at Rittal; and Adam Wright, Director and Founder at Ecogreen Electrical and Plumbing. Specialists in their field, these individuals are motivated to ensure Australia’s regions aren’t being left out of education and development opportunities, and will deliver insights to the Newcastle and regional NSW audience: Power-ready AI adoption: Regional businesses should prepare themselves to grow alongside AI demands of today and tomorrow. This is a matter of resiliency, and being prepared in the regions demands on-premises energy efficient, reliable data centre infrastructure – the panel will offer insight into how this can be achieved with a pocketknife of racking, uninterruptible power supply (UPS) units, precision cooling and management systems. The panel will also discuss the need for government grants – resembling Australia’s solar power installation incentives – to lift local digital infrastructure resilience and readiness. 
Overcoming the far-flung skills challenge: In addition to the value of nearby industry education events, the panel will discuss how important it is to see tertiary skilled courses rolled out through state TAFEs, where there is generally more presence in the regions. The panel will also examine how robust infrastructure deployments can take pressure off local skills shortages. Remote AI wins: The panel will share examples of how AI with resilient infrastructure has proven to offer growth-changing outcomes in industries such as agriculture, local government, healthcare, mining, and manufacturing. On the back of the event, Treske is set to launch its UPS 101 guide – a handbook for IT resellers, managers and system administrators on questions to ask and issues to consider when deploying a UPS for their site or customer. “The UPS is often seen as a box of batteries, which reinstates IT power when grid power fails,” continues Daniel. “But it is so much more, and the wrong design can cost businesses thousands of dollars.”

Schneider Electric and NVIDIA pioneer designs for AI data centres
Schneider Electric has announced a collaboration with NVIDIA to optimise data centre infrastructure and pave the way for ground-breaking advancements in edge artificial intelligence (AI) and digital twin technologies. Schneider Electric will leverage its expertise in data centre infrastructure and NVIDIA’s advanced AI technologies to introduce the first publicly available AI data centre reference designs. These designs are set to redefine the benchmarks for AI deployment and operation within data centre ecosystems, marking a significant milestone in the industry's evolution. With AI applications gaining traction across industries, while also demanding more resources than traditional computing, the need for processing power has surged exponentially. The rise of AI has spurred notable transformations and complexities in data centre design and operation, with data centre operators working to swiftly construct and operate energy-stable facilities that are both energy-efficient and scalable. “We're unlocking the future of AI for organisations,” says Pankaj Sharma, Executive Vice President, Secure Power Division and Data Centre Business, Schneider Electric. “By combining our expertise in data centre solutions with NVIDIA's leadership in AI technologies, we’re helping organisations to overcome data centre infrastructure limitations and unlock the full potential of AI. Our collaboration with NVIDIA paves the way for a more efficient, sustainable, and transformative future, powered by AI.”  Data centre reference designs In the first phase of this collaboration, Schneider Electric will introduce data centre reference designs tailored for NVIDIA accelerated computing clusters and built for data processing, engineering simulation, electronic design automation, computer-aided drug design, and generative AI. 
Special focus will be on enabling high-power distribution, liquid cooling systems, and controls designed to ensure simple commissioning and reliable operations for the extreme-density cluster. Through the collaboration, the company also aims to provide data centre owners and operators with the tools and resources necessary to seamlessly integrate new and evolving AI solutions into their infrastructure, enhancing deployment efficiency, and ensuring reliable lifecycle operation. Addressing the evolving demands of AI workloads, the reference designs will offer a robust framework for implementing NVIDIA’s accelerated computing platform within data centres, while optimising performance, scalability, and overall sustainability. Partners, engineers, and data centre leaders can utilise these reference designs for existing data centre rooms that must support new deployments of high-density AI servers and new data centre builds that are fully optimised for a liquid-cooled AI cluster. “Through our collaboration with Schneider Electric, we’re providing AI data centre reference designs using next-generation NVIDIA accelerated computing technologies,” says Ian Buck, Vice President of Hyperscale and HPC at NVIDIA. “This provides organisations with the necessary infrastructure to tap into the potential of AI, driving innovation and digital transformation across industries.” Future roadmap In addition to the data centre reference designs, AVEVA, a subsidiary of Schneider Electric, will connect its digital twin platform to NVIDIA Omniverse, delivering a unified environment for virtual simulation and collaboration. This integration will enable seamless collaboration between designers, engineers, and stakeholders, accelerating the design and deployment of complex systems, while helping reduce time-to-market and costs. 
“NVIDIA technologies enhance AVEVA's capabilities in creating a realistic and immersive collaboration experience underpinned by the rich data and capabilities of the AVEVA intelligent digital twin,” says Caspar Herzberg, CEO of AVEVA. “Together, we are creating a fully simulated industrial virtual reality where you can simulate processes, model outcomes, and effect change in reality. This merging of digital intelligence and real-world outcomes has the potential to transform how industries can operate more safely, more efficiently and more sustainably.” In collaboration with NVIDIA, Schneider Electric plans to explore new use cases and applications across industries and further its vision of driving positive change and shaping the future of technology. More information will be available at Schneider Electric’s Innovation Summit, Paris on 3 April.

Five ways AI is transforming data centres
The tech landscape is undergoing a remarkable transformation. This is currently driven predominantly by advancements in AI, machine learning, IoT, quantum computing, automation, virtual reality (VR), augmented reality (AR), and cyber security. These advancements are bringing unprecedented opportunities for business growth and improved quality of life. However, they also pose wider operational challenges that must be addressed. This includes concerns over job displacement for many people, privacy concerns, and cyber security risks. Within this wider landscape, AI, in particular, is playing a significant role in transforming and improving how data centres operate.
With that in mind, Mark Grindey, CEO at Zeus Cloud, shares five ways that data centres can use developments in AI to their advantage to optimise efficiency, enhance performance, and streamline operations.
Optimising efficiency and performance
- Predictive maintenance: Data centres consist of numerous interconnected systems and equipment. AI algorithms can analyse real-time data from sensors and usage patterns to predict when equipment may fail or require maintenance. By identifying potential issues in advance, data centres can schedule maintenance tasks, minimise downtime, and reduce costs associated with unplanned outages.
- Energy efficiency: AI algorithms can monitor energy consumption patterns and optimise energy usage in data centres. By analysing data on workload demands, temperature, and power usage effectiveness (PUE), AI can identify areas where energy can be saved and provide insights for improving energy efficiency. This not only reduces operational costs but also contributes to environmental sustainability.
- Intelligent resource allocation: Data centre resources, such as servers, storage, and networking equipment, need to be allocated efficiently to handle varying workload demands. AI can analyse historical data, usage patterns, and performance metrics to optimise resource allocation in real-time. This ensures that resources are allocated dynamically, matching workload requirements, and reducing inefficiencies or over-provisioning.
- Enhanced security: Data centres store large volumes of sensitive and valuable data. AI-powered security systems can analyse network traffic, identify anomalies, and detect potential security threats or attacks. By continuously monitoring data traffic and patterns, AI can provide real-time threat detection, prevention, and response, enhancing the overall security posture of the data centre.
- Intelligent data management: With the exponential growth of data, data centres face the challenge of efficiently managing and processing large volumes of information. AI can help automate data management tasks such as data categorisation, classification, and retrieval. AI-powered data analytics can extract valuable insights from massive datasets, facilitating informed decision-making and improving operational efficiency.
Conclusion
By harnessing the power of AI, data centres can optimise their operations, improve efficiency, and provide better services to their customers. However, it is important to ensure that AI systems are implemented ethically, with appropriate oversight and safeguards in place. As AI technologies continue to evolve, the potential for innovation in data centres will continue to grow, enabling them to stay at the forefront of the ever-evolving tech landscape – all of which raises questions to end users around whether their data centre provider is making use of AI to not only improve the service they receive, but also to keep data secure.
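The energy efficiency point above refers to power usage effectiveness (PUE), which is the ratio of total facility energy to the energy consumed by IT equipment alone. The following is a minimal sketch of how that metric could be tracked hour by hour; the function names and the meter readings are hypothetical and chosen purely for illustration, not taken from any operator's tooling.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Energy that did not reach IT equipment (cooling, power conversion, lighting)."""
    return total_facility_kwh - it_equipment_kwh


# Hypothetical hourly meter readings in kWh: (total facility, IT equipment).
readings = [(1500.0, 1000.0), (1400.0, 1000.0), (1320.0, 1100.0)]

# PUE for each hour, and the index of the least efficient hour (highest PUE).
hourly_pue = [pue(total, it) for total, it in readings]
worst_hour = max(range(len(readings)), key=lambda h: hourly_pue[h])
```

A value closer to 1.0 means less energy is spent on overhead, so flagging the hours with the highest PUE is one simple starting point for the kind of analysis described.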

A new dawn for tech infrastructure: Building AI-ready data centres
By Darren Watkins, Chief Revenue Officer at VIRTUS Data Centres
It’s no secret that the past year has seen an enormous surge in the influence of Artificial Intelligence (AI) and Machine Learning (ML). What was once considered a niche technology has now exploded into the mainstream, profoundly impacting every aspect of our lives – from how businesses function to enhancing personal productivity, and even influencing the way we interact with entertainment and navigate daily tasks. This explosion is having a knock-on effect on the infrastructure which underpins and powers our modern lives. Data centres, traditionally the backbone of many technological advancements, are now faced with the imperative to do more than ever to support data storage, management and retrieval, and cloud services in an always-on manner. The rapid growth of AI highlights the pressing need for data centres to be even more agile, innovative, and collaborative in driving this new era. In meeting this demand, data centre operators are swiftly adapting their facilities to accommodate the unprecedented requirements of AI and ML workloads. This entails not only scaling up capacity but also implementing advanced technologies, such as liquid cooling systems and optimised power distribution architectures, to ensure optimal performance and energy efficiency. It’s important to note that achieving 'AI readiness' goes beyond technological functionality - it hinges on early engagement with those customers who need AI-ready infrastructure. This strategic engagement not only ensures a symbiotic relationship, but also serves as the linchpin for developing a truly flexible and customised infrastructure that can seamlessly evolve with the fast-growing and ever-changing technological landscape. So, how are data centre operators rising to the challenge of supporting AI and ML in this new era of demand?
A new approach to location
In the past, network technology infrastructure was carefully planned to reduce latency and increase data processing speeds. However, with the rise of AI and ML workloads, this approach is changing. Unlike other types of data processing that require low latency, AI and ML tasks have different priorities. This shift in focus is leading to a reconsideration of what makes an ideal location for data centres. Now, there's a growing preference for larger, more efficient campuses that can supply between 200 and 500MW of power and have access to renewable energy sources. This change in strategy represents a departure from the previous emphasis on reducing latency. Instead, it reflects a broader understanding of how AI and ML are integrated into technology systems. The move toward larger campuses isn't just about accommodating less latency-sensitive tasks. It's a deliberate decision that takes into account the costs and benefits of operating at a larger scale. By prioritising bigger campuses, data centre providers can often achieve greater efficiency, both in terms of cost and sustainability. This shift challenges the traditional idea that proximity to users is always the most important factor for data centres. Instead, it suggests that focusing on size and efficiency can lead to better overall outcomes. Beyond size, the role of edge computing remains important. A fully integrated AI solution requires connectivity to all aspects of a business's systems, and whilst core language models and inference models may reside in mega-scale campuses, there is an ongoing need for edge solutions in metropolitan cities, ensuring full integration. Additionally, for some companies, edge data centre solutions are essential for cost-effectiveness. For example, content distribution networks delivered via local edge data centres facilitate seamless iOS upgrades for iPhones, negating the need for individual data centres in every country.
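To put the latency trade-off discussed above into rough numbers, the sketch below estimates the minimum round-trip time that distance alone adds over optical fibre. It assumes light travels through fibre at roughly 200,000 km per second (about two-thirds of its speed in a vacuum) and ignores routing, switching, and processing delays, so real figures will be higher. The function name and the example distances are hypothetical.

```python
# Approximate propagation speed of light in optical fibre, in km per millisecond.
# Assumption: about two-thirds of the speed of light in a vacuum.
SPEED_IN_FIBRE_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay over a fibre path of this length."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 2 * distance_km / SPEED_IN_FIBRE_KM_PER_MS


# Hypothetical comparison: a metro edge site versus a remote mega-scale campus.
edge_site_ms = round_trip_ms(50.0)
remote_campus_ms = round_trip_ms(1000.0)
```

Under these assumptions, a remote campus adds milliseconds rather than seconds, which is consistent with the point that workloads tolerant of slightly higher latency can be placed where power and scale are available, while latency-sensitive services stay at the edge.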
Defying labels: mega-scale and the edge
It is clear that AI and ML are changing data centre requirements and it’s often the case that bigger is better. But what will the new generation of data centres be called - hyperscale 2.0, megascale, gigascale, or something else? Whatever the label ends up being, it’s important to remember that “hyperscale” isn't merely about physical size; it's now a reflection of the specific customer it refers to. The term “mega-scale campuses to host hyperscale customers” might define the ongoing industry transformations more accurately. Regardless of the terminology, one common challenge is evident: meeting the significant capacity demands of these customers. The current limitations of European hyperscale facilities to address the growing AI market underscore this challenge, and mega-scale campuses may be the answer. VIRTUS’ 200MW campus in Wustermark, Berlin (under construction) is a great example of the large-scale, sustainable facilities being built that are AI-ready and prepared to meet these future cloud, hyperscale and customer demands.
The increasing importance of sustainability
Sustainability plays a critical role in shaping the future of data centres, especially in the context of the rapid integration of AI and ML technologies. As these advanced workloads continue to drive demand for computational power and data storage, data centre operators are increasingly realising the importance of reducing environmental impact. This means not only optimising energy efficiency but also embracing renewable energy sources like solar and wind power to meet the growing energy demands sustainably. In this evolving landscape, the emphasis on sustainability isn't just a buzzword; it's a strategic imperative that aligns with the broader goals of AI and ML integration. By prioritising environmentally conscious practices, data centres can support the scalability and reliability required for AI and ML workloads while minimising their carbon footprint.
This holistic approach ensures that as AI and ML reshape industries and drive innovation, they do so in a way that is both technologically advanced and environmentally responsible. VIRTUS understands the dual responsibility of meeting the demands of AI and ML workloads while mitigating their environmental impact. That's why we're committed to sustainability, working tirelessly to innovate and reduce our carbon emissions. Our recent strides towards achieving net zero carbon emissions by 2030, as showcased in our sustainability report, underscore our dedication to building a greener future while powering the advancements of AI and ML technologies.
A road to innovation
Facing unprecedented technological advancement, data centres continue to be more than mere facilities; they remain the bedrock infrastructure upon which the digital future is built. And with AI and ML driving the next wave of innovation, the role of data centres is becoming even more vital. Dynamic, innovative providers are not only key to shaping a more intelligent and sustainable digital world; they also provide the development and investment needed to deliver new technologies for customers and consumers, while increasing productivity and remaining committed to ever more environmentally sustainable facilities.

AddOn Networks new transceiver range delivers optimal performance
A new range of 800G transceivers has been launched by AddOn Networks to enable Artificial Intelligence (AI) and Machine Learning (ML) tools and greater automation in data centre applications. Supporting both InfiniBand and Ethernet, the range has been designed to offer customers an attractive alternative to Network Equipment Manufacturers (NEMs) when it comes to building and maintaining enhanced optical networks. Data centre operators need extensive data, storage and compute capabilities to ensure the enhanced speeds and low latency required to overcome growing industry demands. AI and ML tools optimise existing infrastructure and establish accurate data analytics for faster decision-making and increased automation. Yet, the amount of bandwidth necessary for these to operate successfully, and the lack of compatible solutions in the market, makes selecting the right product a challenge. With this product launch, AddOn Networks will provide operators with the means to optimise their existing networks, alongside a premium support service and a reduction in part lead times. “This family of transceivers mark an exciting new era for AddOn Networks,” says AddOn Networks’ Director of Product Line Management, Ray Hagen. “Businesses operating in the data centre industry may already be aware of the benefits of 800G transceivers, but when ordering these directly through NEMs, they often experience long lead times and unnecessary delays in delivery. With our new range of transceivers, we will compress the timeline from order to delivery to ensure customers get solutions exactly when they require them. As a result, operators can maximise data centre output through AI and ML tools while making essential cost and time savings.” The transceiver range will enhance the handling and processing of bandwidth-intensive data flows when migrating from serial Central Processing Unit (CPU) based architecture towards parallel data flow in the Graphics Processing Unit (GPU). 
Once implemented within existing infrastructure, the transmission of data is accelerated, with additional capabilities to increase storage and compute capacity available to operators. “Our 800G transceivers are parallel tested in our customers’ environments to ensure performance matches what is offered by the NEMs,” continues Ray. “Our global leadership in third-party optics and expertise in testing means we can offer our customers not just a best-in-class transceiver, but a best-in-class service too. Our round-the-clock support puts customers in the best possible position to meet the growing demands of the industry. This launch reflects our commitment to introducing key solutions through multiple platforms to best serve those customers, as we move into the realm of AI and ML.” AddOn Networks carries out 100% testing in its laboratory to guarantee the family of transceivers mirror the customers’ data flows and their front-end environments to ensure full compatibility and performance while offering a lifetime warranty. As a result, customers can benefit from adaptable and reliable products, tailored to meet the specific demands of their data centre. More information on the products can be found on the AddOn website.

What will AI computing look like?
By Ed Ansett, Founder and Chairman, i3 Solutions Group
Wherever generative AI is deployed, it will change the IT ecosystem within the data centre. From processing to memory, networking to storage, and systems architecture to systems management, no layer of the IT stack will remain unaffected. For those on the engineering side of data centre operations tasked with providing the power and cooling to keep AI servers operating both within existing data centres and in dedicated new facilities, the impact will be played out over the next 12 to 18 months. Starting with the most fundamental IT change - and the one that has been enjoying the most publicity - AI is closely associated with the use of GPUs (Graphics Processing Units). The GPU maker, Nvidia, has been the greatest beneficiary. According to Reuters, “Analysts estimate that Nvidia has captured roughly 80% of the AI chip market. Nvidia does not break out its AI revenue, but a significant portion is captured in the company's data centre segment. So far in 2023, Nvidia has reported data centre revenue of $29.12bn.” But even within the GPU universe, it will not be a case of one size or one architecture fits every AI deployment in every data centre. GPU accelerators built for HPC and AI are common, as are Field Programmable Gate Arrays (FPGAs), adaptive Systems on Chip (SoCs) or ‘SmartNICs’, and highly dense CUDA (Compute Unified Device Architecture) GPUs. An analysis from the Center for Security and Emerging Technology, entitled AI Chips: What They Are and Why They Matter, says, “Different types of AI chips are useful for different tasks. GPUs are most often used for initially training and refining AI algorithms. FPGAs are mostly used to apply trained AI algorithms to "inference".
ASICs can be designed for either training or inference.” As with all things AI, what’s happening at chip level is an area of rapid development where there is growing competition between traditional chip makers, cloud operators and new market entrants who are racing to produce chips for their own use, for the mass market or both. As an example of disruption in the chip market, AWS announced ‘Trainium2’ as a next generation chip designed for training AI systems in Summer 2023. The company proclaimed the new chip to be four times faster while using half the energy of its predecessor. Elsewhere, firms such as ARM are working with cloud providers to produce chips for AI, while AMD has invested billions of dollars in AI chip R&D. Intel, the world’s largest chip maker, is not standing still. Its product roadmap announced in December 2023 was almost entirely focused on AI processors, from PCs to servers.
Why more GPU servers?
The reason for the chip boom is the sheer number of power-hungry GPUs or TPUs (Tensor Processing Units, developed by Google specifically for AI workloads) needed for generative AI workloads. A single AI model will run across hundreds of thousands of processing cores in tens of thousands of servers mounted in racks drawing 60-100kW per rack. As AI use scales and expands, this kind of rack power density will be common. The power and cooling implications for data centres are clear. There are several factors that set GPU servers apart from other types of server. According to Run:ai, these include:
“Parallelism: GPUs consist of thousands of small cores optimised for simultaneous execution of multiple tasks. This enables them to process large volumes of data more efficiently than CPUs with fewer but larger cores.
“Floating-point performance: The high-performance floating-point arithmetic capabilities in GPUs make them well-suited for scientific simulations and numerical computations commonly found in AI workloads.
“Data transfer speeds: Modern GPUs come equipped with high-speed memory interfaces like GDDR6 or HBM2 which allow faster data transfer between the processor and memory compared to conventional DDR4 RAM used by most CPU-based systems.”
Parallel processing – AI computing and AI supercomputing
Like traditional supercomputing, AI supercomputing runs on parallel processing, in this case to train and run neural networks. Parallel processing means using more than one microprocessor to handle separate parts of an overall task. It was first used in traditional supercomputers, machines where vast arrays of traditional CPU servers with hundreds of thousands of processors are set up as a single machine within a standalone data centre.
Because GPUs were invented to handle graphics rendering, their parallel chip architecture makes them more suitable for breaking down complex tasks and working on them simultaneously. It is the nature of all AI that large tasks need to be broken down.
The announcements about AI supercomputers from the cloud providers and other AI companies are revealing. Google says it will build A3 AI supercomputers of 26,000 GPUs for customers, delivering 26 exaFLOPS of AI throughput. AWS said it will build GPU clusters, called Ultrascale, that will deliver 20 exaFLOPS. Inflection AI said its AI cluster will consist of 22,000 NVIDIA H100 Tensor Core GPUs.
With such emphasis on GPU supercomputers, you could be forgiven for thinking all AI will only run on GPUs in the cloud. In fact, AI will reside not just in the cloud but across all types of existing data centres and on different types of server hardware.
Intel points out, “Bringing GPUs into your data centre environment is not without challenges. These high-performance tools demand more energy and space. They also create dramatically higher heat levels as they operate. 
These factors impact your data centre infrastructure and can raise power costs or create reliability problems.”
Of course, Intel wants to protect its dominance in the server CPU market. But the broader point is that data centre operators must prepare for an even greater mix of IT equipment living under one roof. Data centre designers, even when asked to accommodate several thousand GPU machines at relatively small scale within an existing facility, should be prepared to find more power and handle more heat.
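To make the rack densities quoted in this article concrete, the following minimal Python sketch estimates the total IT load, and the heat that must be removed, for a group of racks. The 60-100kW per rack range comes from the article; the rack count and the 10kW "conventional" baseline are illustrative assumptions only, and the sketch assumes that essentially all electrical power drawn by IT equipment ends up as heat.

```python
def estimate_rack_load(num_racks: int, kw_per_rack: float) -> dict:
    """Estimate total IT load and the heat the cooling plant must reject."""
    if num_racks < 0 or kw_per_rack < 0:
        raise ValueError("inputs must be non-negative")
    it_load_kw = num_racks * kw_per_rack
    # Assumption: virtually all power drawn by IT equipment becomes heat,
    # so the heat load is taken as equal to the IT load.
    return {"it_load_mw": it_load_kw / 1000, "heat_load_kw": it_load_kw}


HYPOTHETICAL_RACKS = 50  # illustrative assumption, not a figure from the article

# 10 kW is an assumed conventional-rack baseline; 60 and 100 kW are the
# bounds of the AI rack density range quoted in the article.
for density_kw in (10, 60, 100):
    load = estimate_rack_load(HYPOTHETICAL_RACKS, density_kw)
    print(
        f"{density_kw:>3} kW/rack x {HYPOTHETICAL_RACKS} racks = "
        f"{load['it_load_mw']:.1f} MW IT load"
    )
```

Even at this modest, hypothetical rack count, moving from the assumed baseline to the upper end of the quoted range multiplies the power and cooling requirement by ten within the same number of racks.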

How AI is changing data centre infrastructure
By Geoff Dear, Technical Manager UK & Ireland, and Fatholah Zaki, LAN & Data Centre Sales Manager, Reichle & De-Massari UK
AI is significantly influencing data centre design and operation in several ways. What’s more, it can be used to drive data centre efficiencies, enhance performance, and introduce new capabilities. Infrastructure and customer requirements are changing with the need to support the vast global uptake of AI. Making well-informed decisions about data centre infrastructure to accommodate this is vital, and choices made now will affect performance and business for years to come.
According to Forbes Advisor, over 60% of business owners believe AI will increase productivity and improve customer relationships. Anticipated growth for Europe’s AI software market between 2021 and 2028 is an impressive 40%. According to a report from Accenture, AI could add an additional $814bn (£630bn) to the UK economy by 2035. The AI as a Service (AIaaS) market is forecast by Research and Markets to expand from $9.86bn in 2023 to $14.27bn in 2024 (a remarkable compound annual growth rate of 44.7%) and to reach $63.2bn by 2028.
Enabling migration from CPUs to GPUs
To support AI usage, data centres require specialised hardware for efficiently processing highly complex computations: GPUs (Graphics Processing Units). These consist of hundreds or thousands of smaller cores that are optimised for handling multiple tasks simultaneously. As the name suggests, GPUs were originally designed to render graphics and carry out image processing tasks. However, their capabilities have expanded over the years. GPUs are now used for general-purpose computing tasks that require significant parallel processing power, such as data analysis and machine learning. Compared to ‘traditional’ CPU-based installations, GPU racks have significantly more cores, consume more power, emit more heat, and occupy more space. 
In fact, a 10-fold increase in the number of processors in the same footprint is anticipated. As a result, solutions need to be developed to enable connectivity for the vastly increasing numbers of GPUs. Integrated very small form factor (VSFF) connectors can play a significant part in solving this challenge. VSFF SN and CS connectors are characterised by their small size but huge capacity for fibre connections. They are developed to allow for port breakout at high speeds, such as 400G, which is becoming increasingly common in data centre environments as networks expand to support larger volumes of data transmission. Use of VSFF connectors can increase the fibre count to 432 in a 1U space, a significant leap from the 144 fibres accommodated by LC duplex connectors. Using these connectors in data centres allows for the deployment of a significantly larger number of GPUs by maximising the use of available space and improving the efficiency of connections.
Power and cooling
Integration of AI accelerators also necessitates changes in rack designs and power distribution. Data centre designers need to work out kilowatt requirements per rack and where fluctuations might occur. Additional power and cooling systems are needed to handle higher power densities and thermal loads. It will also be important to determine where air cooling will suffice, and where liquid cooling can bring the greatest benefit. Liquid can conduct heat more effectively than air and is suited to growing data centre equipment densities. There are several ways of applying liquid cooling, from heat exchangers in rack doors to immersion cooling systems that completely submerge rack servers and other components in a dielectric liquid, which doesn’t conduct electricity but does conduct heat. Immersion is the most energy-efficient and sustainable form of liquid cooling, making optimum use of liquid’s thermal transfer properties. Heat can also be reused. 
However, it’s important to consider that this approach affects connectivity, as well as the costs, requirements and skillsets involved. The increased power and cooling requirements driven by AI can boost carbon emissions, while at the same time data centres everywhere are trying to get a firmer grip on their PUE (power usage effectiveness). That makes monitoring, mitigation and process optimisation more important than ever.
When it comes to power and cooling, AI introduces a challenge but also provides part of the solution. Incorporating AI into data centre asset management can enhance resource utilisation and decision-making. Data centre design and building can be optimised using ‘Digital Twins’ and simulation modelling. AI algorithms and overlays displaying real-time energy and efficiency metrics can optimise cooling, power, and resource allocation. AI can analyse usage patterns and historical data to predict future asset needs, enabling better planning for procurement and replacement. However, successful implementation requires careful analysis, planning and integration with existing systems.
Holistic approach
It’s important that redesign is taken up as a cross-departmental effort, avoiding technology silos. A holistic approach looking at every part of the data centre and its unique requirements is vital. What’s more, flexibility is key. A modular approach will help accommodate future hardware and capacity requirements which are impossible to predict right now. A well-designed structured cabling system increases uptime, scalability, and return on investment while reducing the technology footprint and operating expenses. It ensures all connections are standardised and organised, which is vital for maintaining the high performance and reliability required by GPU-intensive operations. 
Structured cabling, characterised by adherence to predefined standards with pre-set connection points and pathways, supports efficient system growth and change management, which is crucial for evolving GPU deployments with enormous throughput requirements.
R&M will present its data centre offering at Data Centre World in London.
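To put the connector densities quoted earlier in this article into perspective, the minimal Python sketch below compares how many rack units of patching would be needed to terminate a given number of fibres at 144 fibres per 1U (LC duplex) and at 432 fibres per 1U (VSFF). The two densities are the article's figures; the fibre demand is a hypothetical number chosen only for illustration, and the helper name is likewise an assumption.

```python
import math

# Fibres per 1U of rack space, as quoted in the article.
FIBRES_PER_U = {"LC duplex": 144, "VSFF": 432}


def rack_units_needed(fibres: int, fibres_per_u: int) -> int:
    """Whole rack units of patching needed to terminate `fibres` fibres."""
    if fibres < 0 or fibres_per_u <= 0:
        raise ValueError("fibres must be >= 0 and fibres_per_u must be > 0")
    return math.ceil(fibres / fibres_per_u)


REQUIRED_FIBRES = 3456  # hypothetical demand, not a figure from the article

for connector, density in FIBRES_PER_U.items():
    units = rack_units_needed(REQUIRED_FIBRES, density)
    print(f"{connector}: {units}U for {REQUIRED_FIBRES} fibres")
```

Under these assumptions, the threefold gain in fibres per rack unit translates directly into a third of the patching space, which is the space the article suggests can be given over to additional GPUs.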

DataVita launches new service in response to exponential AI demand
DataVita, Scotland’s data centre and cloud services provider, has invested in new infrastructure and capabilities to host high-performance computing (HPC) workloads, supporting the exponential growth of artificial intelligence (AI) and machine learning. This marks a significant milestone, allowing for the hosting of high-density workloads at DataVita's DV1 facility in Lanarkshire. The facility can now accommodate up to 100kW per rack with air cooling and up to 400kW per rack with liquid cooling. This significantly exceeds the capabilities of standard racks, providing essential support for the requirements of HPC, and represents a major leap forward for the Scottish data centre market. According to the US International Trade Administration, the UK’s AI market is currently valued at over £16.9bn and is estimated to add £803.7bn to the UK economy by 2035. Alongside the accelerated adoption of generative AI models such as ChatGPT over the last year, DataVita says it has witnessed a huge surge in the volume of enquiries for high-capacity hosting and is already in talks with a number of globally significant tech providers. The higher proportion of renewable sources in Scotland's energy mix means there is a much lower carbon footprint associated with hosting data centres in the country compared to the rest of the UK and other nations. Relocating a 200-rack facility from London to Scotland would save over 6 million kgCO2e, equivalent to over 14 million miles driven by the average mid-sized car. Compared to hosting in Poland, it would reduce carbon emissions by 99%. Scotland generated more renewable power than it used for the first time in 2022, and therefore has plenty of capacity to host the data needs of AI. Organisations could also save up to 70% on their data centre costs because of factors such as the country’s natural climate, which reduces the need for additional cooling.
Danny Quinn, MD of DataVita, says, “AI is one of the fastest growing sectors of technology and could have huge benefits for businesses, as well as public services and the well-being of citizens who use them. However, to support its widespread use we need to have the infrastructure in place to underpin the advanced computing power and data it requires.
“While other European nations are struggling with power and capacity, Scotland has a surplus of renewable energy that could be used to power this new and exciting technology that everyone is talking about. We see a big opportunity tied to the growing global demand, which is why we have redesigned elements of our DV1 facility to match the needs of AI and HPC providers.
“The location is ideal for companies aiming to reduce the carbon footprint of their IT provision while maintaining unmatched resilience, security, power and connectivity. By using Scotland’s natural resources and existing renewable energy infrastructure, we are proving that increasing AI data workloads does not need to come at the expense of the environment.”


