AWS announces new data centre components
Amazon Web Services (AWS) has announced new data centre components designed to support the next generation of artificial intelligence innovation and customers’ evolving needs.
These capabilities combine innovations in power, cooling, and hardware design to create a more energy-efficient data centre that will underpin further customer innovation. They will be implemented globally in AWS’s new data centres, and many of the components are already deployed in its existing data centres.
“AWS continues to relentlessly innovate its infrastructure to build the most performant, resilient, secure, and sustainable cloud for customers worldwide,” says Prasad Kalyanaraman, Vice President of Infrastructure Services at AWS. “These data centre capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads. But what is even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint.”
AWS has been building large-scale data centres for 18 years and GPU-based servers for AI workloads for 13 years. Today, AWS’s data centres support millions of active customers worldwide, including hundreds of thousands of customers using AWS AI and machine learning services, and tens of thousands of global customers using Amazon Bedrock to build their generative AI applications. As use of generative AI grows and demand for GPU capacity increases, AWS data centres are adapting to support ever-higher power densities.
Key improvements include:
1. Simplified electrical and mechanical design for high availability
AWS continuously focuses on offering customers the most reliable infrastructure. Simplified electrical and mechanical designs are more reliable and easier to maintain, ensuring that customers enjoy the benefits of high reliability that AWS has offered from the beginning.
AWS’s latest data centre design improvements include simplified electrical distribution and mechanical systems, which enable infrastructure availability of 99.9999%. The simplified systems also reduce the potential number of racks that can be impacted by electrical issues by 89%.
In a data centre, electricity passes through multiple conversion and distribution systems before reaching the IT equipment. Each step introduces inefficiency, energy loss, and potential failure points. In one example from the new design, AWS simplified the electrical distribution and, in doing so, reduced the number of potential failure points by 20%. Other simplifications include bringing backup power closer to the rack and reducing the number of fans used to exhaust hot air. AWS instead uses the natural pressure differential to exhaust hot air, which increases the amount of electricity available for servers. All of these changes help reduce overall energy consumption while minimising the risk of failures.
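The point that each conversion step adds loss can be illustrated with a little arithmetic: the share of input power that reaches the IT equipment is the product of the efficiencies of every stage in the chain. The sketch below is not AWS’s model. The stage efficiencies are hypothetical values chosen only to show why removing a stage raises end-to-end efficiency.

```python
from math import prod


def end_to_end_efficiency(stage_efficiencies):
    """Fraction of input power that survives every stage in the chain."""
    return prod(stage_efficiencies)


# Hypothetical per-stage efficiencies, for illustration only.
baseline = [0.99, 0.97, 0.98, 0.96, 0.98]
simplified = [0.99, 0.97, 0.98, 0.98]  # same chain with one stage removed

print(f"baseline:   {end_to_end_efficiency(baseline):.3f}")
print(f"simplified: {end_to_end_efficiency(simplified):.3f}")
```

The same reasoning applies to failure points: every stage taken out of the chain is one less component that can fail between the utility feed and the rack.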
2. Innovations in cooling, rack design, and control systems
AWS has built a number of new and enhanced capabilities to offer customers the most performant, highly available, and energy-efficient infrastructure possible. New data centre innovations include:
Liquid cooling: Newer AI servers benefit from liquid cooling, which cools high-density compute chips more efficiently. AWS has developed novel mechanical cooling solutions that provide configurable liquid-to-chip cooling in both its new and existing data centres. Because some AWS technologies use network and storage infrastructure that does not require liquid cooling, the updated cooling systems will integrate air and liquid cooling capabilities. They will serve the most powerful AI chipsets, such as AWS Trainium2, and rack-scale AI supercomputing solutions, such as NVIDIA GB200 NVL72, as well as AWS’s network switches and storage servers. This flexible, multimodal cooling design allows AWS to provide maximum performance and efficiency at the lowest cost, whether running traditional workloads or AI models. The liquid cooling rack design was developed in collaboration with leading chip manufacturers to accelerate time to market for AI workloads.
Support for high-density AI workloads: AWS is making fuller use of available power by optimising where it positions racks in a data centre. It does this with software, powered by data and generative AI, that predicts the most efficient way to land (place) servers. As a result, AWS will reduce the amount of stranded power (energy that is available but unused or underutilised) and make more efficient use of the energy available.
This design will support the next generation of hardware and high-density racks required for AI workloads, but is flexible enough to accommodate a wide range of other hardware types. AWS infrastructure offers the broadest and deepest compute platform with more than 750 Amazon Elastic Compute Cloud (Amazon EC2) instance types, giving customers the choice of the latest processor, storage, networking, operating system, and purchase model for any workload. In addition to the flexible multimodal cooling design, AWS has developed engineering innovations in its power delivery systems, which enable AWS to support a six-times increase in rack power density over the next two years, and another three-times increase in the future. This is delivered in part by a new power shelf, which efficiently delivers data centre power throughout the rack, reducing electrical conversion losses.
Taken together, these innovations enable AWS to deliver 12% more compute power per site for customer workloads. These changes will reduce the overall number of data centres needed to deliver the same amount of compute capacity.
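The figures in the two paragraphs above can be put into rough numbers. The sketch below makes assumptions the article does not confirm: that the later three-times increase compounds on top of the six-times increase, that compute capacity scales linearly with the number of sites, and that the baseline rack power and site count (both hypothetical) are representative.

```python
import math


def compounded_density(baseline_kw, multipliers):
    """Apply successive density multipliers to a baseline rack power."""
    result = baseline_kw
    for multiplier in multipliers:
        result *= multiplier
    return result


def sites_needed(baseline_sites, compute_gain):
    """Sites required for the same total compute if each site delivers
    (1 + compute_gain) times as much, rounded up to whole sites."""
    return math.ceil(baseline_sites / (1 + compute_gain))


BASELINE_RACK_KW = 10.0  # hypothetical baseline, not an AWS figure
BASELINE_SITES = 100     # hypothetical fleet size, not an AWS figure

# Assumes the 3x increase compounds on the 6x increase (18x overall).
print(compounded_density(BASELINE_RACK_KW, [6, 3]))  # 180.0

# 12% more compute per site: 100 / 1.12 is about 89.3, so 90 whole sites.
print(sites_needed(BASELINE_SITES, 0.12))  # 90
```

Under these assumptions, a 12% gain in compute per site means the same capacity needs roughly 11% fewer sites (1 − 1/1.12), not 12% fewer.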
Updated control systems: The rollout of an Amazon-owned control system across AWS’s electrical and mechanical devices makes it possible to standardise monitoring, alarming, and operational sequences. For example, AWS’s internally built telemetry tools use AWS technologies to provide real-time diagnostics and troubleshooting services, both of which help AWS maintain optimal operating conditions on behalf of customers. In addition, AWS has increased the redundancy in its control systems while reducing their complexity. As a result, AWS is designing for infrastructure availability of 99.9999%.
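To put the 99.9999% availability target in perspective, it can be converted into a downtime budget. The short calculation below is generic availability arithmetic, not an AWS tool. It assumes a 365-day year and treats the percentage as the share of time the infrastructure is available.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def max_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return (1 - availability_percent / 100) * period_seconds


# Compare the article's six-nines target with two lower targets.
for target in (99.9, 99.99, 99.9999):
    print(f"{target}% -> {max_downtime_seconds(target):.1f} seconds per year")
```

At 99.9999%, the budget comes to roughly 31.5 seconds of unavailability per year.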
3. Increased energy efficiency and sustainability, including 46% reduction in mechanical energy consumption and 35% reduction in embodied carbon in the concrete used
For many years, AWS has been a pioneer in improving energy efficiency and sustainability across its infrastructure. Research estimates AWS’s infrastructure is currently up to 4.1 times more efficient than on-premises infrastructure, and when workloads are optimised on AWS, the associated carbon footprint can be reduced by up to 99%. In 2023, Amazon achieved its goal to match all of the electricity consumed by its operations with 100% renewable energy – seven years ahead of its 2030 goal.
AWS continuously reevaluates how its data centres operate and determines ways to help its infrastructure use energy more efficiently through ongoing innovation. The new components include the following upgrades for energy efficiency and sustainability:
• A more efficient cooling system that is expected to reduce mechanical energy consumption by up to 46% compared to its previous design during peak cooling conditions, without increasing water usage on a per-megawatt basis. Design changes include a new single-sided cooling system, reduction in cooling equipment, and introduction of liquid cooling capabilities.
• Reduction of embodied carbon in the concrete of the data centre building shell by up to 35%, compared to industry average. AWS is adopting specifications for lower-carbon steel and concrete, and optimising the structural design to use less steel overall.
• Backup generators will be able to run on renewable diesel, a biodegradable and non-toxic fuel that can reduce greenhouse gas emissions by up to 90% over the fuel’s lifecycle when compared to fossil diesel. AWS has already started transitioning to renewable diesel to power backup generators at existing data centres in Europe and in America.
Simon Rowley - 3 December 2024