Huawei and IEEE Kazakhstan Subsection Jointly Release HPC Lossless Ethernet and AI Fabric Network Technology White Paper
[Almaty, June 5, 2023] During the 2023 Huawei Middle East and Central Asia Technology Carnival, Huawei held the Digital Communication Innovation Summit. More than 480 customers and partners from Kazakhstan, Uzbekistan, Saudi Arabia, the United Arab Emirates, Qatar, Pakistan, and other countries attended the summit to discuss industry digital development and network technology innovation. At the summit, Huawei, the IEEE Kazakhstan Subsection, and Ankabut, the UAE Advanced National Research and Education Network, jointly released the "HPC Lossless Ethernet and AI Fabric Network Technology White Paper" (hereinafter referred to as the "White Paper"). The White Paper describes the broad application prospects of lossless Ethernet data center networks in HPC and AI, and presents the latest technical research and commercial practice results across four dimensions: network architecture, key technologies, business value, and best practices.
Huawei, IEEE Kazakhstan Subsection and Ankabut released the "HPC Lossless Ethernet and AI Fabric Network Technology White Paper"
The White Paper points out that lossless Ethernet technology features intelligent RDMA and network-level load balancing, which enable zero-packet-loss forwarding and ultra-high throughput of 90%. It offers all-round advantages in performance, composition, cost-effectiveness, and flexibility, making it an inevitable choice for high-performance computing. At the same time, countries around the world are actively issuing policies to support the development of HPC and AI. In the future, lossless Ethernet will play a key role in global digitalization.
The White Paper first introduces current high-performance computing network topologies, including CLOS, MultiRail, and direct connection topologies. CLOS is a multi-level architecture in which each switching unit at one level is connected to all switching units at the next level; it can be strictly non-blocking, reconfigurable, and scalable. Cell switching achieves absolute load balancing within a plane. The direct connection topology supports ultra-large-scale networking, with low cost and few end-to-end communication hops.
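The CLOS property described above, where every switching unit at one level connects to every unit at the next, can be illustrated with a minimal sketch. It assumes a simple two-tier leaf-spine fabric; the function names and the switch counts are hypothetical and are not taken from the White Paper.

```python
from itertools import product


def build_leaf_spine(num_leaves, num_spines):
    """Return leaves, spines, and the full mesh of (leaf, spine) links.

    Illustrative two-tier Clos: every leaf connects to every spine.
    """
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    spines = [f"spine{j}" for j in range(num_spines)]
    links = set(product(leaves, spines))
    return leaves, spines, links


def equal_cost_paths(src_leaf, dst_leaf, spines, links):
    """List the two-hop leaf-spine-leaf paths, one per shared spine."""
    return [
        (src_leaf, spine, dst_leaf)
        for spine in spines
        if (src_leaf, spine) in links and (dst_leaf, spine) in links
    ]


# Hypothetical fabric size chosen only for the example.
leaves, spines, links = build_leaf_spine(num_leaves=4, num_spines=2)
paths = equal_cost_paths("leaf0", "leaf3", spines, links)
```

Because the mesh between levels is complete, the number of equal-length paths between any two leaves equals the number of spines, which is what gives a load-balancing scheme multiple paths to spread traffic over.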
Secondly, it introduces a software architecture that improves the performance of HPC and AI applications in two ways: optimization of the network itself, and joint optimization of the network and the application system. Network self-optimization aims for the highest throughput and lowest latency across the entire network through three mechanisms. The first is flow control, which solves the PFC deadlock problem by identifying cyclic buffer dependencies and breaking the necessary conditions for deadlock, improving network reliability. The second is congestion control, which dynamically adjusts the ECN threshold through AI algorithms to obtain maximum bandwidth and minimum delay. The third is traffic scheduling, which uses NSLB technology to solve the problem of uneven network load and achieve 90% throughput, increasing the efficiency of AI training by 20%. For joint optimization of the network and application systems, the HPC network optimizes computing through in-network computing: using the in-network aggregation characteristics of MPI communication, network devices participate in the computing process, reducing task completion time.
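The idea of identifying cyclic (ring) buffer dependencies can be sketched as cycle detection on a directed graph. This is only an illustration under stated assumptions, not the mechanism the White Paper implements: the graph `deps`, the node names, and the function `find_cycle` are hypothetical, and each edge "A -> B" is assumed to mean that a buffer on switch A waits for buffer space on switch B.

```python
def find_cycle(deps):
    """Return one cycle in a directed graph as a list of nodes, or None.

    deps maps a node to the nodes whose buffers it waits on.
    Uses depth-first search with white/grey/black colouring.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(deps) | {n for targets in deps.values() for n in targets}
    colour = {n: WHITE for n in nodes}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in deps.get(node, ()):
            if colour[nxt] == GREY:
                # Back edge: nxt is on the current path, so a cycle exists.
                return stack[stack.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for n in sorted(nodes):
        if colour[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


# Hypothetical dependency ring among three switches.
deps = {"A": ["B"], "B": ["C"], "C": ["A"]}
ring = find_cycle(deps)

# Removing one dependency breaks the circular wait.
broken = {"A": ["B"], "B": ["C"], "C": []}
no_ring = find_cycle(broken)
```

A circular wait is one of the necessary conditions for deadlock, so once such a ring is found, removing any single dependency on it (for example by rerouting or reassigning a queue) is enough to eliminate that deadlock.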
HPC Lossless Ethernet and AI Fabric Network Technology White Paper
The general trend of current social development is "HPC & AI for everything." Lossless Ethernet builds the foundation for the interconnection of all things and the interoperability of the Internet, provides computing power services for thousands of industries, creates a solid high-performance computing base for the digital economy era, contributes to the prosperity and development of advanced digital industries, and helps drive global digital transformation.