New paradigms for AI infrastructure

I am proud to share that at Huawei Connect 2025 in Shanghai, we launched a new SuperPoD generation, raising the bar for AI infrastructure. Recognising the changes catalysed by DeepSeek-R1, we enhanced our AI accelerators for higher efficiency and built upon extensive customer feedback to meet real-world needs.

Please take a look at the details below.

  • ๐—–๐—ต๐—ถ๐—ฝ ๐—ฟ๐—ผ๐—ฎ๐—ฑ๐—บ๐—ฎ๐—ฝ: Three new Ascend series over the next three yearsโ€”Ascend 950, 960, 970โ€”evolving usability, data formats, interconnect, and bandwidth on a near annual cadence, targeting compute doubling each generation.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟱𝟬 𝘀𝗲𝗿𝗶𝗲𝘀: Two variants on a common die: Ascend 950PR (prefill and recommendation) and Ascend 950DT (decode and training). Adds the low-precision formats FP8, MXFP8, MXFP4, and HiF8 (a Huawei format with FP16-like precision at FP8-like efficiency); vector advances via combined single instruction, multiple data (SIMD) and single instruction, multiple threads (SIMT); finer 128-byte memory access; and 2 TB/s interconnect.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟱𝟬𝗣𝗥 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: 2026-Q1 in card and SuperPoD server form factors; paired with HiBL 1.0 high-bandwidth memory (HBM) for cost-effective, compute-intensive prefill and recommendation.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟱𝟬𝗗𝗧 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: 2026-Q4; paired with HiZQ 2.0 HBM delivering 144 GB of capacity and 4 TB/s memory bandwidth; supports FP8, MXFP8, MXFP4, and HiF8; 2 TB/s interconnect.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟲𝟬 𝗮𝗻𝗱 𝟵𝟳𝟬: Ascend 960 (2027-Q4) doubles compute, memory bandwidth and capacity, and interconnect ports vs the 950, and adds HiF4 (a Huawei 4-bit format) for higher-precision FP4-class inference. Ascend 970 (2028-Q4) targets 2× the FP4/FP8 compute of the 960, ≥1.5× memory bandwidth, and 4 TB/s interconnect.
  • 𝗖𝘂𝗿𝗿𝗲𝗻𝘁 𝗦𝘂𝗽𝗲𝗿𝗣𝗼𝗗 𝗯𝗮𝘀𝗲𝗹𝗶𝗻𝗲: Atlas 900 A3 SuperPoD (up to 384 Ascend 910C chips) delivers up to 300 PFLOPS (petaflops) and has >300 deployments across >20 customers; it underpins the CloudMatrix384 cloud instance.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟱𝟬 𝗦𝘂𝗽𝗲𝗿𝗣𝗼𝗗: Up to 8,192 Ascend 950DT NPUs (neural processing units) in 160 cabinets over 1,000 m² with all-optical interconnect; 8 EFLOPS (exaflops) FP8 and 16 EFLOPS FP4 with 16 PB/s (petabytes per second) interconnect; training throughput of 4.91 million tokens/s and inference throughput of 19.6 million tokens/s; 2026-Q4 availability.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟲𝟬 𝗦𝘂𝗽𝗲𝗿𝗣𝗼𝗗: Up to 15,488 Ascend 960 chips in 220 cabinets over 2,200 m²; 30 EFLOPS FP8 and 60 EFLOPS FP4, 4,460 TB of memory, and 34 PB/s interconnect; training throughput of 15.9 million tokens/s and inference throughput of 80.5 million tokens/s; 2027-Q4 availability.
  • 𝗚𝗲𝗻𝗲𝗿𝗮𝗹-𝗽𝘂𝗿𝗽𝗼𝘀𝗲 𝗰𝗼𝗺𝗽𝘂𝘁𝗶𝗻𝗴: Kunpeng 950 processor (2026-Q1) in 96c/192t and 192c/384t variants with four-layer confidential computing; further Kunpeng models in 2028-Q1: a high-performance 96c/192t SKU (>50% per-core uplift) and a high-density ≥256c/512t SKU.
  • ๐—ง๐—ฎ๐—ถ๐—ฆ๐—ต๐—ฎ๐—ป ๐Ÿต๐Ÿฑ๐Ÿฌ ๐—ฆ๐˜‚๐—ฝ๐—ฒ๐—ฟ๐—ฃ๐—ผ๐——: Worldโ€™s first general-purpose computing SuperPoD (up to 16 nodes, 32 processors, 48 TB memory; memory/SSD/data processing unit (DPU) pooling); with GaussDB multi-write, delivers 2.9ร— performance without application changes; alternatives to mainframes, mid-range systems, and Oracle Exadata; boosts memory utilisation by 20% (virtualisation) and Spark real-time processing by 30%; 2026-Q1 availability.
  • ๐—›๐˜†๐—ฏ๐—ฟ๐—ถ๐—ฑ ๐—ฆ๐˜‚๐—ฝ๐—ฒ๐—ฟ๐—ฃ๐—ผ๐—— ๐—ณ๐—ผ๐—ฟ ๐—ฟ๐—ฒ๐—ฐ๐—ผ๐—บ๐—บ๐—ฒ๐—ป๐—ฑ๐—ฎ๐˜๐—ถ๐—ผ๐—ป: Combining TaiShan 950 and Atlas 950 SuperPoDs to form an ultra-large shared memory pool for petabyte-scale embeddings and ultra-low-latency inference for generative recommendation systems.
  • ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐—ฐ๐—ผ๐—ป๐—ป๐—ฒ๐—ฐ๐˜ ๐—ฐ๐—ต๐—ฎ๐—น๐—น๐—ฒ๐—ป๐—ด๐—ฒ๐˜€: Prior limits in long-range reliability and bandwidth/latency overcome via end-to-end reliability (100 ns fault detection/switchover), redesigned optics/modules/chips (100ร— reliability, >200 m reach), multi-port aggregation, high-density packaging, peer-to-peer architecture, unified protocolโ€”achieving terabytes-per-second links and 2.1 ฮผs latency.
  • ๐—จ๐—ป๐—ถ๐—ณ๐—ถ๐—ฒ๐—ฑ๐—•๐˜‚๐˜€ (๐—จ๐—•): New interconnect protocol enabling 10,000+ NPUs to operate as one machine with bus-grade interconnect, peer-to-peer coordination, all-resource pooling, a unified protocol, large-scale networking, and high availability; UnifiedBus 1.0 validated in Atlas 900 A3; UnifiedBus 2.0 specifications released openly to foster an ecosystem.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟱𝟬 𝗦𝘂𝗽𝗲𝗿𝗖𝗹𝘂𝘀𝘁𝗲𝗿: 64× Atlas 950 SuperPoDs with >520,000 NPUs, delivering 524 EFLOPS FP8 across >10,000 cabinets; supports UnifiedBus over Ethernet (UBoE) and Remote Direct Memory Access over Converged Ethernet (RoCE), with Huawei recommending UBoE for lower static latency, higher reliability, and reduced fabric cost; 2026-Q4 availability; claimed to surpass xAI's Colossus.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟲𝟬 𝗦𝘂𝗽𝗲𝗿𝗖𝗹𝘂𝘀𝘁𝗲𝗿: >1,000,000 NPUs delivering 2 ZFLOPS (zettaflops) FP8 and 4 ZFLOPS FP4; supports UBoE and RoCE with improved mean time between failures (MTBF); 2027-Q4 availability.
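The MXFP8 and MXFP4 formats in the Ascend 950 bullet follow the microscaling idea: a small block of values shares a single power-of-two scale, so most bits go to the elements themselves. The sketch below simulates block-scaled quantization in NumPy; the block size, element grid, and scale rule are illustrative assumptions, not Huawei's HiF8 or MXFP8 definitions.

```python
import numpy as np

def quantize_mx_block(x, block=32, e_bits=4, m_bits=3):
    """Simulate microscaling (MX-style) quantization: each block of
    `block` values shares one power-of-two scale, and elements are
    rounded to a low-precision float grid (here FP8-E4M3-like).
    Illustrative sketch only, not Huawei's HiF8/MXFP8 definition."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    # Approximate max representable value of the element grid.
    max_elem = 2.0 ** (2 ** (e_bits - 1) - 1) * (2.0 - 2.0 ** -m_bits)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared power-of-two scale so the block max fits the element range.
    scale = 2.0 ** np.floor(np.log2(np.maximum(amax, 1e-30) / max_elem))
    y = x / scale
    # Round each scaled value to m_bits of mantissa (crude float grid).
    exp = np.floor(np.log2(np.maximum(np.abs(y), 1e-30)))
    step = 2.0 ** (exp - m_bits)
    return (np.round(y / step) * step) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
xq = quantize_mx_block(x).ravel()
rel_err = np.linalg.norm(x - xq) / np.linalg.norm(x)  # a few percent
```

The shared scale is why block formats keep FP16-like usable range at FP8-like storage cost: only one extra byte of scale is amortised over the whole block.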
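The 950PR/950DT split mirrors how transformer serving stresses hardware: prefill pushes many prompt tokens through each weight read and is compute-bound, while decode generates one token at a time and is dominated by memory traffic. A back-of-the-envelope arithmetic-intensity sketch (dimensions are illustrative, not Ascend specifications):

```python
# Roofline-style sketch of why prefill and decode stress different
# resources. Dimensions are illustrative, not Ascend specifications.
def gemm_intensity(tokens, d_in, d_out, bytes_per_weight=1):
    """FLOPs per byte of weight traffic for y = x @ W with `tokens`
    rows: 2*tokens*d_in*d_out FLOPs over d_in*d_out weight bytes."""
    flops = 2 * tokens * d_in * d_out
    weight_bytes = d_in * d_out * bytes_per_weight
    return flops / weight_bytes

prefill_ai = gemm_intensity(tokens=4096, d_in=8192, d_out=8192)  # long prompt
decode_ai = gemm_intensity(tokens=1, d_in=8192, d_out=8192)      # one new token
# Prefill performs thousands of FLOPs per weight byte (compute-bound);
# decode performs ~2 (memory-bandwidth-bound), hence distinct PR/DT variants.
```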
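As a quick sanity check on the Atlas 950 SuperPoD bullet, dividing the pod-level totals by the NPU count recovers roughly 1 PFLOPS FP8 and about 2 TB/s per chip, consistent with the interconnect figure given for the Ascend 950 series:

```python
# Cross-checking the Atlas 950 SuperPoD figures quoted above:
# pod-level totals divided by NPU count should recover per-chip specs.
npus = 8192                 # Ascend 950DT NPUs per pod
pod_fp8_eflops = 8          # 8 EFLOPS FP8 aggregate
pod_interconnect_pbs = 16   # 16 PB/s aggregate interconnect

per_chip_pflops = pod_fp8_eflops * 1000 / npus      # ~0.98 PFLOPS FP8
per_chip_tbs = pod_interconnect_pbs * 1000 / npus   # ~1.95 TB/s
# ~2 TB/s per chip matches the Ascend 950 series' stated interconnect.
```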
