New paradigms for AI infrastructure

I am proud to share that at Huawei Connect 2025 in Shanghai, we launched a new SuperPoD generation, raising the bar for AI infrastructure. Recognising the changes catalysed by DeepSeek-R1, we enhanced our AI accelerators for higher efficiency and built upon extensive customer feedback to meet real-world needs.

Please take a look at the details below.

  • ๐—–๐—ต๐—ถ๐—ฝ ๐—ฟ๐—ผ๐—ฎ๐—ฑ๐—บ๐—ฎ๐—ฝ: Three new Ascend series over the next three yearsโ€”Ascend 950, 960, 970โ€”evolving usability, data formats, interconnect, and bandwidth on a near annual cadence, targeting compute doubling each generation.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟱𝟬 𝘀𝗲𝗿𝗶𝗲𝘀: Two variants on a common die: Ascend 950PR (prefill and recommendation) and Ascend 950DT (decode and training). Adds the low-precision formats FP8, MXFP8, MXFP4, and HiF8 (a Huawei format with FP16-like precision at FP8-like efficiency); vector advances via combined single instruction, multiple data (SIMD) and single instruction, multiple threads (SIMT); finer 128-byte memory access; and 2 TB/s interconnect.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟱𝟬𝗣𝗥 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: 2026-Q1 in card and SuperPoD server form factors; paired with HiBL 1.0 high-bandwidth memory (HBM) for cost-effective, compute-intensive prefill and recommendation.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟱𝟬𝗗𝗧 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: 2026-Q4; paired with HiZQ 2.0 HBM delivering 144 GB of capacity and 4 TB/s memory bandwidth; supports FP8, MXFP8, MXFP4, and HiF8; 2 TB/s interconnect.
  • 𝗔𝘀𝗰𝗲𝗻𝗱 𝟵𝟲𝟬 𝗮𝗻𝗱 𝟵𝟳𝟬: Ascend 960 (2027-Q4) doubles compute, memory bandwidth and capacity, and interconnect ports vs the 950, and adds HiF4 (a Huawei 4-bit format) for higher-precision FP4-class inference. Ascend 970 (2028-Q4) targets 2× the FP4/FP8 compute of the 960, ≥1.5× memory bandwidth, and 4 TB/s interconnect.
  • 𝗖𝘂𝗿𝗿𝗲𝗻𝘁 𝗦𝘂𝗽𝗲𝗿𝗣𝗼𝗗 𝗯𝗮𝘀𝗲𝗹𝗶𝗻𝗲: Atlas 900 A3 SuperPoD (up to 384 Ascend 910C chips) delivers up to 300 PFLOPS (petaflops) and has >300 deployments across >20 customers; it underpins the CloudMatrix384 cloud instance.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟱𝟬 𝗦𝘂𝗽𝗲𝗿𝗣𝗼𝗗: Up to 8,192 Ascend 950DT NPUs (neural processing units) in 160 cabinets over 1,000 m² with all-optical interconnect; 8 EFLOPS (exaflops) FP8 and 16 EFLOPS FP4 with 16 PB/s (petabytes per second) interconnect; training throughput of 4.91 million tokens/s and inference throughput of 19.6 million tokens/s; 2026-Q4 availability.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟲𝟬 𝗦𝘂𝗽𝗲𝗿𝗣𝗼𝗗: Up to 15,488 Ascend 960 chips in 220 cabinets over 2,200 m²; 30 EFLOPS FP8 and 60 EFLOPS FP4, 4,460 TB of memory, and 34 PB/s interconnect; training throughput of 15.9 million tokens/s and inference throughput of 80.5 million tokens/s; 2027-Q4 availability.
  • 𝗚𝗲𝗻𝗲𝗿𝗮𝗹-𝗽𝘂𝗿𝗽𝗼𝘀𝗲 𝗰𝗼𝗺𝗽𝘂𝘁𝗶𝗻𝗴: Kunpeng 950 processor (2026-Q1) in 96c/192t and 192c/384t variants with four-layer confidential computing; further Kunpeng models in 2028-Q1: a high-performance 96c/192t SKU (>50% per-core uplift) and a high-density ≥256c/512t SKU.
  • ๐—ง๐—ฎ๐—ถ๐—ฆ๐—ต๐—ฎ๐—ป ๐Ÿต๐Ÿฑ๐Ÿฌ ๐—ฆ๐˜‚๐—ฝ๐—ฒ๐—ฟ๐—ฃ๐—ผ๐——: Worldโ€™s first general-purpose computing SuperPoD (up to 16 nodes, 32 processors, 48 TB memory; memory/SSD/data processing unit (DPU) pooling); with GaussDB multi-write, delivers 2.9ร— performance without application changes; alternatives to mainframes, mid-range systems, and Oracle Exadata; boosts memory utilisation by 20% (virtualisation) and Spark real-time processing by 30%; 2026-Q1 availability.
  • ๐—›๐˜†๐—ฏ๐—ฟ๐—ถ๐—ฑ ๐—ฆ๐˜‚๐—ฝ๐—ฒ๐—ฟ๐—ฃ๐—ผ๐—— ๐—ณ๐—ผ๐—ฟ ๐—ฟ๐—ฒ๐—ฐ๐—ผ๐—บ๐—บ๐—ฒ๐—ป๐—ฑ๐—ฎ๐˜๐—ถ๐—ผ๐—ป: Combining TaiShan 950 and Atlas 950 SuperPoDs to form an ultra-large shared memory pool for petabyte-scale embeddings and ultra-low-latency inference for generative recommendation systems.
  • ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐—ฐ๐—ผ๐—ป๐—ป๐—ฒ๐—ฐ๐˜ ๐—ฐ๐—ต๐—ฎ๐—น๐—น๐—ฒ๐—ป๐—ด๐—ฒ๐˜€: Prior limits in long-range reliability and bandwidth/latency overcome via end-to-end reliability (100 ns fault detection/switchover), redesigned optics/modules/chips (100ร— reliability, >200 m reach), multi-port aggregation, high-density packaging, peer-to-peer architecture, unified protocolโ€”achieving terabytes-per-second links and 2.1 ฮผs latency.
  • ๐—จ๐—ป๐—ถ๐—ณ๐—ถ๐—ฒ๐—ฑ๐—•๐˜‚๐˜€ (๐—จ๐—•): New interconnect protocol enabling 10,000+ NPUs to operate as one machine with bus-grade interconnect, peer-to-peer coordination, all-resource pooling, a unified protocol, large-scale networking, and high availability; UnifiedBus 1.0 validated in Atlas 900 A3; UnifiedBus 2.0 specifications released openly to foster an ecosystem.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟱𝟬 𝗦𝘂𝗽𝗲𝗿𝗖𝗹𝘂𝘀𝘁𝗲𝗿: 64× Atlas 950 SuperPoDs with >520,000 NPUs, delivering 524 EFLOPS FP8 across >10,000 cabinets; supports UnifiedBus over Ethernet (UBoE) and Remote Direct Memory Access over Converged Ethernet (RoCE), with Huawei recommending UBoE for lower static latency, higher reliability, and reduced fabric cost; 2026-Q4 availability; claimed to surpass xAI's Colossus.
  • 𝗔𝘁𝗹𝗮𝘀 𝟵𝟲𝟬 𝗦𝘂𝗽𝗲𝗿𝗖𝗹𝘂𝘀𝘁𝗲𝗿: >1,000,000 NPUs delivering 2 ZFLOPS (zettaflops) FP8 and 4 ZFLOPS FP4; supports UBoE and RoCE with improved mean time between failures (MTBF); 2027-Q4 availability.
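The MXFP8 and MXFP4 formats in the Ascend 950 bullet follow the microscaling idea: a small block of values shares a single power-of-two scale, so most bits go to the elements themselves. The sketch below simulates block-scaled quantization in NumPy; the block size, element grid, and scale rule are illustrative assumptions, not Huawei's HiF8 or MXFP8 definitions.

```python
import numpy as np

def quantize_mx_block(x, block=32, e_bits=4, m_bits=3):
    """Simulate microscaling (MX-style) quantization: each block of
    `block` values shares one power-of-two scale, and elements are
    rounded to a low-precision float grid (here FP8-E4M3-like).
    Illustrative sketch only, not Huawei's HiF8/MXFP8 definition."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    # Approximate max representable value of the element grid.
    max_elem = 2.0 ** (2 ** (e_bits - 1) - 1) * (2.0 - 2.0 ** -m_bits)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared power-of-two scale so the block max fits the element range.
    scale = 2.0 ** np.floor(np.log2(np.maximum(amax, 1e-30) / max_elem))
    y = x / scale
    # Round each scaled value to m_bits of mantissa (crude float grid).
    exp = np.floor(np.log2(np.maximum(np.abs(y), 1e-30)))
    step = 2.0 ** (exp - m_bits)
    return (np.round(y / step) * step) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
xq = quantize_mx_block(x).ravel()
rel_err = np.linalg.norm(x - xq) / np.linalg.norm(x)  # a few percent
```

The shared scale is why block formats keep FP16-like usable range at FP8-like storage cost: only one extra byte of scale is amortised over the whole block.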
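The 950PR/950DT split mirrors how transformer serving stresses hardware: prefill pushes many prompt tokens through each weight read and is compute-bound, while decode generates one token at a time and is dominated by memory traffic. A back-of-the-envelope arithmetic-intensity sketch (dimensions are illustrative, not Ascend specifications):

```python
# Roofline-style sketch of why prefill and decode stress different
# resources. Dimensions are illustrative, not Ascend specifications.
def gemm_intensity(tokens, d_in, d_out, bytes_per_weight=1):
    """FLOPs per byte of weight traffic for y = x @ W with `tokens`
    rows: 2*tokens*d_in*d_out FLOPs over d_in*d_out weight bytes."""
    flops = 2 * tokens * d_in * d_out
    weight_bytes = d_in * d_out * bytes_per_weight
    return flops / weight_bytes

prefill_ai = gemm_intensity(tokens=4096, d_in=8192, d_out=8192)  # long prompt
decode_ai = gemm_intensity(tokens=1, d_in=8192, d_out=8192)      # one new token
# Prefill performs thousands of FLOPs per weight byte (compute-bound);
# decode performs ~2 (memory-bandwidth-bound), hence distinct PR/DT variants.
```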
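As a quick sanity check on the Atlas 950 SuperPoD bullet, dividing the pod-level totals by the NPU count recovers roughly 1 PFLOPS FP8 and about 2 TB/s per chip, consistent with the interconnect figure given for the Ascend 950 series:

```python
# Cross-checking the Atlas 950 SuperPoD figures quoted above:
# pod-level totals divided by NPU count should recover per-chip specs.
npus = 8192                 # Ascend 950DT NPUs per pod
pod_fp8_eflops = 8          # 8 EFLOPS FP8 aggregate
pod_interconnect_pbs = 16   # 16 PB/s aggregate interconnect

per_chip_pflops = pod_fp8_eflops * 1000 / npus      # ~0.98 PFLOPS FP8
per_chip_tbs = pod_interconnect_pbs * 1000 / npus   # ~1.95 TB/s
# ~2 TB/s per chip matches the Ascend 950 series' stated interconnect.
```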
