New paradigms for AI infrastructure

I am proud to share that at Huawei Connect 2025 in Shanghai, we launched a new SuperPoD generation, raising the bar for AI infrastructure. Recognising the changes catalysed by DeepSeek-R1, we enhanced our AI accelerators for higher efficiency and built upon extensive customer feedback to meet real-world needs.
Please take a look at the details below.
- Chip roadmap: Three new Ascend series over the next three years (Ascend 950, 960, and 970), evolving usability, data formats, interconnect, and bandwidth on a near-annual cadence, targeting compute doubling each generation.
- Ascend 950 series: Two variants on a common die, Ascend 950PR (prefill and recommendation) and Ascend 950DT (decode and training); added low-precision formats FP8, MXFP8, MXFP4, and HiF8 (Huawei format with FP16-like precision at FP8-like efficiency); vector advances via combined single instruction multiple data (SIMD) and single instruction multiple threads (SIMT), finer 128-byte memory access, and 2 TB/s interconnect.
- Ascend 950PR availability: 2026-Q1 in card and SuperPoD server form factors; paired with HiBL 1.0 high bandwidth memory (HBM) for cost-effective, compute-intensive prefill and recommendation.
- Ascend 950DT availability: 2026-Q4; paired with HiZQ 2.0 HBM delivering 144 GB and 4 TB/s memory bandwidth; supports FP8, MXFP8, MXFP4, and HiF8; 2 TB/s interconnect.
- Ascend 960 and 970: Ascend 960 (2027-Q4) doubles compute, memory bandwidth/capacity, and interconnect ports vs the 950; adds HiF4 (Huawei 4-bit) for higher-precision FP4-class inference. Ascend 970 (2028-Q4) targets 2× FP4/FP8 compute vs the 960, ≥1.5× memory bandwidth, and 4 TB/s interconnect.
- Current SuperPoD baseline: Atlas 900 A3 SuperPoD (up to 384 Ascend 910C) delivers up to 300 PFLOPS (petaflops) and has >300 deployments across >20 customers; it underpins the CloudMatrix384 cloud instance.
- Atlas 950 SuperPoD: Up to 8,192 Ascend 950DT NPUs (neural processing units), 160 cabinets over 1,000 m², all-optical interconnect; 8 EFLOPS (exaflops) FP8 and 16 EFLOPS FP4 with 16 PB/s (petabytes per second) interconnect; 2026-Q4 availability; training throughput of 4.91 million tokens/s and inference of 19.6 million tokens/s.
- Atlas 960 SuperPoD: Up to 15,488 Ascend 960 chips, 220 cabinets over 2,200 m²; 30 EFLOPS FP8 and 60 EFLOPS FP4, 4,460 TB memory, and 34 PB/s interconnect; training at 15.9 million tokens/s and inference at 80.5 million tokens/s; 2027-Q4 availability.
- General-purpose computing: Kunpeng 950 processor (2026-Q1) in 96c/192t and 192c/384t variants with four-layer confidential computing; further Kunpeng models in 2028-Q1: a high-performance 96c/192t SKU (>50% per-core uplift) and a high-density ≥256c/512t SKU.
- TaiShan 950 SuperPoD: World's first general-purpose computing SuperPoD (up to 16 nodes, 32 processors, 48 TB memory; memory/SSD/data processing unit (DPU) pooling); with GaussDB multi-write, delivers 2.9× performance without application changes; positioned as an alternative to mainframes, mid-range systems, and Oracle Exadata; boosts memory utilisation by 20% (virtualisation) and Spark real-time processing by 30%; 2026-Q1 availability.
- Hybrid SuperPoD for recommendation: Combines TaiShan 950 and Atlas 950 SuperPoDs to form an ultra-large shared memory pool for petabyte-scale embeddings and ultra-low-latency inference for generative recommendation systems.
- Interconnect challenges: Prior limits in long-range reliability and bandwidth/latency are overcome via end-to-end reliability (100 ns fault detection/switchover), redesigned optics/modules/chips (100× reliability, >200 m reach), multi-port aggregation, high-density packaging, a peer-to-peer architecture, and a unified protocol, achieving terabytes-per-second links and 2.1 μs latency.
- UnifiedBus (UB): New interconnect protocol enabling 10,000+ NPUs to operate as one machine with bus-grade interconnect, peer-to-peer coordination, all-resource pooling, a unified protocol, large-scale networking, and high availability; UnifiedBus 1.0 validated in Atlas 900 A3; UnifiedBus 2.0 specifications released openly to foster an ecosystem.
- Atlas 950 SuperCluster: 64× Atlas 950 SuperPoDs with >520,000 NPUs, delivering 524 EFLOPS FP8 across >10,000 cabinets; supports UnifiedBus over Ethernet (UBoE) and Remote Direct Memory Access over Converged Ethernet (RoCE); Huawei recommends UBoE for lower static latency, higher reliability, and reduced fabric cost; 2026-Q4 availability; claimed to surpass xAI's Colossus.
- Atlas 960 SuperCluster: >1,000,000 NPUs, 2 ZFLOPS (zettaflops) FP8 and 4 ZFLOPS FP4; supports UBoE and RoCE with improved mean time between failures (MTBF); 2027-Q4 availability.
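To give a feel for the MXFP8/MXFP4 formats mentioned above: microscaling (MX) formats attach one shared power-of-two scale to each small block of low-precision elements, recovering dynamic range that plain FP8/FP4 lacks. The sketch below follows the public OCP MX convention (32-element blocks, FP8 E4M3 range of ±448); it is an illustration of the general technique, not of Huawei's HiF8/HiF4, whose encodings are not public.

```python
import numpy as np

BLOCK = 32        # elements sharing one scale (per the OCP MX convention)
FP8_MAX = 448.0   # largest finite FP8 E4M3 value

def mx_block_scales(x: np.ndarray):
    """Compute one power-of-two scale per 32-element block so the block's
    maximum magnitude fits in FP8 range. Assumes len(x) is a multiple of 32.
    Real hardware would additionally round each element to E4M3; here we
    only clip, to show the dynamic-range bookkeeping."""
    xb = x.reshape(-1, BLOCK)
    amax = np.abs(xb).max(axis=1, keepdims=True)
    # ceil ensures amax / scale never exceeds FP8_MAX
    scale = 2.0 ** np.ceil(np.log2(np.maximum(amax, 1e-38) / FP8_MAX))
    q = np.clip(xb / scale, -FP8_MAX, FP8_MAX)
    return q, scale

def mx_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover the (range-preserved) values from blocks and shared scales."""
    return (q * scale).reshape(-1)
```

The shared scale is why MX formats can cover tensors whose values span many orders of magnitude while each element stays in 8 (or 4) bits.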
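As a back-of-envelope check, the Atlas 950 SuperCluster figures follow from the per-pod numbers: 64 pods of up to 8,192 NPUs gives 524,288 NPUs, matching the ">520,000" claim, and the headline 524 EFLOPS FP8 is consistent with roughly 1 PFLOPS FP8 per NPU (the 8 EFLOPS/pod figure appears to be rounded down). This is arithmetic on the published numbers, not a spec.

```python
# Back-of-envelope arithmetic on the published Atlas 950 SuperCluster figures.
npus_per_pod = 8192   # up to 8,192 Ascend 950DT NPUs per Atlas 950 SuperPoD
pods = 64             # 64 SuperPoDs per SuperCluster
pod_fp8_eflops = 8    # quoted per-pod FP8 compute (rounded)

total_npus = npus_per_pod * pods          # 524,288, i.e. ">520,000 NPUs"
from_pods = pod_fp8_eflops * pods         # 512 EFLOPS from rounded pod figure
per_npu_pflops = 524 * 1000 / total_npus  # ~1 PFLOPS FP8 per NPU

print(total_npus, from_pods, round(per_npu_pflops, 2))
```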
Sources:
- Huawei. (2025, September 18). Groundbreaking SuperPoD interconnect: Leading a new paradigm for AI infrastructure. Huawei News. https://www.huawei.com/en/news/2025/9/hc-xu-keynote-speech
- Huawei. (2025, September 18). Huawei unveils world's most powerful SuperPoDs and SuperClusters. Huawei News. https://www.huawei.com/en/news/2025/9/hc-lingqu-ai-superpod
- UnifiedBus Consortium. (2025). UnifiedBus official website. https://www.unifiedbus.com/en