Huawei is advancing work on a new AI processor and a large-scale cloud system, aiming to challenge Nvidia’s dominance in AI infrastructure. The company has achieved a technical milestone despite significant restrictions imposed by US sanctions.
Huawei Technologies is currently testing a new AI processor, the Ascend 910D, which is intended to eventually replace some of Nvidia’s high-end products. According to the Wall Street Journal, Huawei expects to receive the first samples of the 910D in May 2025. The chip is designed to outperform Nvidia’s H100, which has been the industry standard for AI training since 2022 and is now being replaced in Western markets by successors from the Blackwell generation. The most powerful Nvidia chips are no longer permitted for sale in China due to export restrictions.
However, compared with Nvidia’s H100, the Ascend 910D is less energy efficient and has significantly higher power consumption. Huawei is employing new packaging technologies to connect multiple silicon dies and increase performance, the Wall Street Journal reports. Development remains at an early stage, and comprehensive testing will determine when the chip is ready for the market.
CloudMatrix 384: Huawei’s response to Nvidia’s rack-scale systems
In parallel, Huawei has introduced a new rack-scale system called CloudMatrix 384, which is still based on the earlier Ascend 910C chip. The system interconnects 384 of these chips. According to SemiAnalysis, CloudMatrix 384 achieves approximately 300 PFLOPs of BF16 compute performance—nearly double that of Nvidia’s GB200 NVL72 system. Nvidia recently introduced its successor, the GB300 NVL72.
Huawei’s system also leads in memory, with 3.6 times the aggregate capacity and 2.1 times the memory bandwidth of Nvidia’s offering. However, CloudMatrix 384 draws about 4.1 times as much power as Nvidia’s comparable system. SemiAnalysis notes that its power consumption per FLOP is about 2.5 times higher.
A central feature of the CloudMatrix 384 is its fully optical interconnect: Huawei has eliminated copper cables entirely, instead using 6,912 400G transceivers, according to SemiAnalysis. Each of the 384 GPUs uses seven transceivers for internal scaling. SemiAnalysis notes that the architecture resembles concepts Nvidia previously abandoned due to cost. Analysts consider Huawei to be “one generation” ahead of Nvidia and AMD in this respect.
Ongoing reliance on foreign suppliers despite sanctions
Despite sweeping US sanctions, Huawei remains dependent on foreign suppliers for chip manufacturing. SemiAnalysis reports that the previously shipped Ascend 910B and 910C chips were produced by TSMC in Taiwan. Huawei is said to have arranged for these chips through intermediary firms such as Sophgo. TSMC could face a penalty of up to $1 billion as a result.
Huawei also relied on foreign sources for access to high bandwidth memory (HBM). According to SemiAnalysis, the company acquired large volumes of HBM stacks from Samsung, accumulating around 13 million units in storage. Despite export controls, HBM reached China via intermediaries such as Faraday and CoAsia.
China’s largest chip manufacturer, SMIC, has expanded its 7-nanometer production capacity, according to SemiAnalysis, but continues to lag behind leading manufacturers in both yield and technology. That capacity could grow further in the medium term, provided export controls do not become stricter.
Prioritizing system-level optimization over single-chip performance
Given these structural limitations, Huawei is focusing on system-level optimization rather than maximizing individual chip performance. The company is building large, interconnected systems to achieve scale. SemiAnalysis observes that CloudMatrix 384 leverages China’s nearly unlimited power supply to provide competitive AI infrastructure, despite high energy consumption.
According to SemiAnalysis, the system delivers 70 percent more FLOPS than Nvidia’s current rack, even though its energy efficiency is substantially lower. In China, the additional energy demand is considered acceptable in light of the political priority attached to technological independence.
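The relationship between these figures can be checked with simple arithmetic. The sketch below is only an illustration: it takes the article's rounded ratios (about 70 percent more FLOPS, about 4.1 times the power draw) as inputs, and the function name is a hypothetical helper, not something from SemiAnalysis.

```python
def power_per_flop_ratio(flops_ratio: float, power_ratio: float) -> float:
    """Return how much more power per FLOP a system needs relative to a baseline.

    Both arguments are ratios against the same baseline system
    (here assumed to be Nvidia's GB200 NVL72).
    """
    if flops_ratio <= 0:
        raise ValueError("flops_ratio must be positive")
    return power_ratio / flops_ratio


# Rounded figures as reported in the article (attributed to SemiAnalysis).
FLOPS_RATIO = 1.7   # "70 percent more FLOPS"
POWER_RATIO = 4.1   # "about 4.1 times" the power draw

ratio = power_per_flop_ratio(FLOPS_RATIO, POWER_RATIO)
print(f"Power per FLOP relative to baseline: {ratio:.2f}x")
```

With these rounded inputs the result lands near 2.4, which is roughly in line with the 2.5-fold figure cited above. The small gap is what one would expect from rounding in the reported ratios.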
Huawei already demonstrated its continued ability to deploy high-performance chips despite US sanctions in 2023 with the Mate 60 smartphone, which used a processor manufactured in China.