AMD is gradually finding its footing in the AI data centre market, emerging as a strong competitor to NVIDIA with its latest AI accelerators and expanding partnerships. At its Advancing AI 2024 event, the company introduced the new MI325X accelerators for training and inferencing LLMs.
The AMD Instinct MI325X accelerators offer leading memory capacity and bandwidth: 256GB of HBM3E at 6.0TB/s, which is 1.8x the capacity and 1.3x the bandwidth of the NVIDIA H200. They also deliver 1.3x higher peak theoretical FP16 and FP8 compute performance than the H200.
When asked how AMD compares itself to NVIDIA, Andrew Dieckmann, CVP and GM of data centre GPU at AMD, told AIM that the company benchmarks against NVIDIA’s highest-performing solution.
“We are trying to take very representative benchmarks that are realistic. I can tell you that in our customer engagements, especially regarding inference workloads, we have yet to find a single workload that we cannot outperform NVIDIA on,” he said. Dieckmann acknowledged that AMD doesn’t win on every workload by default. “However, if we optimise for a specific solution, we can beat them.”
The company also previewed the upcoming AMD Instinct MI350 series accelerators, scheduled for release in 2025, which promise a 35x improvement in inference performance over the current generation and will feature up to 288GB of HBM3E memory. The MI400 is planned to follow in 2026.
Interestingly, on the same day, NVIDIA made headlines by delivering its much-anticipated Blackwell GPUs to OpenAI and Microsoft.
Microsoft announced that Azure is the first cloud platform to run NVIDIA’s Blackwell system with GB200-powered AI servers. “Our long-standing partnership with NVIDIA and deep innovation continues to lead the industry, powering the most sophisticated AI workloads,” said Microsoft CEO Satya Nadella.
According to a recent report, NVIDIA’s Blackwell GPUs are sold out for the next 12 months, echoing the supply crunch Hopper GPUs faced several quarters ago.
Consequently, NVIDIA is anticipated to gain market share next year. In the latest quarter, AMD reported revenue of $5.8 billion, while NVIDIA continues to dominate the AI chip market with an impressive quarterly revenue of $30 billion.
AMD Fills the Gap
With NVIDIA’s GPUs sold out for the next year, AMD has an ideal opportunity to meet the demand from customers seeking access to compute resources for training and running LLMs.
AMD CEO Lisa Su expects the total addressable market (TAM) for data centre AI accelerators to grow by more than 60% annually, reaching $500 billion by 2028. “For AMD, this represents a significant growth opportunity,” she said.
According to her, AMD GPUs are well-suited for running open-source models like Meta’s Llama 3.1 and Stable Diffusion, where they outperform NVIDIA’s H200.
“When you look at that across some of the key models, we’re delivering 20 to 40% better inference performance and latency on models like Llama and Mixtral,” said Su in her keynote address.
“The MI325 platform delivers up to 40% more inference performance than the H200 on Llama 3.1. Many customers are also focused on training, and we’ve made significant progress in optimising our software stack for training,” she added.
Moreover, to challenge NVIDIA’s CUDA, AMD launched ROCm 6.2, which introduces support for key AI features such as the FP8 datatype, Flash Attention 3, and kernel fusion. These updates enable ROCm 6.2 to deliver up to a 2.4x performance boost in inference and a 1.8x improvement in training across a range of LLMs compared to ROCm 6.0.
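Those features surface to developers through standard framework APIs rather than AMD-specific ones. Below is a minimal, illustrative sketch, assuming a ROCm (or CUDA) build of PyTorch 2.1 or later; the tensor shapes are arbitrary, and the cast-for-storage, upcast-for-compute FP8 pattern stands in for ROCm’s actual fused kernels:

```python
import torch
import torch.nn.functional as F

# ROCm builds of PyTorch expose AMD GPUs through the familiar torch.cuda
# namespace, so the same script runs on an MI-series or an NVIDIA card.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# FP8 (e4m3) storage: keep weights at 8 bits, upcast for the matmul.
w = torch.randn(4096, 4096, device=device, dtype=dtype)
w_fp8 = w.to(torch.float8_e4m3fn)        # half the footprint of FP16
x = torch.randn(8, 4096, device=device, dtype=dtype)
y = x @ w_fp8.to(dtype)                  # upcast before compute

# Fused attention: scaled_dot_product_attention dispatches to a
# Flash-Attention-style kernel when the backend provides one.
q = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
k, v = torch.randn_like(q), torch.randn_like(q)
out = F.scaled_dot_product_attention(q, k, v)
print(y.shape, out.shape)
```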
“ROCm is a complete set of libraries, runtimes, compilers, and tools needed to develop and deploy AI workloads. We designed ROCm to be modular and open-source, allowing for rapid contributions from AI communities,” said Vamsi Boppana, SVP of AI at AMD, adding that it is also designed to connect easily with ecosystem components and frameworks like PyTorch and model hubs like Hugging Face.
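That PyTorch and Hugging Face integration means standard model-loading code runs unchanged on AMD hardware. A minimal sketch, assuming a ROCm build of PyTorch and the transformers library; the model ID is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# On a ROCm build of PyTorch, "cuda" maps to the AMD GPU; no AMD-specific
# code path is needed. The model ID below is illustrative.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("The MI325X is designed for", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```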
He explained that AMD has expanded support for newer frameworks like JAX and implemented powerful new features, algorithms, and optimisations to deliver the best performance for generative AI workloads.
AMD also supports various open-source frameworks, including vLLM, Triton, SGLang, and ONNX Runtime. Boppana revealed that over 1 million Hugging Face models run on AMD today.
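vLLM, for instance, ships ROCm support alongside CUDA, so the same serving code covers both vendors. A minimal offline-inference sketch; the model ID is again illustrative:

```python
from vllm import LLM, SamplingParams

# vLLM picks up the ROCm or CUDA backend at install time; the Python API
# is identical on both.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Why does inference benefit from large HBM capacity?"], params)
for request_output in outputs:
    print(request_output.outputs[0].text)
```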
The company recently acquired the European private AI lab Silo AI. “We recently completed the acquisition of Silo AI, which adds a world-class team with tremendous experience training and optimising LLMs and also delivering customer-specific AI solutions,” said Su.
At the event, AMD showcased testimonials for ROCm by inviting startup leaders, including Amit Jain, the CEO of Luma AI; Ashish Vaswani, the CEO of Essential AI; Dani Yogatama, the CEO of Reka AI; and Dmytro Dzhulgakov, the CTO of Fireworks AI.
Luma AI recently launched a video generation model called Dream Machine. “The models we’re training are very challenging and don’t resemble LLMs at all. However, we’ve been impressed with how quickly we were able to get the model running on ROCm and MI300X GPUs. It took us just a few days to establish the end-to-end pipeline, which is quite fantastic,” said Jain.
More Customers
AMD is partnering with customers including Meta, Microsoft, xAI, Oracle, and Cohere.
Su highlighted Oracle as a key customer for AMD’s latest GPUs. “They’ve integrated AMD across their entire infrastructure, using our CPUs, GPUs, and DPUs,” she said. Oracle SVP Karan Batta joined Su on stage to discuss how Oracle’s customers are utilising AMD’s hardware tech stack.
“Our largest cloud-native customer is Uber. They use Oracle Cloud Infrastructure (OCI) Compute E5 instances with 4th generation AMD EPYC processors to achieve significant performance efficiency. Almost all of their trip-serving infrastructure now runs on AMD within OCI compute,” said Batta.
“We also have Red Bull Powertrains developing the next generation of F1 engines for upcoming seasons. Additionally, our database franchise is now powered by AMD CPUs. Customers like PayPal and Banco do Brasil are using Exadata powered by AMD to enhance their database portfolios,” he added.
Alongside Oracle, Databricks is another major customer of AMD. “The large memory capacity and incredible compute capabilities of MI300X have been key to achieving over a 50% increase in performance on some of our critical workloads,” said Naveen Rao, VP of generative AI at Databricks, adding that this includes models like Llama and other proprietary models.
Microsoft, the first cloud provider to receive NVIDIA Blackwell GPUs, is also partnering with AMD to obtain the new MI325 accelerators. “We’re very excited to see how the teams are coming together. OpenAI, Microsoft, and AMD are all working to accelerate the benefits so that this technology can diffuse even faster. We look forward to the roadmap for the MI350 and the next generation after that,” said Nadella.