NVIDIA Ethernet Networking Accelerates World’s Largest AI Supercomputer, Built by xAI

  • Last updated October 29, 2024

NVIDIA Spectrum-X Makes Colossal NVIDIA Hopper 100,000-GPU System Possible

NVIDIA today announced that xAI’s Colossus supercomputer cluster, comprising 100,000 NVIDIA Hopper GPUs, achieved this massive scale by using the NVIDIA Spectrum-X Ethernet networking platform for its Remote Direct Memory Access (RDMA) network. Spectrum-X is designed to deliver superior performance to multi-tenant, hyperscale AI factories using standards-based Ethernet.

Colossus, the world’s largest AI supercomputer, is being used to train xAI’s Grok family of large language models, with chatbots offered as a feature for X Premium subscribers. xAI is in the process of doubling the size of Colossus to a combined total of 200,000 NVIDIA Hopper GPUs.

“AI is becoming mission-critical and requires increased performance, security, scalability and cost-efficiency,” said Gilad Shainer, senior vice president of networking at NVIDIA. 

The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads, which in turn accelerates the development, deployment and time to market of AI solutions.

“Colossus is the most powerful training system in the world,” said Elon Musk. “Nice work by the xAI team, NVIDIA and our many partners/suppliers.”

This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days.

Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months.

Excellent…

— Elon Musk (@elonmusk) September 2, 2024

Some of the key highlights from the announcement include:

  • Spectrum-X delivers up to 400 Gbps of bandwidth, significantly improving data transfer rates and reducing latency. This is crucial for organizations that rely on rapid data processing and real-time analytics.
  • The platform is optimized for AI applications, leveraging NVIDIA’s expertise in artificial intelligence to provide intelligent networking capabilities. This integration allows for smarter data routing and management, enhancing overall system performance.
  • It pairs NVIDIA Spectrum-4 Ethernet switches with NVIDIA BlueField-3 SuperNICs and supports a wide range of applications, from cloud computing to enterprise networking. The platform is designed to scale efficiently, accommodating the increasing volume of data generated by modern applications.
  • With a focus on sustainability, Spectrum-X aims to reduce energy consumption in data centers while maintaining high performance. This is increasingly important as organizations seek to minimize their carbon footprint.

The supporting facility and state-of-the-art supercomputer were built by xAI and NVIDIA in just 122 days, whereas systems of this size typically take many months to years. It took 19 days from the time the first rack rolled onto the floor until training began.

Tarunya S

As a passionate enthusiast of caffeine and journalism, I transform tech into words. I enjoy mountain hikes as much as binge-watching new Netflix series.

© Analytics India Magazine Pvt Ltd & AIM Media House LLC 2024