
Companies are torn over whether to stick with Kubernetes or ditch it. Some have moved away from it entirely, while others, after trying and testing monolithic architectures, have since moved their workloads back. Both approaches have their pain points, and no single perfect solution exists.
Last year, ride-hailing logistics firm Uber decided to upgrade its machine learning platform and shift its ML workloads to Kubernetes. And, in typical Uber fashion, it didn’t just migrate but also built some of its own tools along the way to make everything run smoothly.
In a recent blog post, the Uber tech team explained this transition and the motivation behind it. ML pipelines deal with huge volumes of data, especially during model training. Training runs are batch processing jobs that get broken down into large distributed tasks, all chained together in a workflow.
Until mid-2023, Uber ran its Spark- and Ray-based jobs through an internal job gateway called MADLJ. While this setup did the job, it came with a bunch of headaches. ML engineers had to micromanage job placement: pick clusters, regions, and exact GPU SKUs. One wrong choice meant long queues, idle GPUs or, worse, stalled experiments.
Part of the issue was MADLJ’s dependency on Peloton, which ran on Apache Mesos. Mesos has fallen out of favour, so Uber decided it was time to switch to Kubernetes, which is now the industry standard.
Tools like Spark and Ray already support Kubernetes, making the decision pretty straightforward. But Uber didn’t throw everything away. It adapted some of the custom Peloton features (like resource pools and elastic sharing) to work with Kubernetes.
Commenting on Uber’s blog, Robert Nishihara, co-founder of Anyscale, the company behind Ray, explained how Ray and Kubernetes work together. “Each one on their own misses part of the picture. Together, they form a software stack for AI that addresses both sets of needs,” he said.
What Uber Wanted to Build
To fix this mess, Uber built a unified orchestration layer for ML jobs. Now, engineers simply define the job type (e.g., Spark or Ray) and resource needs (CPU/GPU, memory), and the system handles the rest. A smart job scheduler routes workloads across multiple Kubernetes clusters based on real-time resource availability, locality, and cost.
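Uber hasn’t published its exact job schema or scheduling logic, but the idea can be sketched as a declarative spec plus a router that scores candidate clusters on fit and cost. All names, fields, and numbers below are illustrative, not Uber’s actual API:

```python
from dataclasses import dataclass

@dataclass
class JobSpec:
    """Declarative job request: what to run and what it needs."""
    job_type: str        # e.g. "ray" or "spark"
    cpus: int
    gpus: int
    memory_gb: int

@dataclass
class Cluster:
    """A Kubernetes cluster's currently available capacity."""
    name: str
    free_cpus: int
    free_gpus: int
    free_memory_gb: int
    cost_per_gpu_hour: float

def pick_cluster(spec, clusters):
    """Route the job to the cheapest cluster that can fit it right now."""
    candidates = [
        c for c in clusters
        if c.free_cpus >= spec.cpus
        and c.free_gpus >= spec.gpus
        and c.free_memory_gb >= spec.memory_gb
    ]
    if not candidates:
        return None  # job queues until capacity frees up somewhere
    return min(candidates, key=lambda c: c.cost_per_gpu_hour)

clusters = [
    Cluster("us-east-a", free_cpus=64, free_gpus=0, free_memory_gb=256, cost_per_gpu_hour=2.5),
    Cluster("us-west-b", free_cpus=128, free_gpus=8, free_memory_gb=512, cost_per_gpu_hour=3.1),
]
job = JobSpec(job_type="ray", cpus=16, gpus=4, memory_gb=64)
print(pick_cluster(job, clusters).name)  # prints us-west-b, the only cluster with free GPUs
```

The point of the declarative format is visible here: the engineer states *what* the job needs, and the cluster choice is a function the platform can change (to factor in locality, queue depth, or spot pricing) without touching any pipeline code.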
The core of this setup is federated resource management, which makes Uber’s compute clusters feel like a single resource pool.
The setup has three layers. The first is the user application layer, where the ML pipelines live; they interact with APIs and submit job requests in a clean, declarative format. Then comes the global control plane, the brain of the operation. It runs on Kubernetes and has a custom API server and controllers to handle jobs.
Lastly, there are local control planes, which are individual Kubernetes clusters that actually run the jobs.
In the global control plane, Uber introduced custom Kubernetes resources to represent jobs. It also built a job controller that watches these job requests and figures out where to run them. Once it finds a suitable cluster, it launches the job, monitors it until it finishes, and then cleans everything up.
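The blog doesn’t include the controller’s code, but the reconcile pattern it describes (watch job objects, bind each one to a cluster, monitor it, then clean up) can be sketched in miniature. This is a toy with injected callbacks standing in for real Kubernetes API calls; every name here is hypothetical:

```python
PENDING, RUNNING, DONE = "Pending", "Running", "Done"

class JobController:
    """Toy reconcile loop: drives each job object toward completion."""

    def __init__(self, pick_cluster, launch, is_finished, cleanup):
        # Injected callbacks stand in for real cluster-API interactions.
        self.pick_cluster = pick_cluster
        self.launch = launch
        self.is_finished = is_finished
        self.cleanup = cleanup

    def reconcile(self, job):
        """Advance one job's state; called on every watch event."""
        if job["state"] == PENDING:
            cluster = self.pick_cluster(job)
            if cluster is None:
                return  # no capacity yet; retry on the next event
            self.launch(job, cluster)
            job["state"], job["cluster"] = RUNNING, cluster
        elif job["state"] == RUNNING and self.is_finished(job):
            self.cleanup(job)  # tear down pods, secrets, temp resources
            job["state"] = DONE

# Wire up trivial callbacks to show the state transitions.
ctrl = JobController(
    pick_cluster=lambda job: "us-west-b",
    launch=lambda job, cluster: None,
    is_finished=lambda job: True,
    cleanup=lambda job: None,
)
job = {"name": "train-v2", "state": PENDING}
ctrl.reconcile(job)  # Pending -> Running
ctrl.reconcile(job)  # Running -> Done
print(job["state"])  # prints Done
```

A real controller would react to Kubernetes watch events rather than direct calls, but the shape is the same: each pass compares the job’s observed state to its desired state and does one step of work.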
The controller automatically handles secrets, job lifecycle, failure recovery, and team-specific routing through Uber’s internal ownership system (uOwn). This not only improves the developer experience but also boosts infrastructure efficiency at scale.
What makes this especially powerful is that it isn’t Ray-specific: it’s an abstraction layer that can work for any job type with a declarative resource spec. So, whether you’re experimenting with small training jobs or launching massive distributed runs across GPUs, the platform handles the orchestration and scaling transparently.
If you’re running distributed ML and want to avoid the infra mess, Uber’s Ray-on-Kubernetes stack is a blueprint worth studying.
Many auto companies also use Kubernetes to manage and develop their software, including Tesla, Ford, Mercedes-Benz, Volkswagen, DENSO, and self-driving companies like Waymo, Aurora, and Zoox. But that’s a story for another time.
Ray is Trusted by Many
Ray by Anyscale is the true champion here. Trusted by AI leaders like OpenAI, AWS, Cohere, Canva, Airbnb and Spotify, Ray is an open-source compute engine designed to simplify distributed computing for AI and Python applications. It lets developers scale workloads effortlessly, with no deep knowledge of distributed systems required.
“At OpenAI, Ray allows us to iterate at scale much faster than we could before. We use Ray to train our largest models, including ChatGPT,” Greg Brockman, co-founder of OpenAI, said in a blog post.
As AI models grow in size and complexity, developers need to move beyond single-machine setups to multi-node, GPU-accelerated environments. Ray bridges this gap with a unified framework that abstracts the complexity of distributed computing.
Sure, Ray comes with its own problems, but Uber manages to handle them by running it on Kubernetes.
Mohit Pandey
Mohit writes about AI in simple, explainable, and sometimes funny words. He holds keen interest in discussing AI with people building it for India, and for Bharat, while also talking a little bit about AGI.