
Companies are torn over whether to stick with Kubernetes or ditch it. Some have moved away from it entirely, while others, after trying and testing monolithic architectures, have since moved their workloads back. Both approaches have their pain points, and no single perfect solution exists.
Last year, ride-hailing logistics firm Uber decided to upgrade its machine learning platform and shift its ML workloads to Kubernetes. And, in typical Uber fashion, it didn’t just migrate but also built some of its own tools along the way to make everything run smoothly.
In a recent blog post, the Uber tech team explained this transition and the motivation behind it. ML pipelines deal with huge volumes of data, especially during model training. Training runs are batch processing jobs that get broken down into large distributed tasks, all chained together in a workflow.
Until mid-2023, Uber ran its Spark- and Ray-based jobs through an internal job gateway called MADLJ. While this setup did the job, it came with a bunch of headaches. ML engineers had to micromanage job placement: pick clusters, regions, and exact GPU SKUs. One wrong choice meant long queues, idle GPUs or, worse, stalled experiments.
Part of the issue was MADLJ’s dependency on Peloton, which ran on Apache Mesos. Mesos has fallen out of favour, so Uber decided it was time to switch to Kubernetes, which is now the industry standard.
Tools like Spark and Ray already support Kubernetes, making the decision pretty straightforward. But Uber didn’t throw everything away. It adapted some of the custom Peloton features (like resource pools and elastic sharing) to work with Kubernetes.
Commenting on Uber’s blog, Robert Nishihara, co-founder of Anyscale, the company behind Ray, explained how Ray and Kubernetes work together. “Each one on their own misses part of the picture. Together, they form a software stack for AI that addresses both sets of needs,” he said.
What Uber Wanted to Build
To fix this mess, Uber built a unified orchestration layer for ML jobs. Now, engineers simply define the job type (e.g., Spark or Ray) and resource needs (CPU/GPU, memory), and the system handles the rest. A smart job scheduler routes workloads across multiple Kubernetes clusters based on real-time resource availability, locality, and cost.
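Uber hasn’t published its exact job schema or scheduling logic, but the idea can be sketched as a declarative spec plus a router that scores candidate clusters on fit and cost. All names, fields, and numbers below are illustrative, not Uber’s actual API:

```python
from dataclasses import dataclass

@dataclass
class JobSpec:
    """Declarative job request: what to run and what it needs."""
    job_type: str        # e.g. "ray" or "spark"
    cpus: int
    gpus: int
    memory_gb: int

@dataclass
class Cluster:
    """A Kubernetes cluster's currently available capacity."""
    name: str
    free_cpus: int
    free_gpus: int
    free_memory_gb: int
    cost_per_gpu_hour: float

def pick_cluster(spec, clusters):
    """Route the job to the cheapest cluster that can fit it right now."""
    candidates = [
        c for c in clusters
        if c.free_cpus >= spec.cpus
        and c.free_gpus >= spec.gpus
        and c.free_memory_gb >= spec.memory_gb
    ]
    if not candidates:
        return None  # job queues until capacity frees up somewhere
    return min(candidates, key=lambda c: c.cost_per_gpu_hour)

clusters = [
    Cluster("us-east-a", free_cpus=64, free_gpus=0, free_memory_gb=256, cost_per_gpu_hour=2.5),
    Cluster("us-west-b", free_cpus=128, free_gpus=8, free_memory_gb=512, cost_per_gpu_hour=3.1),
]
job = JobSpec(job_type="ray", cpus=16, gpus=4, memory_gb=64)
print(pick_cluster(job, clusters).name)  # prints us-west-b, the only cluster with free GPUs
```

The point of the declarative format is visible here: the engineer states *what* the job needs, and the cluster choice is a function the platform can change (to factor in locality, queue depth, or spot pricing) without touching any pipeline code.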
The core of this setup is federated resource management, which makes Uber’s compute clusters feel like a single resource pool.
The setup has three layers. The first is the user application layer, where the ML pipelines live; they interact with APIs and submit job requests in a clean, declarative format. Then comes the global control plane, the brain of the operation. It runs on Kubernetes and has a custom API server and controllers to handle jobs.
Lastly, there are local control planes, which are individual Kubernetes clusters that actually run the jobs.
In the global control plane, Uber introduced custom Kubernetes resources to represent jobs. It also built a job controller that watches these job requests and figures out where to run them. Once it finds a suitable cluster, it launches the job, monitors it until it finishes, and then cleans everything up.
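The blog doesn’t include the controller’s code, but the reconcile pattern it describes (watch job objects, bind each one to a cluster, monitor it, then clean up) can be sketched in miniature. This is a toy with injected callbacks standing in for real Kubernetes API calls; every name here is hypothetical:

```python
PENDING, RUNNING, DONE = "Pending", "Running", "Done"

class JobController:
    """Toy reconcile loop: drives each job object toward completion."""

    def __init__(self, pick_cluster, launch, is_finished, cleanup):
        # Injected callbacks stand in for real cluster-API interactions.
        self.pick_cluster = pick_cluster
        self.launch = launch
        self.is_finished = is_finished
        self.cleanup = cleanup

    def reconcile(self, job):
        """Advance one job's state; called on every watch event."""
        if job["state"] == PENDING:
            cluster = self.pick_cluster(job)
            if cluster is None:
                return  # no capacity yet; retry on the next event
            self.launch(job, cluster)
            job["state"], job["cluster"] = RUNNING, cluster
        elif job["state"] == RUNNING and self.is_finished(job):
            self.cleanup(job)  # tear down pods, secrets, temp resources
            job["state"] = DONE

# Wire up trivial callbacks to show the state transitions.
ctrl = JobController(
    pick_cluster=lambda job: "us-west-b",
    launch=lambda job, cluster: None,
    is_finished=lambda job: True,
    cleanup=lambda job: None,
)
job = {"name": "train-v2", "state": PENDING}
ctrl.reconcile(job)  # Pending -> Running
ctrl.reconcile(job)  # Running -> Done
print(job["state"])  # prints Done
```

A real controller would react to Kubernetes watch events rather than direct calls, but the shape is the same: each pass compares the job’s observed state to its desired state and does one step of work.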
The controller automatically handles secrets, job lifecycle, failure recovery, and team-specific routing through Uber’s internal ownership system (uOwn). This not only improves the developer experience but also boosts infrastructure efficiency at scale.
What makes this especially powerful is that it isn’t Ray-specific: it’s an abstraction layer that can work for any job type with a declarative resource spec. So, whether you’re experimenting with small training jobs or launching massive distributed runs across GPUs, the platform handles the orchestration and scaling transparently.
If you’re running distributed ML and want to avoid the infra mess, Uber’s Ray-on-Kubernetes stack is a blueprint worth studying.
Many auto companies also use Kubernetes to manage and develop their software, including Tesla, Ford, Mercedes-Benz, Volkswagen, DENSO, and self-driving companies like Waymo, Aurora, and Zoox. But that’s a story for another time.
Ray is Trusted by Many
Ray by Anyscale is the true champion here. Trusted by AI leaders like OpenAI, AWS, Cohere, Canva, Airbnb and Spotify, Ray is an open-source compute engine designed to simplify distributed computing for AI and Python applications. It lets developers scale workloads effortlessly, with no deep knowledge of distributed systems required.
“At OpenAI, Ray allows us to iterate at scale much faster than we could before. We use Ray to train our largest models, including ChatGPT,” Greg Brockman, co-founder of OpenAI, said in a blog post.
As AI models grow in size and complexity, developers need to move beyond single-machine setups to multi-node, GPU-accelerated environments. Ray bridges this gap with a unified framework that abstracts the complexity of distributed computing.
Sure, Ray comes with its own problems, but Uber manages to handle them by running it on Kubernetes.
Mohit Pandey
Mohit writes about AI in simple, explainable, and sometimes funny words. He holds keen interest in discussing AI with people building it for India, and for Bharat, while also talking a little bit about AGI.