It’s no secret anymore that AI is GPU-hungry, and OpenAI’s Sam Altman keeps stressing just how urgently the company needs more. “Working as fast as we can to really get stuff humming; if anyone has GPU capacity in 100k chunks we can get ASAP, please call,” he posted on X recently. The demand surged even further when users flooded ChatGPT with Ghibli-style image requests, prompting Altman to ask people to slow down.
This is where Google holds a distinct advantage. Unlike OpenAI, it isn’t fully dependent on third-party hardware providers. At Google Cloud Next 2025, the company unveiled Ironwood, its seventh-generation tensor processing unit (TPU), designed specifically for inference. It’s a key part of Google’s broader AI Hypercomputer architecture.
“Ironwood is our most powerful, capable and energy-efficient TPU yet. And it’s purpose-built to power thinking, inferential AI models at scale,” Google said. The tech giant said that today, we live in the “age of inference”, where AI agents actively search, interpret, and generate insights instead of just responding with raw data.
The company further said that Ironwood is built to manage the complex computation and communication demands of thinking models, such as large language models and mixture-of-experts systems. It added that with Ironwood, customers no longer have to choose between compute scale and performance.
Ironwood will be available to Google Cloud customers later this year, the tech giant said. It currently supports advanced models, including Gemini 2.5 Pro and AlphaFold. The company also recently announced that the Deep Research feature in the Gemini app is now powered by Gemini 2.5 Pro.
Google stated that over 60% of funded generative AI startups and nearly 90% of generative AI unicorns (startups valued at $1 billion or more) are Google Cloud customers. In 2024, Apple revealed it used 8,192 TPU v4 chips in Google Cloud to train its ‘Apple Foundation Model’, a large language model powering its AI initiatives. This was one of the first high-profile adoptions outside Google’s ecosystem.
Ironwood is specifically optimised to reduce data movement and on-chip latency during large-scale tensor operations. As the scale of these models exceeds the capacity of a single chip, Ironwood TPUs are equipped with a low-latency, high-bandwidth Inter-Chip Interconnect (ICI) network, enabling tightly coordinated, synchronous communication across the entire TPU pod.
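For a sense of what that tightly coordinated, cross-chip communication looks like from the software side, here is a minimal JAX sketch of sharding a single matrix multiply across several accelerator chips. The shapes, mesh axis name and setup are illustrative assumptions, not Ironwood-specific code; the same pattern runs on any JAX backend.

```python
# A minimal, illustrative JAX sketch: shard a matrix multiply across whatever
# accelerator chips are visible, so the compiler must insert the cross-chip
# collectives that, on a TPU slice, travel over the interconnect.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())            # all visible chips (CPU, GPU or TPU)
mesh = Mesh(devices, axis_names=("model",))  # one logical mesh over the chips

# Hypothetical shapes: split the contraction dimension across the "model" axis,
# so each chip holds a slice of the activations and the matching slice of weights.
x = jax.device_put(jnp.ones((8, 4096)), NamedSharding(mesh, P(None, "model")))
w = jax.device_put(jnp.ones((4096, 1024)), NamedSharding(mesh, P("model", None)))

@jax.jit
def matmul(x, w):
    # Each chip computes a partial product; with more than one chip, the
    # compiler adds an all-reduce to sum the partials into the final result.
    return x @ w

print(matmul(x, w).shape)  # (8, 1024)
```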
The TPU comes in two configurations: one with 256 chips and another with 9,216 chips. The full-scale pod delivers 42.5 exaflops of compute, which Google says is over 24 times the 1.7 exaflops offered by the El Capitan supercomputer. Each Ironwood chip provides 4,614 TFLOPs of peak compute.
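As a quick sanity check, those headline figures are roughly consistent with one another, assuming the per-chip peak numbers simply add up across the pod and setting aside any differences in numeric precision between the two systems:

```python
# Back-of-the-envelope check of the published peak figures (illustrative only).
chips_per_pod = 9_216
tflops_per_chip = 4_614                        # peak TFLOPs per Ironwood chip
pod_exaflops = chips_per_pod * tflops_per_chip / 1e6
print(f"{pod_exaflops:.1f} exaflops per pod")  # ~42.5
print(f"{pod_exaflops / 1.7:.0f}x El Capitan") # ~25x its 1.7 exaflops
```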
According to Google, Ironwood is nearly twice as power-efficient as Trillium and almost 30 times more efficient than its first Cloud TPU launched in 2018. Liquid cooling enables consistent performance under sustained load, addressing the energy constraints associated with large-scale AI.
Why Google Loves TPUs
It’s unfortunate that Google doesn’t offer TPUs as a standalone product. “Google should spin out its TPU team into a separate business, retain a big stake, and have it go public. Easy peasy way to make a bazillion dollars,” said Erik Bernhardsson, founder of Modal Labs.
If Google started selling TPUs, it would likely see strong market demand. These chips are capable of training models, too. For instance, Google used Trillium TPUs to train Gemini 2.0, and now both enterprises and startups can take advantage of the same powerful and efficient infrastructure.
Interestingly, TPUs were originally developed for Google’s own AI-driven services, including Google Search, Google Translate, Google Photos, and YouTube.
A recent report says Google might team up with MediaTek to build its next-generation TPUs. One reason behind the move could be MediaTek’s close ties with TSMC, along with the fact that it charges Google less per chip than Broadcom does.
Notably, earlier this year, Google announced an investment of $75 billion in capital expenditures for 2025.
In the latest earnings call, Google CFO Anat Ashkenazi said the company benefits from having its own TPUs when it invests capital in building data centres. “Our strategy is to lean mostly on our own data centers, which means they are more customised to our needs. Our TPUs are customised for our workloads and needs. So, it does allow us to be more efficient and productive with that investment and spend,” she said.
Google reportedly spent between $6 billion and $9 billion on TPUs in the past year, based on estimates from research firm Omdia. Despite its investment in custom chips, Google remains a major NVIDIA customer.
According to a recent report, the search giant is in advanced discussions to lease NVIDIA’s Blackwell chips from CoreWeave, a rising player in the cloud computing space. This suggests that even top NVIDIA clients like Google are struggling to secure enough chips to satisfy growing demand from their users.
Moreover, integrating GPUs from vendors like NVIDIA isn’t easy either; cloud providers have to rework their infrastructure. In a recent interaction with AIM, Karan Batta, senior vice president of Oracle Cloud Infrastructure (OCI), said that most data centres are not ready for liquid cooling, acknowledging the complexity of managing the heat produced by the new generation of NVIDIA Blackwell GPUs.
He added that cloud providers must choose between passive or active cooling, full-loop systems, or sidecar approaches to integrate liquid cooling effectively. Batta further noted that while server racks follow a standard design (and can be copied from NVIDIA’s setup), the real complexity lies in data centre design and networking.
Meanwhile, Oracle is under pressure to finish building a data centre in Abilene, Texas, roughly the size of 17 football fields, for OpenAI. Right now, the facility is incomplete and sitting empty. If delays continue, OpenAI could walk away from the deal, potentially costing Oracle billions.
AWS is Following Suit Too
Much like Google, AWS is building its own chips too. At AWS re:Invent in Las Vegas, the cloud giant announced several new chips, including Trainium2, Graviton4, and Inferentia.
Last year, AWS invested $4 billion in Anthropic, becoming its primary cloud provider and training partner. The company also introduced Trn2 UltraServers and its next-generation Trainium3 AI training chip.
AWS is now working with Anthropic on Project Rainier — a large AI compute cluster powered by thousands of Trainium2 chips. This setup will help Anthropic develop its models and optimise its flagship product, Claude, to run efficiently on Trainium2 hardware.
Ironwood isn’t the only player in the inference space. A number of companies are now competing for NVIDIA’s chip market share, including AI chip startups like Groq, Cerebras Systems, and SambaNova Systems.
At the same time, OpenAI is progressing in its plan to develop custom AI chips to reduce its reliance on NVIDIA. According to a report, the company is preparing to finalise the design of its first in-house chip in the coming months and intends to send it for fabrication at TSMC.