- Published on January 17, 2025
- In AI News
‘Inference-time scaling drastically improves LLMs’ abilities in many respects, but what about diffusion models?’
Google DeepMind, the AI research arm of Google, in collaboration with the Massachusetts Institute of Technology (MIT) and New York University (NYU), has published a new study that introduces inference-time scaling for diffusion models.
The research titled ‘Inference-Time Scaling for Diffusion Models Beyond Scaling Denoising Steps’ explores the impact of providing additional computing resources to image generation models while they generate results.
Diffusion models begin the generation process with ‘pure noise’ and require multiple denoising steps to obtain clean outputs based on the input. “In this work, we explore the inference-time scaling behaviour of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation,” the authors said.
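To make the denoising process concrete, here is a minimal toy sketch of a diffusion sampling loop. The `denoiser` function and its call signature are illustrative stand-ins for a trained model, not the paper’s implementation.

```python
import torch

def sample(denoiser, shape, num_steps=50):
    """Toy diffusion sampling loop: start from pure Gaussian noise and
    apply the denoiser for a fixed number of steps."""
    x = torch.randn(shape)                                 # 'pure noise' starting point
    for step in range(num_steps, 0, -1):
        t = torch.full((shape[0],), step / num_steps)      # current noise level in (0, 1]
        x = denoiser(x, t)                                 # one denoising update
    return x

# Usage with a stand-in denoiser that simply shrinks the noise each step.
dummy_denoiser = lambda x, t: x * 0.95
image = sample(dummy_denoiser, shape=(1, 3, 64, 64))
```

The paper’s key question is what happens when compute is spent beyond simply raising `num_steps` in a loop like this.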
The research found that increasing inference-time compute leads to ‘substantial improvements’ in the quality of the samples generated. Check out the detailed technical report for the nitty-gritty of the components and techniques used.
One of the researchers, Nanye Ma, said that the research found improvements when searching for better starting noise. “This suggests pushing the inference-time scaling limit by investing compute in searching for better noises,” he said on X.
“Our search framework consists of two components: verifiers to provide feedback and algorithms to find better noise candidates,” he added.
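As an illustration of that two-component framework, here is a hedged sketch of the simplest algorithm in this family: best-of-N random search over starting noises, scored by a verifier. The helper names (`denoise`, `dummy_verifier`) and the scoring rule are assumptions for illustration, not the paper’s code; in practice the verifier would be a learned quality or alignment scorer.

```python
import torch

def denoise(denoiser, x, num_steps=50):
    # Run a toy denoising loop from a given starting noise.
    for step in range(num_steps, 0, -1):
        t = torch.full((x.shape[0],), step / num_steps)
        x = denoiser(x, t)
    return x

def search_noise(denoiser, verifier, shape, num_candidates=8, num_steps=50):
    # Best-of-N random search: draw several starting noises, denoise each,
    # and keep the sample the verifier scores highest.
    best_sample, best_score = None, float("-inf")
    for _ in range(num_candidates):
        noise = torch.randn(shape)           # candidate starting noise
        candidate = denoise(denoiser, noise, num_steps)
        score = verifier(candidate)          # verifier provides feedback
        if score > best_score:
            best_sample, best_score = candidate, score
    return best_sample

# Usage with stand-in components (illustrative assumptions only).
dummy_denoiser = lambda x, t: x * 0.95
dummy_verifier = lambda x: -x.abs().mean().item()   # toy scorer: prefers low residual noise
image = search_noise(dummy_denoiser, dummy_verifier, shape=(1, 3, 64, 64))
```

Raising `num_candidates` is how this sketch spends extra inference-time compute: more noise candidates are explored for the verifier to judge.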
The research compared the effectiveness of inference-time search methods across different models and showed that small models with search can outperform larger ones without search.
“These results indicate that substantial training costs can be partially offset by modest inference-time compute, enabling higher-quality samples more efficiently,” said Ma.

Inference-time compute is a concept that has been widely used in large language models, most notably in OpenAI’s o1 reasoning model.
“By allocating more compute during inference, often through sophisticated search processes, these works show that LLMs can produce higher-quality and more contextually appropriate responses,” said the authors of the paper, indicating their motivation to apply these techniques to diffusion models.
As demonstrated by Google DeepMind and others, this seems to hold true for diffusion models as well. Saining Xie, one of the authors, said that he was blown away by diffusion models’ natural ability to scale during inference. “You train them with fixed flops, but during test time, you can ramp it up by [around] 1,000 times,” he said on X.
While the research focuses mostly on image generation tasks, evaluated on text-to-image generation benchmarks, it will be hard for OpenAI to beat Google if these techniques extend to video generation as well. Google’s Veo 2 model already outperforms OpenAI’s Sora in both quality and prompt adherence.
Supreeth Koundinya
Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.