New Robotics Method AnyPlace Achieves Object Placement Through VLMs, Synthetic Data

  • Published on March 4, 2025
  • In AI News

Researchers have introduced a new two-stage method for robotic object placement called AnyPlace, which demonstrates the ability to predict feasible placement poses. This advancement addresses the challenges of object placement, which is often difficult due to variations in object shapes and placement arrangements. 

According to Animesh Garg, one of the researchers from Georgia Institute of Technology, the work addresses the challenge of robot placement, focusing on the generalisability of solutions rather than domain-specific ones.

How can robots reliably place objects in diverse real-world tasks?
🤖🔍 Placement is tough—objects vary in shape and placement modes (such as stacking, hanging, and insertion), making it a challenging problem.
We introduce AnyPlace, a two-stage method trained purely on synthetic… pic.twitter.com/BR8Xhwuz7Z

— Animesh Garg (@animesh_garg) February 24, 2025

The system uses a vision language model (VLM) to produce potential placement locations, combined with depth-based models for geometric placement prediction. 

“Our AnyPlace pipeline consists of two stages: high-level placement position prediction and low-level pose prediction,” the research paper stated.

The first stage uses Molmo, a VLM, and SAM 2, a large segmentation model, to segment objects and propose placement locations. Only the region around the proposed placement is fed into the low-level pose prediction model, which uses point clouds of objects to be placed and regions of placement locations.

Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently. pic.twitter.com/WcAd0t2zNX

— Animesh Garg (@animesh_garg) February 24, 2025
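
Based on that description, here is a minimal sketch of how such a two-stage pipeline could be wired together. The interfaces (vlm, segmenter, pose_model, the crop radius) are illustrative assumptions for this article, not the authors' actual API.

```python
import numpy as np

def predict_placement_pose(rgb, scene_points, task_prompt,
                           vlm, segmenter, pose_model, crop_radius=0.10):
    """Illustrative two-stage placement: a VLM proposes a rough target
    location, then a point-cloud model predicts the precise pose locally.
    scene_points is an organized point cloud aligned pixel-for-pixel
    with rgb, shape (H, W, 3)."""
    # Stage 1: the VLM points at a candidate placement pixel and a
    # segmentation model isolates the object to be placed.
    u, v = vlm.propose_location(rgb, task_prompt)              # pixel coordinates
    object_mask = segmenter.segment(rgb, prompt=task_prompt)   # (H, W) bool

    # Key idea from the paper: keep only the scene geometry near the
    # proposed location so the low-level model reasons over a small region.
    center = scene_points[v, u]
    flat = scene_points.reshape(-1, 3)
    local_region = flat[np.linalg.norm(flat - center, axis=1) < crop_radius]

    # Stage 2: predict a placement pose (e.g. a 4x4 transform) from the
    # object's points and the cropped placement region.
    object_points = scene_points[object_mask]
    return pose_model.predict(object_points, local_region)
```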

Synthetic Data Generation 

The creators of AnyPlace have developed a fully synthetic dataset of 1,489 randomly generated objects, covering insertion, stacking, and hanging. In total, 13 categories were created and 5,370 placement poses were generated, as per the paper.

This approach helps overcome limitations of real-world data collection, enabling the model to generalise across objects and scenarios.

Garg noted that for object placement, it is possible to generate highly effective synthetic data, allowing for the creation of a grasp predictor for any object using only synthetic data.

To generalize across objects & placements, we generate a fully synthetic dataset with:
✅ Randomly generated objects in Blender
✅ Diverse placement configurations (stacking, insertion, hanging) in IsaacSim
This allows us to train our model without real-world data collection! 🚀 pic.twitter.com/p6sIiumk8n

— Animesh Garg (@animesh_garg) February 24, 2025
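
The actual assets are generated in Blender and the placements simulated in IsaacSim, as per the paper. The toy loop below only illustrates the general recipe, procedurally randomising geometry and labelling one feasible placement pose per example; the sizes, offsets, and noise values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_placement_example(rng):
    """One synthetic training example: a randomised object pair plus a
    rough feasible placement pose (all numbers here are illustrative)."""
    mode = rng.choice(["stacking", "insertion", "hanging"])
    # Randomise gross geometry so the model sees varied shapes and sizes.
    base_size = rng.uniform(0.03, 0.15, size=3)     # metres
    object_size = rng.uniform(0.01, 0.08, size=3)
    # A crude placement label: on top of the base for stacking, partly
    # inside it for insertion, near an upper edge for hanging.
    z_offset = {"stacking": base_size[2],
                "insertion": base_size[2] * 0.3,
                "hanging": base_size[2] * 0.8}[mode]
    placement_xyz = np.array([0.0, 0.0, z_offset]) + rng.normal(0, 0.005, size=3)
    return {"mode": mode, "base_size": base_size,
            "object_size": object_size, "placement_xyz": placement_xyz}

# The paper reports 5,370 placement poses; a loop like this would generate them.
dataset = [sample_placement_example(rng) for _ in range(5370)]
```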

“The use of depth data minimises the sim-to-real gap, making the model applicable in real-world scenarios with limited real-world data collection,” Garg noted. The synthetic data generation process creates variability in object shapes and sizes, improving the model’s robustness.
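
Depth helps here because simulated and real depth frames describe the same underlying geometry even when RGB appearance differs. The standard pinhole back-projection below, which is not taken from the paper, shows how a depth image becomes the kind of point cloud the low-level pose model consumes; the camera intrinsics are placeholder values.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in metres) into a 3D point cloud using
    the standard pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grids, shape (h, w)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # drop pixels with no reading

# Example: a flat synthetic 480x640 depth frame with placeholder intrinsics.
cloud = depth_to_point_cloud(np.full((480, 640), 0.6), fx=615, fy=615, cx=320, cy=240)
```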

The model achieved an 80% success rate on the vial insertion task, showing robustness and generalisation. The simulation results outperform baselines in success rates, coverage of placement modes and fine-placement precision.

For real-world results, the method transfers directly from synthetic to real-world tasks, “succeeding where others struggle”.

How well does AnyPlace perform?
🏆 Simulation results: Outperforms baselines in
✔ Success rate
✔ Coverage of placement modes
✔ Fine-placement precision
📌 Real-world results: Our method transfers directly from synthetic to real-world tasks, succeeding where others struggle! pic.twitter.com/jIRTApGWxN

— Animesh Garg (@animesh_garg) February 24, 2025

Another recently released research paper introduces Phantom, a method to train robot policies without collecting any robot data, using only human video demonstrations.

Phantom turns human videos into “robot” demonstrations, making it significantly easier to scale up and diversify robotics data.

Sanjana Gupta

An information designer who loves to learn about and try new developments in the field of tech and AI. She likes to spend her spare time reading and exploring absurdism in literature.
