New Robotics Method AnyPlace Achieves Object Placement Through VLMs, Synthetic Data

  • Published on March 4, 2025
  • In AI News

Researchers have introduced a new two-stage method for robotic object placement called AnyPlace, which demonstrates the ability to predict feasible placement poses. This advancement addresses the challenges of object placement, which is often difficult due to variations in object shapes and placement arrangements. 

According to Animesh Garg, one of the researchers from Georgia Institute of Technology, the work addresses the challenge of robot placement, focusing on the generalisability of solutions rather than domain-specific ones.

How can robots reliably place objects in diverse real-world tasks?
🤖🔍 Placement is tough—objects vary in shape and placement modes (such as stacking, hanging, and insertion), making it a challenging problem.
We introduce AnyPlace, a two-stage method trained purely on synthetic… pic.twitter.com/BR8Xhwuz7Z

— Animesh Garg (@animesh_garg) February 24, 2025

The system uses a vision language model (VLM) to produce potential placement locations, combined with depth-based models for geometric placement prediction. 

“Our AnyPlace pipeline consists of two stages: high-level placement position prediction and low-level pose prediction,” the research paper stated.

The first stage uses Molmo, a VLM, and SAM 2, a large segmentation model, to segment objects and propose placement locations. Only the region around the proposed placement is fed into the low-level pose prediction model, which uses point clouds of objects to be placed and regions of placement locations.

Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently. pic.twitter.com/WcAd0t2zNX

— Animesh Garg (@animesh_garg) February 24, 2025
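
Based on that description, here is a minimal sketch of how such a two-stage pipeline could be wired together. The interfaces (vlm, segmenter, pose_model, the crop radius) are illustrative assumptions for this article, not the authors' actual API.

```python
import numpy as np

def predict_placement_pose(rgb, scene_points, task_prompt,
                           vlm, segmenter, pose_model, crop_radius=0.10):
    """Illustrative two-stage placement: a VLM proposes a rough target
    location, then a point-cloud model predicts the precise pose locally.
    scene_points is an organized point cloud aligned pixel-for-pixel
    with rgb, shape (H, W, 3)."""
    # Stage 1: the VLM points at a candidate placement pixel and a
    # segmentation model isolates the object to be placed.
    u, v = vlm.propose_location(rgb, task_prompt)              # pixel coordinates
    object_mask = segmenter.segment(rgb, prompt=task_prompt)   # (H, W) bool

    # Key idea from the paper: keep only the scene geometry near the
    # proposed location so the low-level model reasons over a small region.
    center = scene_points[v, u]
    flat = scene_points.reshape(-1, 3)
    local_region = flat[np.linalg.norm(flat - center, axis=1) < crop_radius]

    # Stage 2: predict a placement pose (e.g. a 4x4 transform) from the
    # object's points and the cropped placement region.
    object_points = scene_points[object_mask]
    return pose_model.predict(object_points, local_region)
```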

Synthetic Data Generation 

The creators of AnyPlace have developed a fully synthetic dataset of 1,489 randomly generated objects, covering insertion, stacking, and hanging. In total, 13 categories were created and 5,370 placement poses were generated, as per the paper.

This approach helps overcome limitations of real-world data collection, enabling the model to generalise across objects and scenarios.

Garg noted that for object placement, it is possible to generate highly effective synthetic data, allowing for the creation of a grasp predictor for any object using only synthetic data.

To generalize across objects & placements, we generate a fully synthetic dataset with:
✅ Randomly generated objects in Blender
✅ Diverse placement configurations (stacking, insertion, hanging) in IsaacSim
This allows us to train our model without real-world data collection! 🚀 pic.twitter.com/p6sIiumk8n

— Animesh Garg (@animesh_garg) February 24, 2025
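
The actual assets are generated in Blender and the placements simulated in IsaacSim, as per the paper. The toy loop below only illustrates the general recipe, procedurally randomising geometry and labelling one feasible placement pose per example; the sizes, offsets, and noise values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_placement_example(rng):
    """One synthetic training example: a randomised object pair plus a
    rough feasible placement pose (all numbers here are illustrative)."""
    mode = rng.choice(["stacking", "insertion", "hanging"])
    # Randomise gross geometry so the model sees varied shapes and sizes.
    base_size = rng.uniform(0.03, 0.15, size=3)     # metres
    object_size = rng.uniform(0.01, 0.08, size=3)
    # A crude placement label: on top of the base for stacking, partly
    # inside it for insertion, near an upper edge for hanging.
    z_offset = {"stacking": base_size[2],
                "insertion": base_size[2] * 0.3,
                "hanging": base_size[2] * 0.8}[mode]
    placement_xyz = np.array([0.0, 0.0, z_offset]) + rng.normal(0, 0.005, size=3)
    return {"mode": mode, "base_size": base_size,
            "object_size": object_size, "placement_xyz": placement_xyz}

# The paper reports 5,370 placement poses; a loop like this would generate them.
dataset = [sample_placement_example(rng) for _ in range(5370)]
```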

“The use of depth data minimises the sim-to-real gap, making the model applicable in real-world scenarios with limited real-world data collection,” Garg noted. The synthetic data generation process creates variability in object shapes and sizes, improving the model’s robustness.
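
Depth helps here because simulated and real depth frames describe the same underlying geometry even when RGB appearance differs. The standard pinhole back-projection below, which is not taken from the paper, shows how a depth image becomes the kind of point cloud the low-level pose model consumes; the camera intrinsics are placeholder values.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in metres) into a 3D point cloud using
    the standard pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grids, shape (h, w)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # drop pixels with no reading

# Example: a flat synthetic 480x640 depth frame with placeholder intrinsics.
cloud = depth_to_point_cloud(np.full((480, 640), 0.6), fx=615, fy=615, cx=320, cy=240)
```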

The model achieved an 80% success rate on the vial insertion task, showing robustness and generalisation. The simulation results outperform baselines in success rates, coverage of placement modes and fine-placement precision.

For real-world results, the method transfers directly from synthetic to real-world tasks, “succeeding where others struggle”.

How well does AnyPlace perform?
🏆 Simulation results: Outperforms baselines in
✔ Success rate
✔ Coverage of placement modes
✔ Fine-placement precision
📌 Real-world results: Our method transfers directly from synthetic to real-world tasks, succeeding where others struggle! pic.twitter.com/jIRTApGWxN

— Animesh Garg (@animesh_garg) February 24, 2025

Another recently released research paper introduces Phantom, a method to train robot policies without collecting any robot data, using only human video demonstrations.

Phantom turns human videos into “robot” demonstrations, making it significantly easier to scale up and diversify robotics data.

Sanjana Gupta

An information designer who loves to learn about and try new developments in the field of tech and AI. She likes to spend her spare time reading and exploring absurdism in literature.
