Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach

Zhixuan Xu, Kechun Xu, Yue Wang, Rong Xiong

arXiv.org Artificial Intelligence 

Abstract-- We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions. Previous works based on rule-based language parsing or scene-centric visual representation have restrictions on the form of instructions and reference objects or require large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample efficient and generalizable. Experiments indicate that our model can achieve a 97.75% success rate of placement with only 0.26M trainable parameters. Object placement is an essential task in human-robot interaction.
