StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton

Apr-25-2023–arXiv.org Artificial Intelligence

Robots operating in human environments must be able to rearrange objects into semantically meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion improves the success rate of assembling physically-valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects both in simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.
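The pairing of a generative sampler with a collision discriminator can be pictured as a sample-then-filter loop. The sketch below is a toy illustration only and not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for a learned denoiser, `is_collision_free` is a simple geometric check standing in for a learned discriminator, objects are treated as 2D discs, and every name and parameter is an assumption made for illustration.

```python
import math
import random


def toy_denoise_step(poses, targets, t, num_steps, rng):
    """Hypothetical stand-in for a learned denoiser: pull each pose toward
    its target and add noise whose scale shrinks as t decreases."""
    alpha = 1.0 / (t + 1)                # correction strength grows as t -> 0
    sigma = 0.3 * (t + 1) / num_steps    # noise scale decays over the steps
    return [
        (x + alpha * (tx - x) + rng.gauss(0, sigma),
         y + alpha * (ty - y) + rng.gauss(0, sigma))
        for (x, y), (tx, ty) in zip(poses, targets)
    ]


def sample_arrangement(targets, num_steps, rng):
    """Start from Gaussian noise and run the toy reverse process."""
    poses = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in targets]
    for t in reversed(range(num_steps)):
        poses = toy_denoise_step(poses, targets, t, num_steps, rng)
    return poses


def is_collision_free(poses, radius):
    """Geometric stand-in for a collision discriminator: discs of the given
    radius must not overlap."""
    return all(
        math.dist(p, q) >= 2 * radius
        for i, p in enumerate(poses)
        for q in poses[i + 1:]
    )


def generate_and_filter(targets, radius, num_samples=32, num_steps=10, seed=0):
    """Draw several candidate arrangements and keep the collision-free ones."""
    rng = random.Random(seed)
    candidates = [
        sample_arrangement(targets, num_steps, rng) for _ in range(num_samples)
    ]
    return [c for c in candidates if is_collision_free(c, radius)]


if __name__ == "__main__":
    # Three discs that should end up roughly in a row (assumed toy goal).
    targets = [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0)]
    kept = generate_and_filter(targets, radius=0.1)
    print(f"kept {len(kept)} of 32 sampled arrangements")
```

Drawing many candidates and discarding those that fail the validity check is one common way to combine a generator with a discriminator; whether the paper ranks, rejects, or refines samples is not stated in the abstract.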

diffusion model, machine learning, natural language


Genre:
- Research Report (1.00)
- Workflow (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language > Large Language Model (0.68)
  - Representation & Reasoning (1.00)
  - Robots (1.00)
