BiFold: Bimanual Cloth Folding with Language Guidance
Oriol Barbany, Adrià Colomé, Carme Torras
arXiv.org Artificial Intelligence
Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To that end, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. Given the lack of annotated bimanual folding data, we devise a procedure to automatically parse the actions in a simulated dataset and tag them with aligned text instructions. BiFold attains the best performance on our dataset and can transfer to new instructions, garments, and environments.
Jan-27-2025
- Country:
  - Europe > Netherlands (0.14)
- Genre:
  - Research Report (0.81)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning (1.00)
      - Natural Language (1.00)
      - Robots (1.00)
      - Vision (1.00)
    - Human Computer Interaction > Interfaces
      - Virtual Reality (0.46)