GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs
Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, Lirui Wang
arXiv.org Artificial Intelligence
Robot learning requires large amounts of interaction data and evaluation, which are expensive to acquire at scale in the real world. Robot simulation holds the promise of providing such data and verification with high diversity and efficiency across objects, tasks, and scenes. While the ability to simulate has led to many successes in AI across gaming, Go, and mathematical proofs [2, 3, 4], there are two requirements for this path to succeed in robotics: the data must scale in complexity without significant human effort, and it must be realistic enough to transfer to the real world. Previous works [5, 6, 7, 8, 9, 10, 11] have made significant progress on scalable simulation benchmarks in robotics and on training policies with simulation data. Foundation models [12], particularly generative models pre-trained on internet-scale data [13, 14, 15], have demonstrated capabilities needed for generating robot simulation tasks, such as coding [16], spatial reasoning [17], task semantics [9], planning [18, 19], video prediction [20, 21], and cost and reward understanding [22, 23]. While foundation models have shown impressive abilities to output actions that solve robotic tasks directly in the real world [24], simulation provides a low-cost and scalable platform for learning robust end-to-end policies.
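To make the idea of LLM-driven simulation task generation concrete, below is a minimal, hypothetical sketch; it is not GenSim2's actual pipeline. An LLM is prompted to emit a structured task specification, which is validated before being handed to a simulator. The `TaskSpec` fields, prompt text, and `query_llm` stub are all illustrative assumptions.

```python
import json
from dataclasses import dataclass

@dataclass
class TaskSpec:
    """Illustrative schema for an LLM-generated simulation task (assumed)."""
    name: str          # human-readable task identifier
    assets: list       # object meshes/URDFs the scene needs
    goal: str          # natural-language success condition
    reward_code: str   # Python snippet scoring task success

PROMPT = (
    "Propose a new tabletop manipulation task as JSON with keys "
    "'name', 'assets', 'goal', and 'reward_code'."
)

def query_llm(prompt: str) -> str:
    """Stub: wire this to any chat-completion client of your choice."""
    raise NotImplementedError

def generate_task() -> TaskSpec:
    raw = query_llm(PROMPT)
    data = json.loads(raw)            # reject malformed generations early
    spec = TaskSpec(**data)           # missing/extra keys raise TypeError
    compile(spec.reward_code, "<reward>", "exec")  # syntax-check the reward snippet
    return spec
```

The validation steps (JSON parsing, schema construction, compiling the generated reward code) reflect the general practice of filtering LLM outputs before simulation, rather than any specific mechanism from the paper.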
Oct-4-2024