GenRL: Multimodal-foundation world models for generalization in embodied agents

May-26-2025, 20:32:17 GMT–Neural Information Processing Systems

Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be adopted in embodied contexts, due to the significant domain gap. However, the lack of multimodal data in such domains represents an obstacle to developing foundation models for embodied applications.

artificial intelligence, generalization, world model, (5 more...)

Neural Information Processing Systems

May-26-2025, 20:32:17 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (0.64)
  - Cognitive Science > Problem Solving (0.53)