GoalLadder: Incremental Goal Discovery with Vision-Language Models

Jun-13-2026, 20:12:06 GMT – Neural Information Processing Systems

Natural language can offer a concise and human-interpretable means of specifying reinforcement learning (RL) tasks. The ability to extract rewards from a language instruction can enable the development of robotic systems that learn from human guidance; however, it remains a challenging problem, especially in visual environments. Existing approaches that employ large, pretrained language models either rely on non-visual environment representations, require prohibitively large amounts of feedback, or generate noisy, ill-shaped reward functions. In this paper, we propose a novel method, GoalLadder, that leverages vision-language models (VLMs) to train RL agents from a single language instruction in visual environments. GoalLadder works by incrementally discovering states that bring the agent closer to completing a task specified in natural language.
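
The idea of incrementally discovering states that bring the agent closer to the task can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how states are compared or how rewards are computed. The names `discover_goals`, `is_closer`, and `reward` are hypothetical, the comparator is a simple stand-in for a VLM query, and the distance-based reward is an assumption made purely for illustration.

```python
import math
from typing import Callable, List, Sequence

State = Sequence[float]  # toy stand-in for a visual observation


def discover_goals(
    states: List[State],
    is_closer: Callable[[State, State], bool],
    initial_goal: State,
) -> List[State]:
    """Build a 'ladder' of progressively better candidate goal states.

    `is_closer(a, b)` is a hypothetical comparator that returns True when
    state `a` is judged closer to task completion than state `b`. In a
    VLM-based setting this judgment would come from a vision-language
    model given the language instruction (assumption, not from the source).
    """
    ladder = [initial_goal]
    for state in states:
        # Promote a state only if it beats the current best candidate.
        if is_closer(state, ladder[-1]):
            ladder.append(state)
    return ladder


def reward(state: State, goal: State) -> float:
    """Assumed reward: negative Euclidean distance to the best candidate goal."""
    return -math.dist(state, goal)


# Toy comparator: pretends the judge knows a hidden target state.
TARGET = (1.0, 1.0)


def toy_is_closer(a: State, b: State) -> bool:
    return math.dist(a, TARGET) < math.dist(b, TARGET)
```

For example, calling `discover_goals` on a short trajectory with `toy_is_closer` keeps only the states that improve on the current best candidate, and `reward` then scores new states against the top of the ladder.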

machine learning, natural language, reinforcement learning

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Reinforcement Learning (0.59)