ClevrSkills: Compositional Language And Visual Reasoning in Robotics

Mar-20-2026, 00:18:17 GMT–Neural Information Processing Systems

Robotics tasks are highly compositional by nature. For example, to perform a high-level task like cleaning the table, a robot must employ low-level capabilities: moving its effectors to the objects on the table, picking them up, and then moving them off the table one by one, while re-evaluating the resulting dynamic scene in the process. Given that large vision language models (VLMs) have shown progress on many tasks that require high-level, human-like reasoning, we ask the question: if the models are taught the requisite low-level capabilities, can they compose them in novel ways to achieve interesting high-level tasks like cleaning the table without being explicitly taught to do so?

Technology:
- Information Technology > Artificial Intelligence > Robots (1.00)