RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation

Jun-13-2026, 20:52:13 GMT–Neural Information Processing Systems

Recent advances in vision-language models (VLMs) have enabled instruction-conditioned robotic systems with improved generalization. However, most existing work focuses on reactive System 1 policies, underutilizing VLMs' strengths in semantic reasoning and long-horizon planning. These System 2 capabilities--characterized by deliberative, goal-directed thinking--remain underexplored due to the limited temporal scale and structural complexity of current benchmarks. To address this gap, we introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation. RoboCerebra includes: (1) a large-scale simulation dataset with extended task horizons and diverse subtask sequences in household environments; (2) a hierarchical framework combining a high-level VLM planner with a low-level vision-language-action (VLA) controller; and (3) an evaluation protocol targeting planning, reflection, and memory through structured System 1–System 2 interaction.

artificial intelligence, benchmark, system 2, (5 more...)

Neural Information Processing Systems

Jun-13-2026, 20:52:13 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Robots (0.92)