Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation

ElMallah, Ramy, Chhajer, Krish, Lee, Chi-Guhn

Sep-25-2025–arXiv.org Artificial Intelligence

Robot learning papers typically report a single binary success rate (SR), which obscures where a policy succeeds or fails along a multi-step manipulation task. We argue that subgoal-level reporting should become routine: for each trajectory, a vector of per-subgoal SRs that makes partial competence visible (e.g., grasp vs. pour). We propose a blueprint for StepEval, a cost-aware plug-in evaluation framework that utilizes vision-language models (VLMs) as automated judges of subgoal outcomes from recorded images or videos. Rather than proposing new benchmarks or APIs, our contribution is to outline design principles for a scalable, community-driven open-source project. In StepEval, the primary artifact for policy evaluation is the per-subgoal SR vector; however, other quantities (e.g., latency or cost estimates) are also considered for framework-optimization diagnostics to help the community tune evaluation efficiency and accuracy when ground-truth subgoal success labels are available. We discuss how such a framework can remain model-agnostic, support single- or multi-view inputs, and be lightweight enough to adopt across labs. The intended contribution is a shared direction: a minimal, extensible seed that invites open-source contributions, so that scoring the steps, not just the final goal, becomes a standard and reproducible practice.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

Sep-25-2025

arXiv.org PDF

Add feedback

Country:
- Asia
  - Japan > Shikoku
    - Kagawa Prefecture > Takamatsu (0.04)
  - South Korea > Seoul
    - Seoul (0.04)
- North America > Canada
  - Ontario > Toronto (0.15)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.70)
  - Robots (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found