CORE: Full-Path Evaluation of LLM Agents Beyond Final State
