OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows
