PPTArena: A Benchmark for Agentic PowerPoint Editing
Ofengenden, Michael, Man, Yunze, Pang, Ziqi, Wang, Yu-Xiong
arXiv.org Artificial Intelligence
We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2,125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-truth deck, a fully specified target outcome, and a dual VLM-as-judge pipeline that separately scores instruction following and visual quality using both structural diffs and slide images. Building on this setting, we propose PPTPilot, a structure-aware slide-editing agent that plans semantic edit sequences, routes between high-level programmatic tools and deterministic XML operations for precise control, and verifies outputs through an iterative plan-edit-check loop against task-specific constraints. In our experiments, PPTPilot outperforms strong proprietary agents and frontier VLM systems by over 10 percentage points on compound, layout-sensitive, and cross-slide edits, with particularly large gains in visual fidelity and deck-wide consistency. Despite these improvements, existing agents still underperform on long-horizon, document-scale tasks in PPTArena, highlighting the remaining challenges in reliable PPT editing.
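The abstract describes PPTPilot as routing edits between high-level programmatic tools and deterministic XML operations, then verifying results in an iterative plan-edit-check loop. The toy Python sketch below illustrates that control flow under stated assumptions; it is not the authors' implementation. The deck model (a dict of shape attributes), `Edit`, `high_level_tool`, `xml_operation`, `route`, `plan_edit_check`, and the `consistent_titles` constraint are all hypothetical, and no real PowerPoint file or VLM is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy deck model: shape id -> {attribute: value}.
# Not PPTPilot's data model; a real deck would be PresentationML XML.
Deck = Dict[str, Dict[str, str]]


@dataclass
class Edit:
    shape: str
    attr: str
    value: str


# Attributes the hypothetical high-level tool can set directly.
TOOL_ATTRS = {"text", "font_size"}


def high_level_tool(deck: Deck, e: Edit) -> bool:
    """Stand-in for a programmatic API; returns False if the attribute is unsupported."""
    if e.attr not in TOOL_ATTRS:
        return False
    deck[e.shape][e.attr] = e.value
    return True


def xml_operation(deck: Deck, e: Edit) -> None:
    """Stand-in for a deterministic low-level XML edit that can set any attribute."""
    deck[e.shape][e.attr] = e.value


def route(deck: Deck, e: Edit) -> str:
    """Prefer the high-level tool; fall back to the precise XML path."""
    if high_level_tool(deck, e):
        return "tool"
    xml_operation(deck, e)
    return "xml"


# A constraint inspects the deck and returns the corrective edits it needs
# (an empty list means the constraint is satisfied).
Constraint = Callable[[Deck], List[Edit]]


def plan_edit_check(
    deck: Deck, plan: List[Edit], constraints: List[Constraint], max_rounds: int = 3
) -> Tuple[bool, List[Tuple[str, str, str]]]:
    """Apply the plan, check constraints, and re-apply corrective edits until clean."""
    log: List[Tuple[str, str, str]] = []
    pending = list(plan)
    for _ in range(max_rounds):
        # Edit: apply every pending edit through the router, recording the path used.
        for e in pending:
            log.append((e.shape, e.attr, route(deck, e)))
        # Check: gather corrective edits from every violated constraint.
        pending = [fix for check in constraints for fix in check(deck)]
        if not pending:
            return True, log
    return False, log


def consistent_titles(deck: Deck) -> List[Edit]:
    """Deck-wide constraint: every *_title shape matches s1_title's font_size and color."""
    ref = deck["s1_title"]
    fixes: List[Edit] = []
    for shape, attrs in deck.items():
        if shape == "s1_title" or not shape.endswith("_title"):
            continue
        for attr in ("font_size", "color"):
            if attr in ref and attrs.get(attr) != ref[attr]:
                fixes.append(Edit(shape, attr, ref[attr]))
    return fixes


if __name__ == "__main__":
    deck = {
        "s1_title": {"text": "Q3 Results", "font_size": "28"},
        "s2_title": {"text": "Outlook", "font_size": "28"},
    }
    plan = [Edit("s1_title", "font_size", "32"), Edit("s1_title", "color", "FF0000")]
    ok, log = plan_edit_check(deck, plan, [consistent_titles])
    print(ok, log)
```

In this toy run the plan restyles only the first slide's title; the check step then detects that the second title is inconsistent and issues corrective edits, which the next round applies. Sending `color` down the XML path mirrors the idea of falling back to low-level operations when a high-level tool lacks coverage, but the specific split between the two paths here is an assumption.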
Dec-9-2025
- Country:
- Asia > Thailand
- Europe
- Austria > Vienna (0.14)
- Estonia > Harju County
- Tallinn (0.04)
- North America > United States
- California > Alameda County
- Berkeley (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Illinois > Champaign County
- Urbana (0.04)
- South America > Peru
- Loreto Department (0.04)
- Genre:
- Research Report > New Finding (0.66)
- Industry:
- Information Technology (0.46)
- Technology: