PPTArena: A Benchmark for Agentic PowerPoint Editing

Ofengenden, Michael, Man, Yunze, Pang, Ziqi, Wang, Yu-Xiong

Dec-9-2025–arXiv.org Artificial Intelligence

We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2,125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-truth deck, a fully specified target outcome, and a dual VLM-as-judge pipeline that separately scores instruction following and visual quality using both structural diffs and slide images. Building on this setting, we propose PPTPilot, a structure-aware slide-editing agent that plans semantic edit sequences, routes between high-level programmatic tools and deterministic XML operations for precise control, and verifies outputs through an iterative plan-edit-check loop against task-specific constraints. In our experiments, PPTPilot outperforms strong proprietary agents and frontier VLM systems by over 10 percentage points on compound, layout-sensitive, and cross-slide edits, with particularly large gains in visual fidelity and deck-wide consistency. Despite these improvements, existing agents still underperform on long-horizon, document-scale tasks in PPTArena, highlighting the remaining challenges in reliable PPT editing.
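
The plan-edit-check loop mentioned in the abstract can be illustrated with a toy sketch. This is not PPTPilot's implementation: the deck is modeled as a list of dictionaries rather than a real PPTX file, the names `Edit`, `check`, `plan`, `apply_edit`, and `plan_edit_check` are hypothetical, the constraints are simple expected values, and the routing between programmatic tools and XML operations is left out entirely.

```python
from dataclasses import dataclass

# Toy stand-ins (assumptions, not the paper's data model):
# a deck is a list of slides, each slide a dict of property -> value.
Deck = list[dict[str, str]]
# A constraint maps (slide index, property name) to the required value.
Constraints = dict[tuple[int, str], str]


@dataclass(frozen=True)
class Edit:
    slide: int
    key: str
    value: str


def check(deck: Deck, constraints: Constraints) -> list[tuple[int, str]]:
    """Return the (slide, key) pairs whose constraint is not yet satisfied."""
    return [
        (slide, key)
        for (slide, key), want in constraints.items()
        if deck[slide].get(key) != want
    ]


def plan(constraints: Constraints, unmet: list[tuple[int, str]]) -> list[Edit]:
    """Hypothetical planner: emit one edit per unmet constraint."""
    return [Edit(slide, key, constraints[(slide, key)]) for slide, key in unmet]


def apply_edit(deck: Deck, edit: Edit) -> None:
    """Apply a single edit in place."""
    deck[edit.slide][edit.key] = edit.value


def plan_edit_check(deck: Deck, constraints: Constraints, max_rounds: int = 3):
    """Repeat plan -> edit -> check until all constraints hold or rounds run out."""
    unmet = check(deck, constraints)
    rounds = 0
    while unmet and rounds < max_rounds:
        for edit in plan(constraints, unmet):
            apply_edit(deck, edit)
        rounds += 1
        unmet = check(deck, constraints)
    return rounds, unmet


deck = [{"title": "Q1"}, {"title": "Q2"}]
constraints = {(0, "title"): "Q1 Results", (1, "font"): "Arial"}
rounds, unmet = plan_edit_check(deck, constraints)
```

In this toy setting the loop converges after a single round because every edit succeeds deterministically; with a real agent, the check step is what catches edits that failed or had side effects, triggering another planning round.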

benchmark, large language model, machine learning

Country:
- North America > United States (0.92)
- Europe > Austria
  - Vienna (0.14)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology
  - Communications (0.93)
  - Information Management (0.90)
  - Artificial Intelligence
    - Representation & Reasoning > Agents (0.93)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (0.71)
    - Machine Learning > Neural Networks
      - Deep Learning (0.96)
