FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents

Li, Bobo, Wang, Yuheng, Fei, Hao, Li, Juncheng, Ji, Wei, Lee, Mong-Li, Hsu, Wynne

Jun-3-2025–arXiv.org Artificial Intelligence

Online form filling is a common yet labor-intensive task involving extensive keyboard and mouse interactions. Despite the long-standing vision of automating this process with "one click", existing tools remain largely rule-based and lack generalizable, generative capabilities. Recent advances in Multimodal Large Language Models (MLLMs) have enabled promising agents for GUI-related tasks in general-purpose scenarios. However, they struggle with the unique challenges of form filling, such as flexible layouts and the difficulty of aligning textual instructions with on-screen fields. To bridge this gap, we formally define the form-filling task and propose FormFactory, an interactive benchmarking suite comprising a web-based interface, backend evaluation module, and carefully constructed dataset. Our benchmark covers diverse real-world scenarios, incorporates various field formats, and simulates high-fidelity form interactions. We conduct a comprehensive evaluation of state-of-the-art MLLMs and observe that no model surpasses 5% accuracy, underscoring the inherent difficulty of the task. These findings also reveal significant limitations in current models' visual layout reasoning and field-value alignment abilities. We hope our benchmark can serve as a stepping stone for further research into robust, practical form-filling agents.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

Jun-3-2025

arXiv.org PDF

Add feedback

Country:
- Asia (0.28)
- North America > United States (0.16)

Genre:
- Instructional Material (0.68)
- Research Report (0.50)

Industry:
- Banking & Finance (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found