DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

Peng, Yi-Hao, Huq, Faria, Jiang, Yue, Wu, Jason, Li, Amanda Xin Yue, Bigham, Jeffrey, Pavel, Amy

Sep-30-2024, arXiv.org Artificial Intelligence

Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
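
The idea of producing visuals whose labels are built in at generation time can be illustrated with a minimal sketch. This is not the authors' pipeline: the choice of HTML as the generated code, the `data-label` attribute, the three element types, and the function names `generate_page` and `extract_labels` are all assumptions made for illustration. The sketch shows only that when the page is produced by code, the target labels can be read back from that same code with no manual annotation.

```python
import random
from html.parser import HTMLParser

# Hypothetical element types and templates (assumed for this sketch;
# the abstract does not specify a label set).
ELEMENT_TEMPLATES = {
    "title": '<h1 data-label="title">{text}</h1>',
    "text": '<p data-label="text">{text}</p>',
    "button": '<button data-label="button">{text}</button>',
}


def generate_page(rng, n_elements=3):
    """Emit HTML for one synthetic page; every element carries its own label."""
    kinds = [rng.choice(sorted(ELEMENT_TEMPLATES)) for _ in range(n_elements)]
    body = "\n".join(
        ELEMENT_TEMPLATES[kind].format(text=f"{kind} {i}")
        for i, kind in enumerate(kinds)
    )
    return f"<html><body>\n{body}\n</body></html>"


class LabelExtractor(HTMLParser):
    """Collect (tag, label) pairs from elements that have a data-label attribute."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        label = dict(attrs).get("data-label")
        if label is not None:
            self.labels.append((tag, label))


def extract_labels(html):
    """Recover the built-in labels from generated markup."""
    parser = LabelExtractor()
    parser.feed(html)
    return parser.labels


if __name__ == "__main__":
    page = generate_page(random.Random(0), n_elements=4)
    for tag, label in extract_labels(page):
        print(tag, label)
```

In a fuller setup one would presumably render the markup to an image and record element positions as well; that step is omitted here because it depends on a rendering tool the abstract does not name.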

Country:
- North America > United States
  - Texas > Travis County
    - Austin (0.04)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - New York > New York County
    - New York City (0.05)
- Asia > Middle East
  - Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report > New Finding (0.93)

Industry:
- Health & Medicine (0.69)
- Education (0.46)

Technology:
- Information Technology
  - Human Computer Interaction > Interfaces (1.00)
  - Communications (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Large Language Model (0.71)
    - Machine Learning > Neural Networks
      - Deep Learning (0.47)