Capturing Visualization Design Rationale

Hutchinson, Maeve, Jianu, Radu, Slingsby, Aidan, Wood, Jo, Madhyastha, Pranava

Jul-2-2025–arXiv.org Artificial Intelligence

City St George's, University of London; The Alan Turing Institute

Figure 1: Overview of the structure of our study, showing (A) an example of a student-authored literate visualization notebook, and (B) the ten visualization design concepts used to classify rationale.

Prior natural language datasets for data visualization have focused on tasks such as visualization literacy assessment, insight generation, and visualization generation from natural language instructions. These studies often rely on controlled setups with purpose-built visualizations and artificially constructed questions. As a result, they tend to prioritize the interpretation of visualizations, focusing on decoding visualizations rather than understanding their encoding. In this paper, we present a new dataset and methodology for probing visualization design rationale through natural language. We leverage a unique source of real-world visualizations and natural language narratives: literate visualization notebooks created by students as part of a data visualization course. These notebooks combine visual artifacts with design exposition, in which students make explicit the rationale behind their design decisions. We also use large language models (LLMs) to generate and categorize question-answer-rationale triples from the narratives and articulations in the notebooks. This exploration has resulted in the development of a variety of datasets capturing these diverse language-related aspects of visualization practice and understanding.
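As a rough illustration of what a question-answer-rationale triple with a design-concept label could look like as a record, here is a minimal Python sketch. The class name `QARTriple`, its field names, the helper `group_by_concept`, the concept labels, and the example text are all hypothetical; they are not the dataset's actual schema, and the abstract does not list the ten concepts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QARTriple:
    """One question-answer-rationale record (hypothetical field names)."""
    question: str
    answer: str
    rationale: str
    concept: str  # placeholder label, not one of the paper's ten concepts


def group_by_concept(triples):
    """Group triples by concept label, keeping the input order within each group."""
    groups = defaultdict(list)
    for triple in triples:
        groups[triple.concept].append(triple)
    return dict(groups)


# Invented examples, for illustration only.
examples = [
    QARTriple(
        question="Why are bars used instead of a pie chart?",
        answer="To compare totals across categories.",
        rationale="Lengths on a common baseline are easier to compare than angles.",
        concept="concept-a",
    ),
    QARTriple(
        question="Why is the colour palette limited to two hues?",
        answer="To separate the two groups being compared.",
        rationale="Fewer hues reduce the effort needed to tell groups apart.",
        concept="concept-b",
    ),
    QARTriple(
        question="Why are the bars sorted by value?",
        answer="To make the ranking visible at a glance.",
        rationale="Sorted order removes the need to compare bars pairwise.",
        concept="concept-a",
    ),
]

grouped = group_by_concept(examples)
```

Grouping by label in this way is one simple route to inspecting how the triples spread across categories.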

large language model, machine learning, natural language

Country:
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)
- Europe > United Kingdom
  - England > Greater London > London (0.04)
- North America
  - Canada > Ontario
    - Toronto (0.04)
  - United States
    - California
      - San Bernardino County > Redlands (0.04)
      - San Francisco County > San Francisco (0.04)
    - Washington > King County
      - Seattle (0.04)

Genre:
- Research Report (1.00)

Industry:
- Education (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)
  - Natural Language > Large Language Model (1.00)