Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
Takamitsu Omasa, Ryo Koshihara, Masumi Morishige
arXiv.org Artificial Intelligence
Flowcharts are indispensable tools in software design and business-process analysis, yet current Vision Language Models (VLMs) frequently misinterpret the directional arrows and graph topology that set these diagrams apart from natural images. This paper introduces a seven-stage pipeline, grouped into three broader processes: (1) arrow-aware detection of nodes and arrow endpoints; (2) Optical Character Recognition (OCR) to extract node text; and (3) construction of a structured prompt that guides the VLM. Tested on a 90-question benchmark distilled from 30 annotated flowcharts, our method raises overall accuracy from 80% to 89% (+9 pp), a sizeable and statistically significant gain achieved without task-specific fine-tuning of the VLMs. The benefit is most pronounced for next-step queries (25/30 to 30/30; 100%, +17 pp); branch-result questions improve more modestly, and before-step queries remain difficult. A parallel evaluation with an LLM-as-a-Judge protocol shows the same trends, reinforcing the advantage of explicit arrow encoding. Limitations include dependence on detector and OCR precision, the small evaluation set, and residual errors at nodes with multiple incoming edges. Future work will enlarge the benchmark with synthetic and handwritten flowcharts and assess the approach on Business Process Model and Notation (BPMN) and Unified Modeling Language (UML).
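The third process, building a structured prompt from the detection and OCR outputs, can be illustrated with a minimal sketch. The function name, the node/edge data format, and the prompt wording below are assumptions for illustration; the paper does not specify its exact prompt template.

```python
def build_flowchart_prompt(nodes, edges):
    """Assemble a structured prompt from hypothetical detector/OCR output.

    nodes: {node_id: ocr_text} from the OCR stage.
    edges: [(src_id, dst_id), ...] with direction taken from the
           detected arrow endpoints (tail -> head).
    """
    lines = ["The flowchart contains the following nodes:"]
    for node_id, text in nodes.items():
        lines.append(f"  {node_id}: {text}")
    lines.append("Directed edges (arrow tail -> arrow head):")
    for src, dst in edges:
        lines.append(f"  {nodes[src]} -> {nodes[dst]}")
    return "\n".join(lines)

# Illustrative three-node flowchart: Start -> Validate input -> End.
nodes = {"n1": "Start", "n2": "Validate input", "n3": "End"}
edges = [("n1", "n2"), ("n2", "n3")]
prompt = build_flowchart_prompt(nodes, edges)
```

Serializing the edges as explicit `tail -> head` pairs is what makes the arrow direction unambiguous to the VLM, rather than leaving it to be inferred from pixels; a next-step query can then be answered by following the listed edges.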
May-14-2025