Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability
Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou
arXiv.org Artificial Intelligence
While large language models (LLMs) show promise in code generation, existing benchmarks neglect flowchart-based code generation. To promote further research in this direction, this work presents Flow2Code, a novel benchmark for evaluating flowchart-based code generation. The evaluation dataset spans 15 programming languages and includes 5,622 code segments paired with 16,866 flowcharts of three types: code, UML, and pseudocode. Extensive experiments with 13 multimodal LLMs reveal that current LLMs cannot reliably generate code from flowcharts. Moreover, the experimental results show that supervised fine-tuning contributes substantially to model performance. We publicly release our code and datasets at https://github.com/hml-github/Flow2Code.
Jun-4-2025