StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following
Jinnan Li, Jinzhe Li, Yue Wang, Yi Chang, Yuan Wu
arXiv.org Artificial Intelligence
Multi-turn instruction following capability constitutes a core competency of large language models (LLMs) in real-world applications. Existing evaluation benchmarks predominantly focus on fine-grained constraint satisfaction and domain-specific capability assessment, yet overlook the crucial structural dependency between dialogue turns that distinguishes multi-turn from single-turn interactions. This structural dependency not only reflects user intent but also establishes a second dimension for instruction following evaluation beyond constraint satisfaction. To address this gap, we propose StructFlowBench, a multi-turn instruction following benchmark with structural flow modeling. The benchmark innovatively defines a structural flow framework comprising six fundamental inter-turn relationships, which not only introduces novel structural constraints for model evaluation but also serves as generation parameters for creating customized dialogue flows tailored to specific scenarios. Adopting established LLM-based automatic evaluation methodologies, we conduct systematic evaluations of 13 leading open-source and closed-source LLMs. Experimental results reveal significant deficiencies in current models' comprehension of multi-turn dialogue structures. The code is available at \url{https://github.com/MLGroupJLU/StructFlowBench}.
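The abstract describes a structural flow framework in which six inter-turn relationships double as generation parameters for building dialogue flows. The sketch below illustrates that idea only in the abstract: the relation names and the `sample_dialogue_flow` function are hypothetical placeholders, not the paper's actual taxonomy or generation pipeline.

```python
from enum import Enum
import random


class Relation(Enum):
    """Illustrative inter-turn relationships.

    These names are placeholders for the six relationships the
    benchmark defines; they are not taken from the paper.
    """
    FOLLOW_UP = "follow-up"
    REFINEMENT = "refinement"
    RECALL = "recall"
    EXPANSION = "expansion"
    SUMMARY = "summary"
    UNRELATED = "unrelated"


def sample_dialogue_flow(num_turns: int, seed: int = 0) -> list[Relation]:
    """Sample a structural flow: one relation linking each turn to a prior turn.

    A real generator would condition on scenario-specific templates; this
    samples uniformly just to show flow-as-generation-parameter.
    """
    rng = random.Random(seed)
    # Turn 1 has no predecessor, so a flow over n turns has n-1 relations.
    return [rng.choice(list(Relation)) for _ in range(num_turns - 1)]


flow = sample_dialogue_flow(4)
print([r.value for r in flow])
```

A downstream data generator could then prompt an LLM turn by turn, telling it how each new instruction must relate to earlier ones, which is what makes the flow a second evaluation dimension beyond per-turn constraint satisfaction.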
Feb-20-2025