ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning
Shao, Jie-Jing, Yang, Xiao-Wen, Zhang, Bo-Wen, Chen, Baizhi, Wei, Wen-Da, Cai, Guohao, Dong, Zhenhua, Guo, Lan-Zhe, Li, Yu-feng
arXiv.org Artificial Intelligence
Recent advances in LLMs, particularly in language reasoning and tool integration, have spurred the rapid real-world development of language agents. Among their applications, travel planning is a prominent domain that combines academic challenges with practical value, owing to its complexity and market demand. However, existing benchmarks fail to reflect the diverse, real-world requirements crucial for deployment. To address this gap, we introduce ChinaTravel, a benchmark specifically designed for authentic Chinese travel planning scenarios. We collect travel requirements from questionnaires and propose a compositionally generalizable domain-specific language that enables a scalable evaluation process covering feasibility, constraint satisfaction, and preference comparison. Empirical studies reveal the potential of neuro-symbolic agents in travel planning: they achieve a constraint satisfaction rate of 27.9%, significantly surpassing the 2.6% of purely neural models. Moreover, we identify key challenges in real-world travel planning deployments, including open language reasoning and unseen concept composition. These findings highlight the significance of ChinaTravel as a pivotal milestone for advancing language agents in complex, real-world planning scenarios.
Dec-20-2024
- Country:
- Asia > China
- Beijing > Beijing (0.06)
- Shanghai > Shanghai (0.05)
- Jiangsu Province > Nanjing (0.05)
- Zhejiang Province > Hangzhou (0.04)
- Sichuan Province > Chengdu (0.04)
- Hubei Province > Wuhan (0.04)
- Chongqing Province > Chongqing (0.04)
- Guangdong Province
- Genre:
- Research Report (0.82)
- Industry:
- Consumer Products & Services > Travel (1.00)