Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models
Wang, Guoyan, Huang, Yanyan, Chen, Chunlin, Wang, Lifeng, Sun, Yuxiang
arXiv.org Artificial Intelligence
Cross-platform strategy game automation remains a challenge due to diverse user interfaces and dynamic battlefield environments. Existing Vision-Language Models (VLMs) struggle to generalize across heterogeneous platforms and lack precision in interface understanding and action execution. We introduce Yanyun-3, a VLM-based agent that integrates Qwen2.5-VL for visual reasoning and UI-TARS for interface execution. We propose a novel data organization principle, combination granularity, which distinguishes intra-sample fusion from inter-sample mixing of multimodal data (static images, multi-image sequences, and videos). The model is fine-tuned using QLoRA on a curated dataset spanning three strategy game platforms. The optimal strategy (M*V+S) achieves a 12.98x improvement in BLEU-4 score and a 63% reduction in inference time compared to full fusion. Yanyun-3 successfully executes core tasks (e.g., target selection, resource allocation) across platforms without platform-specific tuning. Our findings demonstrate that structured multimodal data organization significantly enhances VLM performance in embodied tasks. Yanyun-3 offers a generalizable framework for GUI automation, with broader implications for robotics and autonomous systems.
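The distinction between intra-sample fusion and inter-sample mixing can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that in the notation M*V+S the `*` denotes fusing multi-image (M) and video (V) data inside one training sample, and the `+` denotes mixing static-image (S) samples into the dataset as separate samples; the abstract does not define the operators. The `Sample` class, the `fuse`, `mix`, and `build_mv_plus_s` helpers, the pairwise matching of M and V samples, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """One training sample: media paths keyed by modality, plus an instruction."""
    media: Dict[str, List[str]] = field(default_factory=dict)
    instruction: str = ""


def fuse(*samples: Sample) -> Sample:
    """Intra-sample fusion (assumed meaning of '*'): merge modalities into ONE sample."""
    merged: Dict[str, List[str]] = {}
    for s in samples:
        for modality, paths in s.media.items():
            merged.setdefault(modality, []).extend(paths)
    instruction = " ".join(s.instruction for s in samples if s.instruction)
    return Sample(media=merged, instruction=instruction)


def mix(*groups: List[Sample]) -> List[Sample]:
    """Inter-sample mixing (assumed meaning of '+'): pool samples, keeping each separate."""
    return [s for group in groups for s in group]


def build_mv_plus_s(multi: List[Sample], video: List[Sample],
                    static: List[Sample]) -> List[Sample]:
    """Assemble an M*V+S dataset: fuse M with V pairwise, then mix in S unchanged."""
    # Pairing M and V by position is an assumption made only for this sketch.
    fused = [fuse(m, v) for m, v in zip(multi, video)]
    return mix(fused, static)


# Hypothetical example data.
multi = [Sample({"multi_image": ["a1.png", "a2.png"]}, "Select the target.")]
video = [Sample({"video": ["a.mp4"]})]
static = [Sample({"image": ["s.png"]}, "Allocate resources.")]

dataset = build_mv_plus_s(multi, video, static)
```

Under these assumptions, `dataset` holds two samples: one that carries both the image sequence and the video, and one static-image sample left on its own. Full fusion, the baseline named in the abstract, would instead correspond to calling `fuse` on all three modalities so that every sample carries every media type.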
Nov-26-2025
- Country:
  - Asia
    - China
      - Jiangsu Province > Nanjing (0.04)
      - Sichuan Province > Chengdu (0.04)
    - Japan (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
      - Dubai Emirate > Dubai (0.04)
    - Singapore (0.14)
  - Europe
  - North America
    - Canada > British Columbia
      - Vancouver (0.04)
    - United States
      - Arizona > Pima County
        - Tucson (0.04)
      - California > San Francisco County
        - San Francisco (0.14)
      - Florida > Miami-Dade County
        - Miami (0.14)
      - Rhode Island > Newport County
        - Newport (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Government > Military (1.00)
  - Leisure & Entertainment > Games
    - Computer Games (0.92)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Cognitive Science (1.00)
      - Machine Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning > Agents (1.00)
      - Robots (1.00)
      - Vision (1.00)
    - Sensing and Signal Processing > Image Processing (1.00)