Vision-LLMs for Spatiotemporal Traffic Forecasting

Ning Yang, Hengyu Zhong, Haijun Zhang, Randall Berry

Oct-14-2025–arXiv.org Artificial Intelligence

Abstract: Accurate spatiotemporal traffic forecasting is a critical prerequisite for proactive resource management in dense urban mobile networks. While Large Language Models (LLMs) have shown promise in time series analysis, they inherently struggle to model the complex spatial dependencies of grid-based traffic data. Effectively extending LLMs to this domain is challenging, as representing the vast amount of information from dense geographical grids can be inefficient and overwhelm the model's context. To address these challenges, we propose ST-Vision-LLM, a novel framework that reframes spatiotemporal forecasting as a vision-language fusion problem. Our approach leverages a Vision-LLM visual encoder to process historical global traffic matrices as image sequences, providing the model with a comprehensive global view to inform cell-level predictions. To overcome the inefficiency of LLMs in handling numerical data, we introduce an efficient encoding scheme that represents floating-point values as single tokens via a specialized vocabulary, coupled with a two-stage numerical alignment fine-tuning process. The model is first trained with Supervised Fine-Tuning (SFT) and then further optimized for predictive accuracy using Group Relative Policy Optimization (GRPO), a memory-efficient reinforcement learning method. Evaluations on real-world mobile traffic datasets demonstrate that ST-Vision-LLM outperforms existing methods by 15.6% in long-term prediction accuracy and exceeds the second-best baseline by over 30.04% in cross-domain few-shot scenarios.

The ever-increasing demand for high-speed, reliable mobile connectivity in dense urban environments presents a significant challenge for network operators. Meeting this demand hinges on proactive resource management, for which accurate traffic prediction is a critical prerequisite [1]. The evolution of spatiotemporal sequence prediction has progressed from classical statistical methods to more advanced deep learning approaches.

Ning Yang is with the Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China (e-mail: ning.yang@ia.ac.cn).
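The abstract mentions representing each floating-point value as a single token drawn from a specialized vocabulary. The listing does not give the details of that scheme, so the following is only a minimal sketch of one way such an encoding could work, assuming traffic values normalized to [0, 1] and quantized to two decimals. The class name `NumericVocab`, the token format `<num_0.37>`, and the starting ID `50_000` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericVocab:
    """Hypothetical vocabulary with one token per quantized value."""

    lo: float = 0.0          # assumed lower bound of normalized traffic
    hi: float = 1.0          # assumed upper bound of normalized traffic
    decimals: int = 2        # assumed quantization precision
    base_id: int = 50_000    # hypothetical ID of the first numeric token

    @property
    def size(self) -> int:
        # Number of distinct quantized values between lo and hi, inclusive.
        scale = 10 ** self.decimals
        return round(self.hi * scale) - round(self.lo * scale) + 1

    def encode(self, value: float) -> int:
        # Clip to the supported range, quantize, and map to one token ID.
        scale = 10 ** self.decimals
        clipped = min(max(value, self.lo), self.hi)
        index = round(clipped * scale) - round(self.lo * scale)
        return self.base_id + index

    def decode(self, token_id: int) -> float:
        # Map a numeric token ID back to its quantized value.
        scale = 10 ** self.decimals
        index = token_id - self.base_id
        if not 0 <= index < self.size:
            raise ValueError(f"{token_id} is not a numeric token")
        return (round(self.lo * scale) + index) / scale

    def token_text(self, token_id: int) -> str:
        # Surface form that would be added to a tokenizer's vocabulary.
        return f"<num_{self.decode(token_id):.{self.decimals}f}>"


if __name__ == "__main__":
    vocab = NumericVocab()
    series = [0.12, 0.37, 0.90]
    ids = [vocab.encode(x) for x in series]
    # Each value costs exactly one token, regardless of its digit count.
    print(ids, [vocab.token_text(i) for i in ids])
```

Under these assumptions a history of N values occupies N tokens, whereas a generic subword tokenizer would typically split each number into several pieces. The cost is a quantization error bounded by half the step size, here 0.005.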

large language model, machine learning, natural language

Country:
- Asia > China
  - Beijing > Beijing (0.24)
  - Chongqing Province > Chongqing (0.04)
- Europe > Italy
  - Trentino-Alto Adige/Südtirol > Trentino Province (0.14)
- North America
  - Trinidad and Tobago > Trinidad
    - Arima > Arima (0.05)
  - United States > Illinois
    - Cook County > Chicago (0.04)

Genre:
- Research Report (0.50)

Industry:
- Information Technology > Networks (0.48)
- Telecommunications > Networks (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
