VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis
Tina Khezresmaeilzadeh, Parsa Razmara, Seyedarmin Azizi, Mohammad Erfan Sadeghi, Erfan Baghaei Potraghloo
arXiv.org Artificial Intelligence
Stock price prediction remains a complex and high-stakes task in financial analysis, traditionally addressed using statistical models or, more recently, language models. In this work, we introduce VISTA (Vision-Language Inference for Stock Time-series Analysis), a novel, training-free framework that leverages Vision-Language Models (VLMs) for multi-modal stock forecasting. VISTA prompts a VLM with both textual representations of historical stock prices and their corresponding line charts to predict future price values. By combining numerical and visual modalities in a zero-shot setting and using carefully designed chain-of-thought prompts, VISTA captures complementary patterns that unimodal approaches often miss. We benchmark VISTA against standard baselines, including ARIMA and text-only LLM-based prompting methods. Experimental results show that VISTA outperforms these baselines by up to 89.83%, demonstrating the effectiveness of multi-modal inference for stock time-series analysis and highlighting the potential of VLMs in financial forecasting tasks without requiring task-specific training.
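The abstract describes prompting a VLM with both a textual representation of historical prices and a corresponding line chart, using chain-of-thought instructions. A minimal sketch of that pairing is below; the prompt wording, chart styling, and the downstream VLM call are illustrative assumptions, not details from the paper.

```python
# Sketch of VISTA-style multi-modal prompt construction (assumptions: exact
# prompt text and chart format are illustrative; the paper's versions may differ).

def render_line_chart_svg(prices, width=400, height=200):
    """Render closing prices as a simple SVG polyline chart
    (stands in for the line-chart image fed to the VLM)."""
    lo, hi = min(prices), max(prices)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat series
    pts = []
    for i, p in enumerate(prices):
        x = i * width / (len(prices) - 1)
        y = height - (p - lo) / span * height  # SVG y-axis points down
        pts.append(f"{x:.1f},{y:.1f}")
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">'
            f'<polyline fill="none" stroke="black" '
            f'points="{" ".join(pts)}"/></svg>')

def build_prompt(prices, horizon=1):
    """Combine the numerical series with chain-of-thought instructions."""
    series = ", ".join(f"{p:.2f}" for p in prices)
    return (
        "You are a financial analyst. The attached line chart plots the same series.\n"
        f"Recent closing prices: {series}\n"
        "Think step by step: describe the trend, note any reversals, "
        f"then predict the next {horizon} closing price(s) as numbers only."
    )

prices = [101.2, 102.5, 101.9, 103.4, 104.0]
chart_svg = render_line_chart_svg(prices)
prompt = build_prompt(prices)
# In a zero-shot, training-free setting, the (prompt, chart image) pair
# would be sent to a VLM in a single inference call.
```

Because the framework is training-free, the only engineering surface is this prompt/chart construction step; no model weights are updated.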
Jun-13-2025