LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Yang, Zhenyu, Zhang, Kairui, Hu, Yuhang, Wang, Bing, Qian, Shengsheng, Wen, Bin, Yang, Fan, Gao, Tingting, Dong, Weiming, Xu, Changsheng
–arXiv.org Artificial Intelligence
Despite significant progress in Video Large Language Models (Video-LLMs) for offline video understanding, existing online Video-LLMs typically struggle to simultaneously process continuous frame-by-frame inputs and determine optimal response timing, often compromising real-time responsiveness and narrative coherence. To address these limitations, we introduce LiveStar, a pioneering live streaming assistant that achieves always-on proactive responses through adaptive streaming decoding. Specifically, LiveStar incorporates: (1) a training strategy enabling incremental video-language alignment for variable-length video streams, preserving temporal consistency across dynamically evolving frame sequences; (2) a response-silence decoding framework that determines optimal proactive response timing via a single forward pass verification; (3) memory-aware acceleration via peak-end memory compression for online inference on 10+ minute videos, combined with streaming key-value cache to achieve 1.53x faster inference. We also construct an OmniStar dataset, a comprehensive dataset for training and benchmarking that encompasses 15 diverse real-world scenarios and 5 evaluation tasks for online video understanding. Extensive experiments across three benchmarks demonstrate LiveStar's state-of-the-art performance, achieving an average 19.5% improvement in semantic correctness with 18.1% reduced timing difference compared to existing online Video-LLMs, while improving FPS by 12.0% across all five OmniStar tasks. Our model and dataset can be accessed at https://github.com/yzy-bupt/LiveStar.
arXiv.org Artificial Intelligence
Nov-10-2025
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology (0.46)
- Leisure & Entertainment (0.67)
- Media (0.46)
- Technology: