Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI
Wu, Jiangkai, Ren, Zhiyuan, Liu, Liming, Zhang, Xinggong
arXiv.org Artificial Intelligence
AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, it poses significant latency challenges: MLLM inference takes up most of the response time, leaving very little time for video streaming. Because of network uncertainty, transmission latency becomes a critical bottleneck preventing AI from being like a real person. To address this, we call for AI-oriented RTC research that explores the shift in network requirements from "humans watching video" to "AI understanding video". We begin by identifying the main differences between AI Video Chat and traditional RTC. Then, through prototype measurements, we find that ultra-low bitrate is a key factor for low latency. To reduce bitrate dramatically while maintaining MLLM accuracy, we propose Context-Aware Video Streaming, which recognizes the importance of each video region for chat and allocates bitrate almost exclusively to chat-important regions. To evaluate the impact of video streaming quality on MLLM accuracy, we build the first benchmark, named Degraded Video Understanding Benchmark (DeViBench). Finally, we discuss some open questions and ongoing solutions for AI Video Chat. DeViBench is open-sourced at: https://github.com/pku-netvideo/DeViBench.
Nov-25-2025
- Country:
- Asia > China (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Genre:
- Research Report (1.00)
- Technology: