response delay
- North America > United States > New York > Tompkins County > Ithaca (0.05)
- North America > Canada (0.05)
Make a Video Call with LLM: A Measurement Campaign over Five Mainstream Apps
Xu, Jiayang, Huang, Xiangjie, Li, Zijie, Meng, Zili
In 2025, Large Language Model (LLM) services launched a new feature -- AI video chat -- allowing users to interact with AI agents via real-time video communication (RTC), just like chatting with real people. Despite its significance, no systematic study has characterized the performance of existing AI video chat systems. To address this gap, this paper proposes a comprehensive benchmark with carefully designed metrics across four dimensions: quality, latency, internal mechanisms, and system overhead. Using custom testbeds, we further evaluate five mainstream AI video chatbots with this benchmark. This work provides the research community with a baseline of real-world performance and identifies unique system bottlenecks. Our benchmarking results also open up several research questions for future optimizations of AI video chatbots.
- Media (0.93)
- Information Technology > Services (0.93)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation
Ray, Siddhant, Pan, Rui, Gu, Zhuohan, Du, Kuntai, Ananthanarayanan, Ganesh, Netravali, Ravi, Jiang, Junchen
RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge often improves generation quality at the expense of response delay. Prior work either reduces the response delay (through better scheduling of RAG queries) or strives to maximize quality (which involves tuning the RAG workflow), but it falls short in optimizing the tradeoff between the delay and quality of RAG responses. This paper presents RAGServe, the first RAG system that jointly schedules queries and adapts the key RAG configurations of each query, such as the number of retrieved text chunks and synthesis methods, in order to balance quality optimization and response delay reduction. Using 4 popular RAG-QA datasets, we show that compared with the state-of-the-art RAG optimization schemes, RAGServe reduces the generation latency by $1.64$--$2.54\times$ without sacrificing generation quality.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Overview (0.67)
- Research Report (0.64)
- Workflow (0.48)
Learn to Compress (LtC): Efficient Learning-based Streaming Video Analytics
Alam, Quazi Mishkatul, Haque, Israat, Abu-Ghazaleh, Nael
Video analytics are often performed as cloud services in edge settings, mainly to offload computation, and also in situations where the results are not directly consumed at the video sensors. Sending high-quality video data from the edge devices can be expensive in terms of both bandwidth and power use. In order to build a streaming video analytics pipeline that makes efficient use of these resources, it is therefore imperative to reduce the size of the video stream. Traditional video compression algorithms are unaware of the semantics of the video, and can be both inefficient and harmful to analytics performance. In this paper, we introduce LtC, a collaborative framework between the video source and the analytics server that efficiently learns to reduce the video streams within an analytics pipeline. Specifically, LtC uses the full-fledged analytics algorithm at the server as a teacher to train a lightweight student neural network, which is then deployed at the video source. The student network is trained to comprehend the semantic significance of various regions within the videos, which is used to differentially preserve the crucial regions in high quality while the remaining regions undergo aggressive compression. Furthermore, LtC also incorporates a novel temporal filtering algorithm based on feature-differencing to omit transmitting frames that do not contribute new information. Overall, LtC is able to use 28-35% less bandwidth and has up to 45% shorter response delay compared to recently published state-of-the-art streaming frameworks while achieving similar analytics performance.
- North America > United States > California > Riverside County > Riverside (0.14)
- North America > United States > New York (0.04)
- North America > United States > Wyoming (0.04)
- Information Technology (0.66)
- Transportation (0.46)
- Commercial Services & Supplies > Security & Alarm Services (0.46)
High-dimensional, multiscale online changepoint detection
Chen, Yudong, Wang, Tengyao, Samworth, Richard J.
Modern technology has not only allowed the collection of data sets of unprecedented size, but has also facilitated the real-time monitoring of many types of evolving processes of interest. Wearable health devices, astronomical survey telescopes, self-driving cars and transport network load-tracking systems are just a few examples of new technologies that collect large quantities of streaming data, and that provide new challenges and opportunities for statisticians. Very often, a key feature of interest in the monitoring of a data stream is a changepoint; that is, a moment in time at which the data generating mechanism undergoes a change. Such times often represent events of interest, e.g. a change in heart function, and moreover, the accurate identification of changepoints often facilitates the decomposition of a data stream into stationary segments. Historically, it has tended to be univariate time series that have been monitored and studied, within the well-established field of statistical process control (e.g.
- North America > United States > New York (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
Turn-Taking Based on Information Flow for Fluent Human-Robot Interaction
Thomaz, Andrea L. (Georgia Institute of Technology) | Chao, Crystal (Georgia Institute of Technology)
Turn-taking is a fundamental part of human communication. Our goal is to devise a turn-taking framework for human-robot interaction that, like the human skill, represents something fundamental about interaction, generic to context or domain. We propose a model of turn-taking, and conduct an experiment with human subjects to inform this model. Our findings from this study suggest that information flow is an integral part of human floor-passing behavior. Following this, we implement autonomous floor relinquishing on a robot and discuss our insights into the nature of a general turn-taking model for human-robot interaction.
- North America > United States > New York (0.05)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)