AITopics | response latency

response latency

Semantic Chain-of-Trust: Autonomous Trust Orchestration for Collaborator Selection via Hypergraph-Aided Agentic AI

Zhu, Botao, Wang, Xianbin, Niyato, Dusit

arXiv.org Artificial Intelligence, Dec-9-2025

The effective completion of tasks in collaborative systems hinges on task-specific trust evaluations of potential devices for distributed collaboration. Due to the independent operation of the devices involved, the dynamic evolution of their mutual relationships, and the complex situation-related impact on trust evaluation, effectively assessing devices' trust for collaborator selection is challenging. To overcome this challenge, we propose a semantic chain-of-trust model implemented with agentic AI and hypergraphs for supporting effective collaborator selection. We first introduce a concept of semantic trust, specifically designed to assess collaborators along multiple semantic dimensions for a more accurate representation of their trustworthiness. To facilitate intelligent evaluation, an agentic AI system is deployed on each device, empowering it to autonomously perform necessary operations, including device state detection, trust-related data collection, semantic extraction, and task-specific resource evaluation, to derive a semantic trust representation for each collaborator. In addition, each device leverages a hypergraph to dynamically manage potential collaborators according to different levels of semantic trust, enabling fast one-hop collaborator selection. Furthermore, adjacent trusted devices autonomously form a chain through the hypergraph structure, supporting multi-hop collaborator selection. Experimental results demonstrate that the proposed semantic chain-of-trust achieves 100% accuracy in trust evaluation based on historical collaborations, enabling intelligent, resource-efficient, and precise collaborator selection.

collaborator, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2507.23565

Country:

North America > Canada (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.94)

PolyG: Adaptive Graph Traversal for Diverse GraphRAG Questions

Liu, Renjie, Jiang, Haitian, Yan, Xiao, Tang, Bo, Li, Jinyang

arXiv.org Artificial Intelligence, Nov-4-2025

GraphRAG enhances large language models (LLMs) to generate quality answers for user questions by retrieving related facts from external knowledge graphs. However, current GraphRAG methods are primarily evaluated on and overly tailored for knowledge graph question answering (KGQA) benchmarks, which are biased towards a few specific question patterns and do not reflect the diversity of real-world questions. To better evaluate GraphRAG methods, we propose a complete four-class taxonomy to categorize the basic patterns of knowledge graph questions and use it to create PolyBench, a new GraphRAG benchmark encompassing a comprehensive set of graph questions. With the new benchmark, we find that existing GraphRAG methods fall short in effectiveness (i.e., quality of the generated answers) and/or efficiency (i.e., response time or token usage) because they adopt either a fixed graph traversal strategy or free-form exploration by LLMs for fact retrieval. However, different question patterns require distinct graph traversal strategies and context formation. To facilitate better retrieval, we propose PolyG, an adaptive GraphRAG approach by decomposing and categorizing the questions according to our proposed question taxonomy. Built on top of a unified interface and execution engine, PolyG dynamically prompts an LLM to generate a graph database query to retrieve the context for each decomposed basic question. Compared with SOTA GraphRAG methods, PolyG achieves a higher win rate in generation quality and has a low response latency and token cost. Our code and benchmark are open-source at https://github.com/Liu-rj/PolyG.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2504.02112

Country:

Europe > Germany > Saxony > Leipzig (0.04)
Europe > France (0.04)
Asia > Indonesia > Bali (0.04)
(9 more...)

Genre:

Research Report (0.82)
Workflow (0.68)
Overview (0.67)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

AdaptJobRec: Enhancing Conversational Career Recommendation through an LLM-Powered Agentic System

Wang, Qixin, Wang, Dawei, Chen, Kun, Hu, Yaowei, Girdhar, Puneet, Wang, Ruoteng, Gupta, Aadesh, Devella, Chaitanya, Guo, Wenlai, Huang, Shangwen, Aoun, Bachir, Hayworth, Greg, Li, Han, Wu, Xintao

arXiv.org Artificial Intelligence, Oct-15-2025

In recent years, recommendation systems have evolved from providing a single list of recommendations to offering a comprehensive suite of topic-focused services. To better accomplish this task, conversational recommendation systems (CRS) have progressed from basic retrieval-augmented LLM generation to agentic systems with advanced reasoning and self-correction capabilities. However, agentic systems come with notable response latency, a longstanding challenge for conversational recommendation systems. To balance the trade-off between handling complex queries and minimizing latency, we propose AdaptJobRec, the first conversational job recommendation system that leverages an autonomous agent to integrate personalized recommendation algorithm tools. The system employs a user query complexity identification mechanism to minimize response latency. For straightforward queries, the agent directly selects the appropriate tool for rapid responses. For complex queries, the agent uses the memory processing module to filter chat history for relevant content, then passes the results to the intelligent task decomposition planner, and finally executes the tasks using personalized recommendation tools. Evaluation on Walmart's real-world career recommendation scenarios demonstrates that AdaptJobRec reduces average response latency by up to 53.3% compared to competitive baselines, while significantly improving recommendation accuracy.
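
The two-path flow in this abstract (a fast path for straightforward queries; history filtering, planning, and tool execution for complex ones) can be pictured with a small sketch. Everything here is an illustrative assumption rather than the paper's implementation: the word-count heuristic in `is_complex`, the term-overlap filter, the placeholder planner, and the tool names `search`, `retrieve`, and `rank` are all hypothetical.

```python
def is_complex(query, history):
    """Hypothetical heuristic: long queries, or ones that refer back to
    earlier turns of a non-empty chat, are treated as complex."""
    words = query.lower().split()
    refers_back = any(w in words for w in ("that", "those", "previous", "earlier"))
    return len(words) > 12 or (refers_back and bool(history))


def filter_history(query, history):
    """Keep only past turns that share at least one term with the query."""
    terms = set(query.lower().split())
    return [turn for turn in history if terms & set(turn.lower().split())]


def plan(query, context):
    """Placeholder planner: one retrieval step per relevant turn, then a ranking step."""
    return [("retrieve", turn) for turn in context] + [("rank", query)]


def handle(query, history, tools):
    """Route a query down the fast path or the multi-step path."""
    if not is_complex(query, history):
        # Fast path: call a single tool directly.
        return [tools["search"](query)]
    # Slow path: filter chat history, decompose into steps, run each step.
    context = filter_history(query, history)
    return [tools[step](arg) for step, arg in plan(query, context)]


# Stand-in tools that just echo what they were asked to do.
tools = {
    "search": lambda q: "search:" + q,
    "retrieve": lambda t: "retrieve:" + t,
    "rank": lambda q: "rank:" + q,
}
print(handle("data analyst jobs", [], tools))
print(handle("show roles like that previous analyst one", ["analyst openings near me"], tools))
```

The point of the split is that the cheap check runs on every query, while the expensive planning steps run only when the check flags the query as complex.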

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2508.13423

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.40)
North America > United States > California > Santa Clara County > Sunnyvale (0.05)
North America > United States > Arkansas (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Accelerating LLM Inference with Precomputed Query Storage

Park, Jay H., Cho, Youngju, Lee, Choungsol, Oh, Moonwook, Seo, Euiseong

arXiv.org Artificial Intelligence, Oct-1-2025

Large language model (LLM) inference often suffers from high latency, particularly in resource-constrained environments such as on-device or edge deployments. To address this challenge, we present StorInfer, a novel storage-assisted LLM inference system that accelerates response time by precomputing and storing predictable query-response pairs offline. When a user query semantically matches a precomputed query, StorInfer bypasses expensive GPU inference and instantly returns the stored response, significantly reducing latency and compute costs. To maximize coverage and effectiveness, StorInfer employs an LLM-driven generator that adaptively produces diverse and deduplicated queries based on a given knowledge base. This is achieved via two techniques: adaptive query masking, which prevents regeneration of similar queries, and adaptive sampling, which dynamically tunes generation parameters to promote semantic diversity. The resulting query-response pairs are embedded and indexed using a disk-backed vector database to enable fast, similarity-based retrieval at runtime. Using this approach, we generated 150K unique precomputed pairs (taking up to 830 MB of storage space), achieving up to 17.3% latency reduction with no loss in response quality. Our evaluation across multiple QA datasets demonstrates the practicality and scalability of storage-assisted inference, especially in scenarios with predictable query distributions. StorInfer highlights a promising direction in leveraging storage as a primary enabler for efficient, low-latency LLM deployment.
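
The lookup-then-fallback idea described above can be illustrated with a minimal sketch. It is not StorInfer's code: the bag-of-words `embed` function stands in for a real sentence encoder, the in-memory list stands in for the disk-backed vector database, and the `0.8` similarity threshold is an arbitrary assumption.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (a stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class PrecomputedStore:
    """In-memory stand-in for an index of precomputed query-response pairs."""

    def __init__(self, pairs, threshold=0.8):
        self.entries = [(embed(q), r) for q, r in pairs]
        self.threshold = threshold  # assumed cut-off for a "semantic match"

    def lookup(self, query):
        """Return the stored response of the closest query, or None if too dissimilar."""
        qv = embed(query)
        best_score, best_resp = 0.0, None
        for vec, resp in self.entries:
            score = cosine(qv, vec)
            if score > best_score:
                best_score, best_resp = score, resp
        return best_resp if best_score >= self.threshold else None


def answer(query, store, slow_llm):
    """Serve from the store on a match; otherwise fall back to model inference."""
    cached = store.lookup(query)
    if cached is not None:
        return cached, "store"
    return slow_llm(query), "llm"


store = PrecomputedStore([("what is the capital of france", "Paris")])
print(answer("What is the capital of France", store, lambda q: "generated"))
print(answer("how do I bake bread", store, lambda q: "generated"))
```

A linear scan is used only to keep the sketch short; an approximate nearest-neighbor index would serve the same role at the scale the abstract mentions.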

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.25919

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Neural Bandit Based Optimal LLM Selection for a Pipeline of Tasks

Atalar, Baran, Zhang, Eddie, Joe-Wong, Carlee

arXiv.org Artificial Intelligence, Aug-19-2025

With the increasing popularity of large language models (LLMs) for a variety of tasks, there has been a growing interest in strategies that can predict which out of a set of LLMs will yield a successful answer at low cost. This problem promises to become more and more relevant as providers like Microsoft allow users to easily create custom LLM "assistants" specialized to particular types of queries. However, some tasks (i.e., queries) may be too specialized and difficult for a single LLM to handle alone. These applications often benefit from breaking down the task into smaller subtasks, each of which can then be executed by an LLM expected to perform well on that specific subtask. For example, in extracting a diagnosis from medical records, one can first select an LLM to summarize the record, select another to validate the summary, and then select another, possibly different, LLM to extract the diagnosis from the summarized record. Unlike existing LLM selection or routing algorithms, this setting requires that we select a sequence of LLMs, with the output of each LLM feeding into the next and potentially influencing its success. Thus, unlike single LLM selection, the quality of each subtask's output directly affects the inputs, and hence the cost and success rate, of downstream LLMs, creating complex performance dependencies that must be learned and accounted for during selection. We propose a neural contextual bandit-based algorithm that trains neural networks that model LLM success on each subtask in an online manner, thus learning to guide the LLM selections for the different subtasks, even in the absence of historical LLM performance data. Experiments on telecommunications question answering and medical diagnosis prediction datasets illustrate the effectiveness of our proposed approach compared to other LLM selection algorithms.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.09958

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

LLMs meet Federated Learning for Scalable and Secure IoT Management

Otoum, Yazan, Asad, Arghavan, Nayak, Amiya

arXiv.org Artificial Intelligence, May-14-2025

The rapid expansion of IoT ecosystems introduces severe challenges in scalability, security, and real-time decision-making. Traditional centralized architectures struggle with latency, privacy concerns, and excessive resource consumption, making them unsuitable for modern large-scale IoT deployments. This paper presents a novel Federated Learning-driven Large Language Model (FL-LLM) framework, designed to enhance IoT system intelligence while ensuring data privacy and computational efficiency. The framework integrates Generative IoT (GIoT) models with a Gradient Sensing Federated Strategy (GSFS), dynamically optimizing model updates based on real-time network conditions. By leveraging a hybrid edge-cloud processing architecture, our approach balances intelligence, scalability, and security in distributed IoT environments. Evaluations on the IoT-23 dataset demonstrate that our framework improves model accuracy, reduces response latency, and enhances energy efficiency, outperforming traditional FL techniques (i.e., FedAvg, FedOpt). These findings highlight the potential of integrating LLM-powered federated learning into large-scale IoT ecosystems, paving the way for more secure, scalable, and adaptive IoT management solutions.

large language model, machine learning, real time system, (21 more...)

arXiv.org Artificial Intelligence

2504.16032

Country: North America > Canada > Ontario > National Capital Region > Ottawa (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping

Pungitore, Sarah, Yadav, Shashank, Subbian, Vignesh

arXiv.org Artificial Intelligence, Apr-7-2025

Computational phenotyping is essential for biomedical research but often requires significant time and resources, especially since traditional methods typically involve extensive manual data review. While machine learning and natural language processing advancements have helped, further improvements are needed. Few studies have explored using Large Language Models (LLMs) for these tasks despite known advantages of LLMs for text-based tasks. To facilitate further research in this area, we developed an evaluation framework, Evaluation of PHEnotyping for Observational Health Data (PHEONA), that outlines context-specific considerations. We applied and demonstrated PHEONA on concept classification, a specific task within a broader phenotyping process for Acute Respiratory Failure (ARF) respiratory support therapies. From the sample concepts tested, we achieved high classification accuracy, suggesting the potential for LLM-based methods to improve computational phenotyping processes.

classification, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.19265

Country: North America > United States > Arizona (0.04)

Genre:

Overview (0.93)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

ALISE: Accelerating Large Language Model Serving with Speculative Scheduling

Zhao, Youpeng, Wang, Jun

arXiv.org Artificial Intelligence, Oct-30-2024

Large Language Models (LLMs) represent a revolutionary advancement in the contemporary landscape of artificial general intelligence (AGI). As exemplified by ChatGPT, LLM-based applications necessitate minimal response latency and maximal throughput for inference serving. However, due to the unpredictability of LLM execution, the first-come-first-serve (FCFS) scheduling policy employed by current LLM serving systems suffers from head-of-line (HoL) blocking issues and long job response times. In this paper, we propose a new efficient LLM inference serving framework, named ALISE. The key design paradigm of ALISE is to leverage a novel speculative scheduler by estimating the execution time for each job and exploiting such prior knowledge to assign appropriate job priority orders, thus minimizing potential queuing delays for heterogeneous workloads. Furthermore, to mitigate the memory overhead of the intermediate key-value (KV) cache, we employ a priority-based adaptive memory management protocol and quantization-based compression techniques. Evaluations demonstrate that in comparison to the state-of-the-art solution vLLM, ALISE improves the throughput of inference serving by up to 1.8x and 2.1x under the same latency constraint on the Alpaca and ShareGPT datasets, respectively.
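
The head-of-line blocking problem mentioned above, and why ordering jobs by estimated execution time helps, can be shown with a toy single-server simulation. This is a simplified sketch and not ALISE's scheduler: it assumes all jobs arrive at time zero, run one at a time without preemption or batching, and carry hypothetical `estimate` values that only roughly match the `actual` runtimes.

```python
def average_wait(ordered_jobs):
    """Mean time each job spends queued before it starts, given a run order."""
    elapsed = 0.0
    total_wait = 0.0
    for job in ordered_jobs:
        total_wait += elapsed      # this job waited for everything before it
        elapsed += job["actual"]   # the server is busy for the job's true runtime
    return total_wait / len(ordered_jobs)


def fcfs_order(jobs):
    """First-come-first-serve: keep arrival order."""
    return list(jobs)


def speculative_order(jobs):
    """Run the job with the smallest *estimated* runtime first."""
    return sorted(jobs, key=lambda job: job["estimate"])


# Hypothetical workload: a long request arrives just ahead of two short ones.
jobs = [
    {"name": "long", "estimate": 9.0, "actual": 10.0},
    {"name": "short", "estimate": 1.5, "actual": 1.0},
    {"name": "mid", "estimate": 2.5, "actual": 2.0},
]
print("FCFS average wait:", average_wait(fcfs_order(jobs)))
print("Estimate-ordered average wait:", average_wait(speculative_order(jobs)))
```

Under FCFS both short jobs wait behind the long one; ordering by estimate lets them finish first, so the average queuing delay drops even though the estimates are imperfect.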

inference, kv cache, latency, (15 more...)

arXiv.org Artificial Intelligence

2410.23537

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
North America > United States > New Jersey (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

Khatoonabadi, SayedHassan, Abdellatif, Ahmad, Costa, Diego Elias, Shihab, Emad

arXiv.org Artificial Intelligence, Nov-13-2023

The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also perform permutation feature importance and SHAP analyses to understand the importance and impact of different features on the predicted response latencies. Our best-performing models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors compared to a no-skill classifier across the projects. Our findings indicate that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses.

contributor, maintainer, response latency, (15 more...)

arXiv.org Artificial Intelligence

2311.07786

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

HiFlash: Communication-Efficient Hierarchical Federated Learning with Adaptive Staleness Control and Heterogeneity-aware Client-Edge Association

Wu, Qiong, Chen, Xu, Ouyang, Tao, Zhou, Zhi, Zhang, Xiaoxi, Yang, Shusen, Zhang, Junshan

arXiv.org Artificial Intelligence, Jan-16-2023

Federated learning (FL) is a promising paradigm that enables collaboratively learning a shared model across massive clients while keeping the training data local. However, for many existing FL systems, clients need to frequently exchange model parameters of large data size with the remote cloud server directly via wide-area networks (WAN), leading to significant communication overhead and long transmission time. To mitigate the communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation with asynchronous edge-cloud model aggregation to greatly reduce the traffic volumes of WAN transmissions. Specifically, we first analyze the convergence bound of HiFL theoretically and identify the key controllable factors for model performance improvement. We then advocate an enhanced design of HiFlash by innovatively integrating deep reinforcement learning based adaptive staleness control and heterogeneity-aware client-edge association strategy to boost the system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
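
The two aggregation levels named in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the HiFL or HiFlash algorithm: models are plain lists of floats, the edge step is a data-size-weighted average, and the cloud step uses a simple `base_mix / (1 + staleness)` discount that is an assumed stand-in for the paper's learned staleness control.

```python
def weighted_average(models, weights):
    """Coordinate-wise weighted mean of equally sized parameter vectors."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total for i in range(dim)]


def edge_aggregate(client_models, client_sizes):
    """Synchronous client-edge step: wait for all clients, weight by data size."""
    return weighted_average(client_models, client_sizes)


def cloud_update(global_model, edge_model, staleness, base_mix=0.5):
    """Asynchronous edge-cloud step: blend one edge model into the global model.

    The more rounds out of date the edge model is (staleness), the less it
    moves the global model. The discount formula is an assumption for this sketch.
    """
    mix = base_mix / (1 + staleness)
    return [(1 - mix) * g + mix * e for g, e in zip(global_model, edge_model)]


# Two clients under one edge node; the second holds three times as much data.
edge_model = edge_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(edge_model)
print(cloud_update([0.0, 0.0], edge_model, staleness=0))
print(cloud_update([0.0, 0.0], edge_model, staleness=4))
```

Only the edge-to-cloud step crosses the wide-area network here, which is the traffic the hierarchical design aims to reduce.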

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2301.06447

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > Virginia (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (1.00)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
