gcr
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.47)
Towards Outcome-Oriented, Task-Agnostic Evaluation of AI Agents
AlShikh, Waseem, Ali, Muayad Sayed, Kennedy, Brian, Mozolevskyi, Dmytro
As AI agents proliferate across industries and applications, evaluating their performance based solely on infrastructural metrics such as latency, time-to-first-token, or token throughput is proving insufficient. These metrics fail to capture the quality of an agent's decisions, its operational autonomy, or its ultimate business value. This white paper proposes a novel, comprehensive framework of eleven outcome-based, task-agnostic performance metrics for AI agents that transcend domain boundaries. These metrics are designed to enable organizations to evaluate agents based on the quality of their decisions, their degree of autonomy, their adaptability to new challenges, and the tangible business value they deliver, regardless of the underlying model architecture or specific use case. We introduce metrics such as Goal Completion Rate (GCR), Autonomy Index (AIx), Multi-Step Task Resilience (MTR), and Business Impact Efficiency (BIE). Through a large-scale simulated experiment involving four distinct agent architectures (ReAct, Chain-of-Thought, Tool-Augmented, Hybrid) across five diverse domains (Healthcare, Finance, Marketing, Legal, and Customer Service), we demonstrate the framework's efficacy. Our results reveal significant performance trade-offs between different agent designs, highlighting the Hybrid Agent as the most consistently high-performing model across the majority of our proposed metrics, achieving an average Goal Completion Rate of 88.8% and the highest Return on Investment (ROI). This work provides a robust, standardized methodology for the holistic evaluation of AI agents, paving the way for more effective development, deployment, and governance.
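The abstract names the metrics but not their formulas. The Python sketch below shows how three of them could be computed from run logs, assuming GCR is the fraction of runs whose goal was fully achieved, AIx is the fraction of actions taken without human intervention, and ROI is the conventional (value - cost) / cost. The `EpisodeLog` record and all function names are hypothetical and are not the paper's definitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class EpisodeLog:
    """One agent run on one task (hypothetical log format)."""
    goal_completed: bool
    total_actions: int
    human_interventions: int
    value_delivered: float  # business value attributed to the run
    cost: float             # total cost of the run (compute, API calls, oversight)


def goal_completion_rate(logs: list[EpisodeLog]) -> float:
    """Assumed GCR: fraction of runs whose goal was fully achieved."""
    return sum(log.goal_completed for log in logs) / len(logs)


def autonomy_index(logs: list[EpisodeLog]) -> float:
    """Assumed AIx: fraction of actions taken without a human intervention."""
    actions = sum(log.total_actions for log in logs)
    interventions = sum(log.human_interventions for log in logs)
    return 1.0 - interventions / actions


def return_on_investment(logs: list[EpisodeLog]) -> float:
    """Conventional ROI: net value divided by cost."""
    value = sum(log.value_delivered for log in logs)
    cost = sum(log.cost for log in logs)
    return (value - cost) / cost


def average_over_domains(
    per_domain: dict[str, list[EpisodeLog]],
    metric: Callable[[list[EpisodeLog]], float],
) -> float:
    """Unweighted mean of a metric across domains (e.g. Healthcare, Finance, ...)."""
    return mean(metric(logs) for logs in per_domain.values())


if __name__ == "__main__":
    logs = [EpisodeLog(True, 10, 1, 50.0, 10.0), EpisodeLog(False, 8, 3, 0.0, 10.0)]
    print(goal_completion_rate(logs))   # 0.5
    print(return_on_investment(logs))   # 1.5
```

Averaging per domain before averaging across domains keeps a domain with many logged runs from dominating an agent's overall score; whether the paper aggregates this way is not stated in the abstract.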
- Health & Medicine (0.50)
- Banking & Finance (0.47)
- Law (0.36)
Graph Your Own Prompt
Ding, Xi, Wang, Lei, Koniusz, Piotr, Gao, Yongsheng
We propose Graph Consistency Regularization (GCR), a novel framework that injects relational graph structures, derived from model predictions, into the learning process to promote class-aware, semantically meaningful feature representations. Functioning as a form of self-prompting, GCR enables the model to refine its internal structure using its own outputs. While deep networks learn rich representations, these often capture noisy inter-class similarities that contradict the model's predicted semantics. GCR addresses this issue by introducing parameter-free Graph Consistency Layers (GCLs) at arbitrary depths. Each GCL builds a batch-level feature similarity graph and aligns it with a global, class-aware masked prediction graph, derived by modulating softmax prediction similarities with intra-class indicators. This alignment enforces that feature-level relationships reflect class-consistent prediction behavior, acting as a semantic regularizer throughout the network. Unlike prior work, GCR introduces a multi-layer, cross-space graph alignment mechanism with adaptive weighting, where layer importance is learned from graph discrepancy magnitudes. This allows the model to prioritize semantically reliable layers and suppress noisy ones, enhancing feature quality without modifying the architecture or training procedure. GCR is model-agnostic, lightweight, and improves semantic structure across various networks and datasets. Experiments show that GCR promotes cleaner feature structure, stronger intra-class cohesion, and improved generalization, offering a new perspective on learning from prediction structure. [Project website](https://darcyddx.github.io/gcr/) [Code](https://github.com/Darcyddx/graph-prompt)
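A minimal Python sketch of the regularizer computed by Graph Consistency Layers for a single batch, using plain lists. Several details are assumptions not stated in the abstract: the feature graph uses cosine similarity; the prediction graph is the dot product of softmax vectors, zeroed for pairs with different class ids (the ids could be ground-truth or predicted labels); the discrepancy is a mean squared difference; and layer weights are a softmax over negative discrepancies. All function names are hypothetical, and only the loss value is computed (training would need an autodiff framework).

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_graph(features):
    """Batch-level feature similarity graph: pairwise cosine similarities."""
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def masked_prediction_graph(logits, class_ids):
    """Softmax prediction similarities, kept only for intra-class pairs."""
    probs = [softmax(z) for z in logits]
    n = len(probs)
    return [[sum(a * b for a, b in zip(probs[i], probs[j]))
             if class_ids[i] == class_ids[j] else 0.0
             for j in range(n)] for i in range(n)]


def graph_discrepancy(g1, g2):
    """Mean squared difference between two n x n graphs."""
    n = len(g1)
    return sum((g1[i][j] - g2[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def gcr_loss(layer_features, logits, class_ids, temperature=1.0):
    """Weighted sum of per-layer discrepancies against the prediction graph."""
    target = masked_prediction_graph(logits, class_ids)
    discrepancies = [graph_discrepancy(cosine_graph(f), target) for f in layer_features]
    # Assumed weighting: layers whose feature graph already agrees with the
    # prediction graph (small discrepancy) receive larger weights.
    weights = softmax([-d / temperature for d in discrepancies])
    return sum(w * d for w, d in zip(weights, discrepancies)), weights


if __name__ == "__main__":
    clean = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # matches the class structure
    noisy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # contradicts it
    logits = [[10.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
    loss, weights = gcr_loss([clean, noisy], logits, class_ids=[0, 0, 1])
    print(loss, weights)
```

Because nothing here adds parameters, the sketch reflects the abstract's "parameter-free" description; the paper's learned weighting may behave differently from this fixed softmax rule.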
- North America > United States (0.14)
- Oceania > Australia > New South Wales (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology (0.92)
- Transportation > Ground > Road (0.67)
- Health & Medicine > Diagnostic Medicine (0.45)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Health & Medicine > Nuclear Medicine (0.47)
- Health & Medicine > Diagnostic Medicine > Imaging (0.47)
- Health & Medicine > Therapeutic Area (0.47)
Learning with Expert Abstractions for Efficient Multi-Task Continuous Control
Jewett, Jeff, Saisubramanian, Sandhya
Decision-making in complex, continuous multi-task environments is often hindered by the difficulty of obtaining accurate models for planning and the inefficiency of learning purely from trial and error. While precise environment dynamics may be hard to specify, human experts can often provide high-fidelity abstractions that capture the essential high-level structure of a task and user preferences in the target environment. Existing hierarchical approaches often target discrete settings and do not generalize across tasks. We propose a hierarchical reinforcement learning approach that addresses these limitations by dynamically planning over the expert-specified abstraction to generate subgoals for learning a goal-conditioned policy. To overcome the challenges of learning under sparse rewards, we shape the reward based on the optimal state value in the abstract model. This structured decision-making process enhances sample efficiency and facilitates zero-shot generalization. Our empirical evaluation on a suite of procedurally generated continuous control environments demonstrates that our approach outperforms existing hierarchical reinforcement learning methods in terms of sample efficiency, task completion rate, scalability to complex tasks, and generalization to novel scenarios.
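The planning and reward-shaping steps can be sketched with a tiny, hypothetical expert abstraction. Assumptions: the abstract model is deterministic, its optimal values V* come from value iteration, the next subgoal is the greedy successor under V*, and shaping is potential-based (F = gamma * Phi(z') - Phi(z)) with Phi = V*. The room layout, the `abstract_state` mapping, and all function names are illustrative only.

```python
# Hypothetical expert abstraction: rooms A -> B -> C (goal), with a detour A <-> D.
TRANSITIONS = {"A": ["B", "D"], "B": ["C"], "C": [], "D": ["A"]}
REWARDS = {("B", "C"): 1.0}  # sparse reward for entering the goal room
GAMMA = 0.95


def abstract_state(x: float, y: float) -> str:
    """Hypothetical map from a continuous position to an abstract room."""
    if y > 1.0:
        return "D"
    return "A" if x < 1.0 else ("B" if x < 2.0 else "C")


def value_iteration(transitions, rewards, gamma=GAMMA, tol=1e-9):
    """Optimal state values V* of the deterministic abstract model."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, successors in transitions.items():
            if not successors:  # terminal abstract state keeps V* = 0
                continue
            best = max(rewards.get((s, n), 0.0) + gamma * values[n] for n in successors)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def next_subgoal(z, transitions, rewards, values, gamma=GAMMA):
    """One step of planning in the abstraction: the best successor room."""
    return max(transitions[z], key=lambda n: rewards.get((z, n), 0.0) + gamma * values[n])


def shaped_reward(env_reward, z, z_next, values, gamma=GAMMA):
    """Potential-based shaping with V* of the abstract state as the potential."""
    return env_reward + gamma * values[z_next] - values[z]


if __name__ == "__main__":
    V = value_iteration(TRANSITIONS, REWARDS)
    z = abstract_state(0.5, 0.0)  # continuous start position lies in room A
    print(z, next_subgoal(z, TRANSITIONS, REWARDS, V))
    print(shaped_reward(0.0, "A", "B", V), shaped_reward(0.0, "A", "D", V))
```

Potential-based shaping is a common choice because it leaves the optimal policy unchanged. In this toy layout the shaping term is zero along the optimal route from A to B and negative for the detour to D, which gives the low-level policy a dense signal despite the sparse environment reward.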
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Oregon > Benton County > Corvallis (0.04)
- Europe > Bulgaria > Varna Province > Varna (0.04)
- Asia > China (0.04)
On-Robot Reinforcement Learning with Goal-Contrastive Rewards
Biza, Ondrej, Weng, Thomas, Sun, Lingfeng, Schmeckpeper, Karl, Kelestemur, Tarik, Ma, Yecheng Jason, Platt, Robert, van de Meent, Jan-Willem, Wong, Lawson L. S.
Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world. Unfortunately, RL can be prohibitively expensive, in terms of on-robot runtime, due to inefficient exploration when learning from a sparse reward signal. Designing dense reward functions is labour-intensive and requires domain expertise. In our work, we propose GCR (Goal-Contrastive Rewards), a dense reward function learning method that can be trained on passive video demonstrations. By using videos without actions, our method is easier to scale, as we can use arbitrary videos. GCR combines two loss functions: an implicit value loss function that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories. We perform experiments in simulated manipulation environments across RoboMimic and MimicGen tasks, as well as in the real world using a Franka arm and a Spot quadruped. We find that GCR leads to more sample-efficient RL, enabling model-free RL to solve about twice as many tasks as our baseline reward learning methods. We also demonstrate positive cross-embodiment transfer from videos of people and of other robots performing a task. Appendix: https://tinyurl.com/gcr-appendix-2.
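The two-part objective can be sketched as follows. This is not the authors' loss: a linear reward over hypothetical precomputed frame embeddings stands in for the learned network, a margin hinge that penalizes non-increasing reward along successful videos stands in for the implicit value loss, and the goal-contrastive term is a logistic loss on final frames (success = 1, failure = 0). The weight `alpha` and all function names are hypothetical.

```python
import math


def reward(weights, frame_features):
    """Hypothetical linear reward model over a precomputed frame embedding."""
    return sum(w * x for w, x in zip(weights, frame_features))


def progress_loss(weights, successful_trajs, margin=0.1):
    """Stand-in for the value loss: penalize steps along a successful video
    where the predicted reward does not increase by at least `margin`."""
    terms = []
    for traj in successful_trajs:
        rs = [reward(weights, f) for f in traj]
        terms += [max(0.0, margin - (rs[t + 1] - rs[t])) for t in range(len(rs) - 1)]
    return sum(terms) / len(terms)


def softplus_neg(z):
    """Numerically stable log(1 + exp(-z))."""
    return math.log1p(math.exp(-abs(z))) + max(0.0, -z)


def goal_contrastive_loss(weights, successful_trajs, failed_trajs):
    """Logistic loss on final frames: successes pushed up, failures pushed down."""
    terms = [softplus_neg(reward(weights, t[-1])) for t in successful_trajs]
    terms += [softplus_neg(-reward(weights, t[-1])) for t in failed_trajs]
    return sum(terms) / len(terms)


def combined_loss(weights, successful_trajs, failed_trajs, alpha=1.0):
    return (progress_loss(weights, successful_trajs)
            + alpha * goal_contrastive_loss(weights, successful_trajs, failed_trajs))


if __name__ == "__main__":
    success = [[[0.0], [1.0], [2.0], [3.0]]]   # 1-D "progress" embeddings
    failure = [[[0.0], [0.5], [0.2], [-1.0]]]
    print(combined_loss([1.0], success, failure))   # reward tracks progress: low loss
    print(combined_loss([-1.0], success, failure))  # reversed reward: high loss
```

Only action-free frame sequences enter the loss, which mirrors the abstract's point that passive videos suffice for training the reward.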
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
Luo, Linhao, Zhao, Zicheng, Gong, Chen, Haffari, Gholamreza, Pan, Shirui
Large language models (LLMs) have demonstrated impressive reasoning abilities, but they still struggle with faithful reasoning due to knowledge gaps and hallucinations. To address these issues, knowledge graphs (KGs) have been utilized to enhance LLM reasoning through their structured knowledge. However, existing KG-enhanced methods, either retrieval-based or agent-based, encounter difficulties in accurately retrieving knowledge and efficiently traversing KGs at scale. In this work, we introduce graph-constrained reasoning (GCR), a novel framework that bridges structured knowledge in KGs with unstructured reasoning in LLMs. To eliminate hallucinations, GCR ensures faithful KG-grounded reasoning by integrating KG structure into the LLM decoding process through KG-Trie, a trie-based index that encodes KG reasoning paths. KG-Trie constrains the decoding process, allowing LLMs to directly reason on graphs and generate faithful reasoning paths grounded in KGs. Extensive experiments on several KGQA benchmarks demonstrate that GCR achieves state-of-the-art performance and exhibits strong zero-shot generalizability to unseen KGs without additional training. Code is available at https://github.com/RManLuo/

Large language models (LLMs) have shown impressive reasoning abilities in handling complex tasks (Qiao et al., 2023; Huang & Chang, 2023), marking a significant leap that bridges the gap between human and machine intelligence. However, knowledge gaps and hallucinations result in factual errors and flawed reasoning processes (Nguyen et al., 2024), which greatly undermine the reliability of LLMs in real-world applications. To address these issues, many studies utilize knowledge graphs (KGs), which encapsulate extensive factual information in a structured format, to improve the reasoning abilities of LLMs (Pan et al., 2024; Luo et al., 2024). Nevertheless, because LLMs operate on unstructured text, directly applying them to reason on KGs is challenging.
Existing KG-enhanced LLM reasoning methods can be roughly categorized into two groups: retrieval-based and agent-based paradigms, as shown in Figure 2 (a) and (b).
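The KG-Trie constraint can be illustrated with a word-level trie over reasoning paths and greedy decoding restricted to trie continuations. This is a simplified sketch, not the paper's implementation: the scoring function stands in for the LLM's next-token logits, each entity or relation is assumed to be a single token, and the names `build_kg_trie` and `constrained_greedy_decode` are hypothetical.

```python
from typing import Callable

EOS = "<eos>"


class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_end = False  # a complete KG path ends here


def build_kg_trie(paths: list[list[str]]) -> TrieNode:
    """Index KG reasoning paths (entity, relation, entity, ...) as token sequences.
    Assumption: each entity or relation name is one token; a real KG-Trie would
    index the LLM tokenizer's subword ids instead."""
    root = TrieNode()
    for path in paths:
        node = root
        for token in path:
            node = node.children.setdefault(token, TrieNode())
        node.is_end = True
    return root


def constrained_greedy_decode(
    root: TrieNode, score: Callable[[list[str], str], float]
) -> list[str]:
    """Greedy decoding that only ever emits tokens continuing some indexed path."""
    node, output = root, []
    while True:
        allowed = list(node.children) + ([EOS] if node.is_end else [])
        if not allowed:
            return output
        # `score` stands in for the LLM's next-token logits; tokens outside the
        # trie are never candidates, which rules out hallucinated hops.
        token = max(allowed, key=lambda t: score(output, t))
        if token == EOS:
            return output
        output.append(token)
        node = node.children[token]


if __name__ == "__main__":
    trie = build_kg_trie([
        ["Barack_Obama", "spouse_of", "Michelle_Obama"],
        ["Barack_Obama", "born_in", "Honolulu"],
    ])
    # A scorer that strongly prefers an entity absent from the KG ("Chicago").
    scorer = lambda prefix, tok: 10.0 if tok == "Chicago" else (1.0 if tok == "born_in" else 0.0)
    print(constrained_greedy_decode(trie, scorer))
```

Even though the stand-in scorer prefers "Chicago", the decoded path can only follow edges present in the trie, which is the faithfulness property the abstract attributes to KG-Trie.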
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > Alabama > Mobile County > Mobile (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
Cooperative Tri-Point Model-Based Ground-to-Air Coverage Extension in Beyond 5G Networks
Cai, Ziwei, Sheng, Min, Liu, Junju, Zhao, Chenxi, Li, Jiandong
The utilization of existing terrestrial infrastructures to provide coverage for aerial users is a potentially low-cost solution. However, the already deployed terrestrial base stations (TBSs) provide weak ground-to-air (G2A) coverage because of their down-tilted antennas. Furthermore, achieving optimal coverage across the entire airspace through antenna adjustment is challenging due to the complex signal coverage requirements in three-dimensional space, especially in the vertical direction. In this paper, we propose a cooperative tri-point (CoTP) model-based method that utilizes cooperative beams to enhance the G2A coverage extension. To utilize existing TBSs for establishing effective cooperation, we prove that the cooperation among three TBSs can ensure G2A coverage with a minimum coverage overlap, and design the CoTP model to analyze the G2A coverage extension. Using the model, a cooperative coverage structure based on Delaunay triangulation is designed to divide the airspace into triangular prism-shaped subspaces and to determine the corresponding TBS cooperation sets. To enable TBSs in the cooperation set to cover different height subspaces while maintaining ground coverage, we design a cooperative beam generation algorithm to maximize the coverage in the triangular prism-shaped airspace. The simulation results and field trials demonstrate that the proposed method can efficiently enhance the G2A coverage extension while guaranteeing ground coverage.
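The Delaunay-based cooperation structure can be sketched in Python: each Delaunay triangle over the TBS ground positions becomes a three-TBS cooperation set, and an aerial user is served by the set whose triangular prism (the triangle extruded vertically) contains the user. Assumptions: a brute-force O(n^4) Delaunay test that is only correct for TBS positions in general position (no four on a common circle), coordinates in arbitrary units, and hypothetical function names; beam generation and per-height subspace assignment are not modeled.

```python
import itertools


def orientation(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    m = [(q[0] - p[0], q[1] - p[1]) for q in (a, b, c)]
    m = [(dx, dy, dx * dx + dy * dy) for dx, dy in m]
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    # The sign of the in-circle determinant flips with the triangle's orientation.
    return det > 0 if orientation(a, b, c) > 0 else det < 0


def delaunay_cooperation_sets(tbs):
    """Brute-force Delaunay: keep triples whose circumcircle contains no other TBS."""
    sets = []
    for i, j, k in itertools.combinations(range(len(tbs)), 3):
        a, b, c = tbs[i], tbs[j], tbs[k]
        if orientation(a, b, c) == 0:
            continue  # collinear TBSs form no triangle
        others = (tbs[m] for m in range(len(tbs)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            sets.append((i, j, k))
    return sets


def in_triangle(p, a, b, c):
    """Point-in-triangle test (boundary counts as inside)."""
    d = (orientation(a, b, p), orientation(b, c, p), orientation(c, a, p))
    return not (any(x < 0 for x in d) and any(x > 0 for x in d))


def serving_set(user_xyz, tbs, sets):
    """Cooperation set whose triangular prism contains the aerial user."""
    x, y, _height = user_xyz  # membership in a vertical prism ignores height
    for tri in sets:
        if in_triangle((x, y), *(tbs[i] for i in tri)):
            return tri
    return None


if __name__ == "__main__":
    tbs = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (5.0, 5.0)]
    sets = delaunay_cooperation_sets(tbs)
    print(sets)                                   # [(0, 1, 2), (1, 2, 3)]
    print(serving_set((1.0, 1.0, 120.0), tbs, sets))
```

For the four example TBSs, the triangulation uses the diagonal between (4, 0) and (0, 4), because the alternative triangles' circumcircles would enclose the fourth station. Delaunay triangles avoid long, thin cooperation regions, which fits the abstract's goal of full coverage with minimal overlap.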
- North America > United States > New York (0.04)
- Asia > Vietnam > Long An Province (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)