Goto

Collaborating Authors

 csr





Few-shot Relational Reasoning via Connection Subgraph Pretraining

Neural Information Processing Systems

Few-shot knowledge graph (KG) completion task aims to perform inductive reasoning over the KG: given only a few support triplets of a new relation $\bowtie$ (e.g., (chop,$\bowtie$,kitchen), (read,$\bowtie$,library), the goal is to predict the query triplets of the same unseen relation $\bowtie$, e.g., (sleep,$\bowtie$,?). Current approaches cast the problem in a meta-learning framework, where the model needs to be first jointly trained over many training few-shot tasks, each being defined by its own relation, so that learning/prediction on the target few-shot task can be effective. However, in real-world KGs, curating many training tasks is a challenging ad hoc process. Here we propose Connection Subgraph Reasoner (CSR), which can make predictions for the target few-shot task directly without the need for pre-training on the human curated set of training tasks. The key to CSR is that we explicitly model a shared connection subgraph between support and query triplets, as inspired by the principle of eliminative induction. To adapt to specific KG, we design a corresponding self-supervised pretraining scheme with the objective of reconstructing automatically sampled connection subgraphs. Our pretrained model can then be directly applied to target few-shot tasks on without the need for training few-shot tasks. Extensive experiments on real KGs, including NELL, FB15K-237, and ConceptNet, demonstrate the effectiveness of our framework: we show that even a learning-free implementation of CSR can already perform competitively to existing methods on target few-shot tasks; with pretraining, CSR can achieve significant gains of up to 52% on the more challenging inductive few-shot tasks where the entities are also unseen during (pre)training.


Emergent Coordination and Phase Structure in Independent Multi-Agent Reinforcement Learning

Yamaguchi, Azusa

arXiv.org Artificial Intelligence

A clearer understanding of when coordination emerges, fluctuates, or collapses in decentralized multi-agent reinforcement learning (MARL) is increasingly sought in order to characterize the dynamics of multi-agent learning systems. We revisit fully independent Q-learning (IQL) as a minimal decentralized testbed and run large-scale experiments across environment size L and agent density rho. We construct a phase map using two axes - the cooperative success rate (CSR) and a stability index derived from TD-error variance - revealing three distinct regimes: a coordinated and stable phase, a fragile transition region, and a jammed or disordered phase. A sharp double Instability Ridge separates these regimes and corresponds to persistent kernel drift, the time-varying shift of each agent's effective transition kernel induced by others' policy updates. Synchronization analysis further shows that temporal alignment is required for sustained cooperation, and that competition between drift and synchronization generates the fragile regime. Removing agent identifiers eliminates drift entirely and collapses the three-phase structure, demonstrating that small inter-agent asymmetries are a necessary driver of drift. Overall, the results show that decentralized MARL exhibits a coherent phase structure governed by the interaction between scale, density, and kernel drift, suggesting that emergent coordination behaves as a distribution-interaction-driven phase phenomenon.


ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering

Liu, Marianne Menglin, Garcia, Daniel, Parllaku, Fjona, Upadhyay, Vikas, Shah, Syed Fahad Allam, Roth, Dan

arXiv.org Artificial Intelligence

Large language model (LLM) agents rely on external tools to solve complex tasks, but real-world toolsets often contain redundant tools with overlapping names and descriptions, introducing ambiguity and reducing selection accuracy. LLMs also face strict input context limits, preventing efficient consideration of large toolsets. To address these challenges, we propose ToolScope, which includes: (1) ToolScopeMerger with Auto-Correction to automatically audit and fix tool merges, reducing redundancy, and (2) ToolScopeRetriever to rank and select only the most relevant tools for each query, compressing toolsets to fit within context limits without sacrificing accuracy. Evaluations on three state-of-the-art LLMs and three open-source tool-use benchmarks show gains of 8.38% to 38.6% in tool selection accuracy, demonstrating ToolScope's effectiveness in enhancing LLM tool use.



PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases

Vuddanti, Sri Vatsa, Shah, Aarav, Chittiprolu, Satwik Kumar, Song, Tony, Dev, Sunishchal, Zhu, Kevin, Chaudhary, Maheep

arXiv.org Artificial Intelligence

Tool-augmented language agents frequently fail in real-world deployment due to tool malfunctions--timeouts, API exceptions, or inconsistent outputs--triggering cascading reasoning errors and task abandonment. Existing agent training pipelines optimize only for success trajectories, failing to expose models to the tool failures that dominate real-world usage. We propose \textbf{PALADIN}, a generalizable framework for equipping language agents with robust failure recovery capabilities. PALADIN trains on 50,000+ recovery-annotated trajectories constructed via systematic failure injection and expert demonstrations on an enhanced ToolBench dataset. Training uses LoRA-based fine-tuning to retain base capabilities while injecting recovery competence. At inference, PALADIN detects execution-time errors and retrieves the most similar case from a curated bank of 55+ failure exemplars aligned with ToolScan's taxonomy, then executes the corresponding recovery action. This approach generalizes to novel failures beyond the training distribution, retaining 95.2\% recovery performance on unseen tool APIs. Evaluation across PaladinEval and ToolReflectEval demonstrates consistent improvements in Recovery Rate (RR), Task Success Rate (TSR), Catastrophic Success Rate (CSR), and Efficiency Score (ES). PALADIN improves RR from 32.76% to 89.68% (+57% relative) over ToolBench and outperforms the strongest baseline CRITIC (76.34%) by +13.3%. Against vanilla agents, PALADIN achieves 89.86\% RR (+66% relative improvement from 23.75%). These results establish PALADIN as an effective method for building fault-tolerant agents capable of robust recovery in real-world tool environments.


Customer Service Representative's Perception of the AI Assistant in an Organization's Call Center

Qin, Kai, Du, Kexin, Chen, Yimeng, Liu, Yueyan, Cai, Jie, Nie, Zhiqiang, Gao, Nan, Wei, Guohui, Wang, Shengzhu, Yu, Chun

arXiv.org Artificial Intelligence

The integration of various AI tools creates a complex socio-technical environment where employee-customer interactions form the core of work practices. This study investigates how customer service representatives (CSRs) at the power grid service customer service call center perceive AI assistance in their interactions with customers. Through a field visit and semi-structured interviews with 13 CSRs, we found that AI can alleviate some traditional burdens during the call (e.g., typing and memorizing) but also introduces new burdens (e.g., earning, compliance, psychological burdens). This research contributes to a more nuanced understanding of AI integration in organizational settings and highlights the efforts and burdens undertaken by CSRs to adapt to the updated system.


Efficient Privacy-Preserving Recommendation on Sparse Data using Fully Homomorphic Encryption

Chowdhury, Moontaha Nishat, Bauer, André, Zhou, Minxuan

arXiv.org Artificial Intelligence

--In today's data-driven world, recommendation systems personalize user experiences across industries but rely on sensitive data, raising privacy concerns. Fully homomorphic encryption (FHE) can secure these systems, but a significant challenge in applying FHE to recommendation systems is efficiently handling the inherently large and sparse user-item rating matrices. FHE operations are computationally intensive, and naively processing various sparse matrices in recommendation systems would be prohibitively expensive. Additionally, the communication overhead between parties remains a critical concern in encrypted domains. We propose a novel approach combining Compressed Sparse Row (CSR) representation with FHE-based matrix factorization that efficiently handles matrix sparsity in the encrypted domain while minimizing communication costs. Our experimental results demonstrate high recommendation accuracy with encrypted data while achieving the lowest communication costs, effectively preserving user privacy. Recommendation systems are widely deployed to help various customers discover preferred products and content, such as online shopping or browsing streaming services. Collaborative filtering (CF) is a commonly used algorithm for recommendation systems [1].