Mem-α: Learning Memory Construction via Reinforcement Learning

arXiv.org Artificial Intelligence

Large language model (LLM) agents are constrained by limited context windows, necessitating external memory systems for long-term information understanding. Current memory-augmented agents typically depend on pre-defined instructions and tools for memory updates. However, language models may lack the ability to determine which information to store, how to structure it, and when to update it, especially as memory systems become more complex. This results in suboptimal memory construction and information loss. To this end, we propose Mem-alpha, a reinforcement learning framework that trains agents to effectively manage complex memory systems through interaction and feedback. We also construct a specialized training dataset spanning diverse multi-turn interaction patterns paired with comprehensive evaluation questions designed to teach effective memory management. During training, agents process sequential information chunks, learn to extract and store relevant content, then update the memory system. The reward signal derives from downstream question-answering accuracy over the full interaction history, directly optimizing for memory construction. To illustrate the effectiveness of our training framework, we design a memory architecture comprising core, episodic, and semantic components, equipped with multiple tools for memory operations. Empirical evaluation demonstrates that Mem-alpha achieves significant improvements over existing memory-augmented agent baselines. Despite being trained exclusively on instances with a maximum length of 30k tokens, our agents exhibit remarkable generalization to sequences exceeding 400k tokens, over 13x the training length, highlighting the robustness of Mem-alpha.
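
The reward described above (downstream question-answering accuracy after the agent has processed all chunks) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Memory` class, `rollout_reward`, and the `update_memory` and `answer` callables are hypothetical names, and exact string match is an assumed stand-in for whatever answer scoring the paper uses.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy three-part memory (core, episodic, semantic), as named in the abstract."""
    core: str = ""
    episodic: list = field(default_factory=list)
    semantic: list = field(default_factory=list)


def rollout_reward(chunks, qa_pairs, update_memory, answer):
    """Process chunks in order, then score the final memory by QA accuracy.

    update_memory(memory, chunk) stands in for the agent's memory-writing policy.
    answer(memory, question) stands in for a reader that answers from memory only.
    Both callables are hypothetical placeholders.
    """
    memory = Memory()
    for chunk in chunks:
        update_memory(memory, chunk)
    correct = sum(
        answer(memory, question).strip().lower() == gold.strip().lower()
        for question, gold in qa_pairs
    )
    return correct / len(qa_pairs)
```

Because the reward is computed only from what survives in memory, a policy that stores the wrong content or overwrites useful entries is penalized directly.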


Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

arXiv.org Artificial Intelligence

Direct Preference Optimization (DPO) has become a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance. To address this challenge, we propose Omni-DPO, a dual-perspective optimization framework that jointly accounts for (1) the inherent quality of each preference pair and (2) the model's evolving performance on those pairs. By adaptively weighting samples according to both data quality and the model's learning dynamics during training, Omni-DPO enables more effective training data utilization and achieves better performance. Experimental results on various models and benchmarks demonstrate the superiority and generalization capabilities of Omni-DPO. On textual understanding tasks, Gemma-2-9b-it finetuned with Omni-DPO beats the leading LLM, Claude 3 Opus, by a significant margin of 6.7 points on the Arena-Hard benchmark. On mathematical reasoning tasks, Omni-DPO consistently outperforms the baseline methods across all benchmarks, providing strong empirical evidence for the effectiveness and robustness of our approach. Code and models will be available at https://github.com/pspdada/Omni-DPO.
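
To make the idea of per-pair weighting concrete, here is a minimal sketch of a DPO loss with two multiplicative weights. The abstract does not give Omni-DPO's weighting formulas, so both weights below are assumptions chosen only for illustration: `quality` is a precomputed score in [0, 1], and the dynamics weight is one minus the model's current preference probability, so pairs the model already fits well contribute less.

```python
import math


def dpo_pair_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    Inputs are sequence log-probabilities under the policy and the reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_dpo_loss(pairs, beta=0.1):
    """Mean DPO loss with hypothetical quality and learning-dynamics weights."""
    total = 0.0
    for p in pairs:
        base = dpo_pair_loss(
            p["policy_chosen"], p["policy_rejected"],
            p["ref_chosen"], p["ref_rejected"], beta,
        )
        w_quality = p["quality"]            # assumed: precomputed pair quality in [0, 1]
        w_dynamics = 1.0 - math.exp(-base)  # equals 1 - sigmoid(beta * margin)
        total += w_quality * w_dynamics * base
    return total / len(pairs)
```

In a real training loop the weights would typically be treated as constants with respect to the gradient; that detail is omitted here because the sketch only computes the loss value.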


BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

Kay Liu

Neural Information Processing Systems

Despite the importance of graph outlier detection (OD) and the many algorithms developed for it in recent years, there is no comprehensive benchmark for the task, which we believe has hindered the development and understanding of graph OD algorithms.


AdaTune: Adaptive Tensor Program Compilation Made Efficient

Neural Information Processing Systems

In particular, we propose an adaptive evaluation method that terminates a costly hardware measurement early, based on a statistical criterion, without losing much accuracy. We further devise a surrogate model with uncertainty quantification that lets the optimization adapt better to hardware and model heterogeneity.
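
One simple way to picture statistical early termination of a measurement is to stop repeating a benchmark once the estimate of its mean is stable enough. The sketch below is an assumption-laden illustration, not AdaTune's actual criterion: the function name, the relative standard-error threshold `rel_tol`, and the run limits are all hypothetical.

```python
import statistics


def measure_with_early_stop(run_once, max_runs=50, min_runs=3, rel_tol=0.05):
    """Repeat a measurement until the standard error of the mean is small.

    run_once() returns one latency sample. Sampling stops as soon as
    (standard error / mean) <= rel_tol, or after max_runs samples.
    Returns (mean latency, number of runs used).
    """
    samples = []
    for _ in range(max_runs):
        samples.append(run_once())
        if len(samples) >= min_runs:
            mean = statistics.fmean(samples)
            sem = statistics.stdev(samples) / len(samples) ** 0.5
            if mean > 0 and sem / mean <= rel_tol:
                break
    return statistics.fmean(samples), len(samples)
```

A stable kernel finishes after `min_runs` samples, while a noisy one uses more of the budget, which is the trade-off between measurement cost and accuracy that the abstract refers to.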


Conformal Safety Shielding for Imperfect-Perception Agents

arXiv.org Artificial Intelligence

We consider the problem of safe control in discrete autonomous agents that use learned components for imperfect perception (or more generally, state estimation) from high-dimensional observations. We propose a shield construction that provides run-time safety guarantees under perception errors by restricting the actions available to an agent, modeled as a Markov decision process, as a function of the state estimates. Our construction uses conformal prediction for the perception component, which guarantees that for each observation, the predicted set of estimates includes the actual state with a user-specified probability. The shield allows an action only if it is allowed for all the estimates in the predicted set, resulting in local safety. We also articulate and prove a global safety property of existing shield constructions for perfect-perception agents bounding the probability of reaching unsafe states if the agent always chooses actions prescribed by the shield. We illustrate our approach with a case-study of an experimental autonomous system that guides airplanes on taxiways using high-dimensional perception DNNs.
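
The shield rule stated above (an action is allowed only if it is allowed for every state estimate in the predicted set) amounts to a set intersection. The sketch below illustrates it under several assumptions: the state and action names are invented, `qhat` is taken to be an already calibrated conformal threshold, and the nonconformity score `1 - p` is one common choice rather than necessarily the one used in the paper.

```python
def prediction_set(probs, qhat):
    """States whose nonconformity score (1 - probability) is at most qhat.

    probs maps each candidate state to the perception model's probability.
    qhat is assumed to come from a separate conformal calibration step.
    """
    return {state for state, p in probs.items() if 1.0 - p <= qhat}


def shielded_actions(pred_set, safe_actions):
    """Actions permitted in every state of the predicted set.

    safe_actions maps each state to the actions a perfect-perception
    shield would allow there. An empty predicted set allows nothing,
    which is a conservative choice made for this sketch.
    """
    if not pred_set:
        return set()
    per_state = [safe_actions.get(state, set()) for state in pred_set]
    return set.intersection(*per_state)


# Hypothetical taxiing example.
probs = {"on_centerline": 0.70, "left_of_center": 0.25, "off_taxiway": 0.05}
safe = {
    "on_centerline": {"straight", "left", "right"},
    "left_of_center": {"straight", "right"},
    "off_taxiway": {"stop"},
}
allowed = shielded_actions(prediction_set(probs, qhat=0.8), safe)
```

A larger predicted set can only shrink the allowed actions, so tighter coverage guarantees come at the cost of a more restrictive shield.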


Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom

arXiv.org Artificial Intelligence

Figure 1 shows a heatmap of the performance of 16 KD combinations across 14 metrics. For brevity, Figure 1 also reports the mean of all 14 metrics and the group-wise means (N-gram metrics, embedding-based metrics, and Oracle-LLM metrics). We systematically analyze the results and organize our findings as the impact of (i) SFT (RQ1), (ii) SFT on teacher and student (RQ1), (iii) vocabulary and KD algorithm (RQ2), and (iv) performance metric groups (RQ3).

3.1 Impact of SFT

We organize the analysis with vocabulary as the starting point.

3.1.1 Llama

Consider the bar plots in Figure 1 that depict Llama as the teacher, i.e., the bars denoting (Llama, Vanilla KD) and (Llama, DSKD). We observe that SFT of the teacher, the student, or both improves performance irrespective of the training algorithm (first bar vs. the subsequent three bars). The improvement is statistically significant (refer to $H^{S}_{train}$, $H^{T}_{train}$, and $H^{T,S}_{train}$ in Table 3). Here, the null hypothesis (NH) is rejected for most metrics (13 out of 14 for Vanilla KD and 8 or 9 out of 14 for DSKD) with SFT of the student, the teacher, or both for the Llama vocabulary, irrespective of the algorithm.


Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) methods have recently gained significant attention for their enhanced ability to integrate external knowledge sources in open-domain question answering (QA) tasks. However, it remains unclear how these models address fairness concerns, particularly with respect to sensitive attributes such as gender, geographic location, and other demographic factors. First, as language models evolve to prioritize utility, such as improving exact match accuracy, fairness may have been largely overlooked. Second, RAG methods are complex pipelines, making it hard to identify and address biases, as each component is optimized for different goals. In this paper, we aim to empirically evaluate fairness in several RAG methods. We propose a fairness evaluation framework tailored to RAG methods, using scenario-based questions and analyzing disparities across demographic attributes. The experimental results indicate that, despite recent advances in utility-driven optimization, fairness issues persist in both the retrieval and generation stages, highlighting the need for more targeted fairness interventions within RAG pipelines. We will release our dataset and code upon acceptance of the paper.
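
As a rough illustration of analyzing disparities across demographic attributes, the sketch below computes exact-match accuracy per group and the gap between the best and worst group. It is not the paper's evaluation framework: the record fields (`group`, `prediction`, `gold`), the function names, and the choice of a max-minus-min gap as the disparity measure are assumptions made for this example.

```python
from collections import defaultdict


def exact_match(prediction, gold):
    """Case-insensitive exact match after trimming whitespace."""
    return prediction.strip().lower() == gold.strip().lower()


def group_em_gap(records):
    """Per-group exact-match rate and the largest gap between groups.

    Each record is a dict with hypothetical keys 'group', 'prediction', 'gold'.
    Returns (rates by group, max rate minus min rate).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += int(exact_match(r["prediction"], r["gold"]))
    rates = {g: hits[g] / totals[g] for g in totals}
    return rates, max(rates.values()) - min(rates.values())
```

Running the same computation separately on retrieved passages and on generated answers would give a per-stage view, in line with the abstract's observation that disparities appear in both stages.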