
Collaborating Authors


Matching the Optimal Denoiser in Point Cloud Diffusion with (Improved) Rotational Alignment

Daigavane, Ameya, Xie, YuQing, Vani, Bodhi P., Saremi, Saeed, Kleinhenz, Joseph, Smidt, Tess

arXiv.org Artificial Intelligence

Diffusion models are a popular class of generative models trained to reverse a noising process starting from a target data distribution. Training a diffusion model consists of learning how to denoise noisy samples at different noise levels. When training diffusion models for point clouds such as molecules and proteins, there is often no canonical orientation that can be assigned. To capture this symmetry, the true data samples are often augmented by transforming them with random rotations sampled uniformly over $SO(3)$. Then, the denoised predictions are often rotationally aligned via the Kabsch-Umeyama algorithm to the ground truth samples before computing the loss. However, the effect of this alignment step has not been well studied. Here, we show that the optimal denoiser can be expressed in terms of a matrix Fisher distribution over $SO(3)$. Alignment corresponds to sampling the mode of this distribution, and turns out to be the zeroth order approximation for small noise levels, explaining its effectiveness. We build on this perspective to derive better approximators to the optimal denoiser in the limit of small noise. Our experiments highlight that alignment is often a `good enough' approximation for the noise levels that matter most for training diffusion models.
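For context, the rotational alignment step the abstract refers to is typically implemented with the Kabsch algorithm: an SVD-based solution for the rotation that best maps a predicted point cloud onto the ground truth before the loss is computed. A minimal NumPy sketch (not the paper's code; points are assumed mean-centered):

```python
import numpy as np

def kabsch_align(P, Q):
    """Return the proper rotation R (det R = +1) minimizing ||Q - P R^T||_F.

    P, Q: (N, 3) arrays of corresponding points, assumed mean-centered.
    """
    H = P.T @ Q                       # 3x3 covariance of correspondences
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])        # flip last axis if needed to avoid reflections
    return Vt.T @ D @ U.T

def aligned_mse(pred, target):
    """MSE after rotationally aligning the denoised prediction to the target."""
    R = kabsch_align(pred, target)
    return float(np.mean((pred @ R.T - target) ** 2))
```

The determinant correction is what distinguishes Kabsch-Umeyama from a plain orthogonal Procrustes solve: it excludes reflections, which would be unphysical for molecular point clouds.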



Mem-α: Learning Memory Construction via Reinforcement Learning

Wang, Yu, Takanobu, Ryuichi, Liang, Zhiqi, Mao, Yuzhen, Hu, Yuanzhe, McAuley, Julian, Wu, Xiaojian

arXiv.org Artificial Intelligence

Large language model (LLM) agents are constrained by limited context windows, necessitating external memory systems for long-term information understanding. Current memory-augmented agents typically depend on pre-defined instructions and tools for memory updates. However, language models may lack the ability to determine which information to store, how to structure it, and when to update it, especially as memory systems become more complex. This results in suboptimal memory construction and information loss. To this end, we propose Mem-alpha, a reinforcement learning framework that trains agents to effectively manage complex memory systems through interaction and feedback. We also construct a specialized training dataset spanning diverse multi-turn interaction patterns paired with comprehensive evaluation questions designed to teach effective memory management. During training, agents process sequential information chunks, learn to extract and store relevant content, then update the memory system. The reward signal derives from downstream question-answering accuracy over the full interaction history, directly optimizing for memory construction. To illustrate the effectiveness of our training framework, we design a memory architecture comprising core, episodic, and semantic components, equipped with multiple tools for memory operations. Empirical evaluation demonstrates that Mem-alpha achieves significant improvements over existing memory-augmented agent baselines. Despite being trained exclusively on instances with a maximum length of 30k tokens, our agents exhibit remarkable generalization to sequences exceeding 400k tokens, over 13x the training length, highlighting the robustness of Mem-alpha.
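The reward described in the abstract, downstream question-answering accuracy over the full interaction history, could be sketched as follows (a minimal illustration only; exact-match scoring is an assumption, not the paper's evaluation protocol):

```python
def qa_accuracy_reward(predicted_answers, gold_answers):
    """Episode reward for memory construction: the fraction of evaluation
    questions answered correctly from the constructed memory.
    Case-insensitive exact match is a simplifying assumption."""
    assert len(predicted_answers) == len(gold_answers) and gold_answers
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(predicted_answers, gold_answers))
    return correct / len(gold_answers)
```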


Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

Peng, Shangpin, Wang, Weinong, Tian, Zhuotao, Yang, Senqiao, Wu, Xing, Xu, Haotian, Zhang, Chengquan, Isobe, Takashi, Hu, Baotian, Zhang, Min

arXiv.org Artificial Intelligence

Direct Preference Optimization (DPO) has become a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance. To address this challenge, we propose Omni-DPO, a dual-perspective optimization framework that jointly accounts for (1) the inherent quality of each preference pair and (2) the model's evolving performance on those pairs. By adaptively weighting samples according to both data quality and the model's learning dynamics during training, Omni-DPO enables more effective training data utilization and achieves better performance. Experimental results on various models and benchmarks demonstrate the superiority and generalization capabilities of Omni-DPO. On textual understanding tasks, Gemma-2-9b-it finetuned with Omni-DPO beats the leading LLM, Claude 3 Opus, by a significant margin of 6.7 points on the Arena-Hard benchmark. On mathematical reasoning tasks, Omni-DPO consistently outperforms the baseline methods across all benchmarks, providing strong empirical evidence for the effectiveness and robustness of our approach. Code and models will be available at https://github.com/pspdada/Omni-DPO.
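The standard per-pair DPO objective that Omni-DPO builds on is $-\log \sigma\big(\beta[(\log \pi_\theta(y_w) - \log \pi_{ref}(y_w)) - (\log \pi_\theta(y_l) - \log \pi_{ref}(y_l))]\big)$. A sketch with a placeholder per-pair weight, standing in for the adaptive quality- and dynamics-aware factor the paper introduces (the weight schedule itself is not reproduced here):

```python
import math

def weighted_dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l,
                      beta=0.1, weight=1.0):
    """Weighted DPO loss for one preference pair.

    pi_logp_*:  policy log-probs of the preferred (w) / dispreferred (l) response.
    ref_logp_*: reference-model log-probs of the same responses.
    weight:     hypothetical per-pair factor; uniform weight=1.0 recovers
                vanilla DPO.
    """
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -weight * math.log(1.0 / (1.0 + math.exp(-margin)))
```

With the margin convention above, the loss shrinks as the policy prefers the chosen response more strongly relative to the reference.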


BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

Liu, Kay

Neural Information Processing Systems

Despite the importance of graph outlier detection (OD) and the many algorithms developed for it in recent years, there is no comprehensive benchmark for graph outlier detection, which we believe has hindered the development and understanding of graph OD algorithms.


AdaTune: Adaptive Tensor Program Compilation Made Efficient

Neural Information Processing Systems

In particular, we propose an adaptive evaluation method that statistically terminates a costly hardware measurement early without losing much accuracy. We further devise a surrogate model with uncertainty quantification that allows the optimization to better adapt to hardware and model heterogeneity.
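A generic sequential-stopping loop illustrates the idea of terminating a noisy hardware measurement early: keep sampling until the standard error of the mean is small relative to the mean. This is a sketch of the general technique, not AdaTune's specific statistical test:

```python
import statistics

def measure_with_early_stop(measure, max_trials=20, rel_tol=0.05, min_trials=3):
    """Repeat a noisy measurement, stopping once the standard error of the
    mean falls below rel_tol * mean. Returns (mean, trials_used)."""
    samples = []
    for _ in range(max_trials):
        samples.append(measure())
        if len(samples) >= min_trials:
            m = statistics.fmean(samples)
            se = statistics.stdev(samples) / len(samples) ** 0.5
            if m > 0 and se <= rel_tol * m:
                break  # estimate is precise enough; skip remaining trials
    return statistics.fmean(samples), len(samples)
```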


Conformal Safety Shielding for Imperfect-Perception Agents

Scarbro, William, Imrie, Calum, Yaman, Sinem Getir, Fatehi, Kavan, Pasareanu, Corina S., Calinescu, Radu, Mangal, Ravi

arXiv.org Artificial Intelligence

We consider the problem of safe control in discrete autonomous agents that use learned components for imperfect perception (or more generally, state estimation) from high-dimensional observations. We propose a shield construction that provides run-time safety guarantees under perception errors by restricting the actions available to an agent, modeled as a Markov decision process, as a function of the state estimates. Our construction uses conformal prediction for the perception component, which guarantees that for each observation, the predicted set of estimates includes the actual state with a user-specified probability. The shield allows an action only if it is allowed for all the estimates in the predicted set, resulting in local safety. We also articulate and prove a global safety property of existing shield constructions for perfect-perception agents bounding the probability of reaching unsafe states if the agent always chooses actions prescribed by the shield. We illustrate our approach with a case-study of an experimental autonomous system that guides airplanes on taxiways using high-dimensional perception DNNs.
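The shield rule stated in the abstract, allow an action only if it is allowed for every state in the conformal prediction set, reduces to a set intersection. A minimal sketch (names and the dict-based state-to-actions map are illustrative, not the paper's API):

```python
def shield_allowed(pred_set, allowed_by_state):
    """Actions that are safe under every state in the conformal set.

    pred_set:         iterable of state ids; by conformal calibration this
                      set contains the true state with prob >= 1 - alpha.
    allowed_by_state: dict mapping state id -> set of actions the shield
                      permits in that state.
    """
    sets = [set(allowed_by_state[s]) for s in pred_set]
    # Empty prediction set => no action can be certified safe.
    return set.intersection(*sets) if sets else set()
```

Because the true state is in the set with probability at least 1 - alpha, any action surviving the intersection is allowed in the true state with at least that probability, which is the local safety property the abstract describes.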


Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom

Sen, Rishika, Roychowdhury, Sujoy, Soman, Sumit, Ranjani, H. G., Mohanty, Srikhetra

arXiv.org Artificial Intelligence

Figure 1 shows the heatmap depicting the performance of 16 KD combinations across 14 metrics. For brevity, we also report the mean of all 14 metrics and group-wise means (N-gram metrics, embedding-based metrics, and Oracle-LLM metrics) in Figure 1. We systematically analyze the results and organize our findings as the impact of (i) SFT (RQ1), (ii) SFT on teacher and student (RQ1), (iii) vocabulary and KD algorithm (RQ2), and (iv) performance metric groups (RQ3).

3.1 Impact of SFT. We organize the analysis with vocabulary as the starting point.

3.1.1 Llama. Consider the bar plots depicting Llama as the teacher in Figure 1, i.e., the bars denoting (Llama, Vanilla KD) and (Llama, DSKD). We observe that SFT of the teacher, the student, or both improves performance irrespective of the training algorithm (first bar vs. the subsequent 3 bars). The improvement is statistically significant (refer to $H^{S}_{train}$, $H^{T}_{train}$, $H^{T,S}_{train}$ in Table 3). Here, we observe that the null hypothesis (NH) is rejected for most metrics (13 out of 14 for Vanilla KD and 8 or 9 out of 14 for DSKD) with SFT of the student, the teacher, or both for the Llama vocabulary, irrespective of the algorithm.
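The "Vanilla KD" baseline referred to above is standard knowledge distillation: matching the student's temperature-softened output distribution to the teacher's via KL divergence. A minimal NumPy sketch (illustrative only, not the paper's training code):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def vanilla_kd_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL(teacher || student), averaged over the batch.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    following standard distillation practice.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float((T * T) * kl.mean())
```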