Goto

Collaborating Authors

 Genre


SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Neural Information Processing Systems

LLM-based agents have shown promising capabilities in a growing range of software engineering (SWE) tasks. However, advancing this field faces two critical challenges. First, high-quality training data is scarce, especially data that reflects real-world SWE scenarios, where agents must interact with development environments, execute code and adapt behavior based on the outcomes of their actions. Existing datasets are either limited to one-shot code generation or comprise small, manually curated collections of interactive tasks, lacking both scale and diversity. Second, the lack of fresh interactive SWE tasks affects evaluation of rapidly improving models, as static benchmarks quickly become outdated due to contamination issues. To address these limitations, we introduce a novel, automated, and scalable pipeline to continuously extract real-world interactive SWE tasks from diverse GitHub repositories. Using this pipeline, we construct SWE-rebench, a public dataset comprising over 21,000 interactive Python-based SWE tasks, suitable for reinforcement learning of SWE agents at scale. Additionally, we use continuous supply of fresh tasks collected using SWE-rebench methodology to build a contamination-free benchmark for agentic software engineering. We compare results of various LLMs on this benchmark to results on SWE-bench Verified and show that performance of some language models might be inflated due to contamination issues.


Native Segmentation Vision Transformers

Neural Information Processing Systems

Uniform downsampling remains the de facto standard for reducing spatial resolution in vision backbones. In this work, we propose an alternative design built around a content-aware spatial grouping layer, that dynamically assigns tokens to a reduced set based on image boundaries and their semantic content. Stacking our grouping layer across consecutive backbone stages results in hierarchical segmentation that arises natively in the feature extraction process, resulting in our coined Native Segmentation Vision Transformer. We show that a careful design of our architecture enables the emergence of strong segmentation masks solely from grouping layers, that is, without additional segmentation-specific heads. This sets the foundation for a new paradigm of native, backbone-level segmentation, which enables strong zero-shot results without mask supervision, as well as a minimal and efficient standalone model design for downstream segmentation tasks.


LaRes: Evolutionary Reinforcement Learning with LLM-based Adaptive Reward Search

Neural Information Processing Systems

The integration of evolutionary algorithms (EAs) with reinforcement learning (RL) has shown superior performance compared to standalone methods. However, previous research focuses on exploration in policy parameter space, while overlooking the reward function search. To bridge this gap, we propose LaRes, a novel hybrid framework that achieves efficient policy learning through reward function search. LaRes leverages large language models (LLMs) to generate the reward function population, guiding RL in policy learning. The reward functions are evaluated by the policy performance and improved through LLMs.


AAAI presidential panel โ€“ AI agents

AIHub

The Future of AI Research report, published in March 2025, aims to clearly identify the trajectory of AI research in a structured way. The report was led by outgoing AAAI President Francesca Rossi and covers 17 different AI topics . Members of the report team, and other selected AI practitioners, are taking part in a series of video panel discussions covering selected chapters from the report. In the fifth discussion in the collection, the three panellists tackle the topic of AI agents. How multi-agent systems evolved from rule-based systems to complex cooperative frameworks built on generative AI, and what is really different in the modern notion of an agentic AI system.


Mitigating Semantic Collapse in Partially Relevant Video Retrieval

Neural Information Processing Systems

Partially Relevant Video Retrieval (PRVR) seeks videos where only part of the content matches a text query. Existing methods treat every annotated text-video pair as a positive and all others as negatives, ignoring the rich semantic variation both within a single video and across different videos. Consequently, embeddings of both queries and their corresponding video-clip segments for distinct events within the same video collapse together, while embeddings of semantically similar queries and segments from different videos are driven apart. This limits retrieval performance when videos contain multiple, diverse events. This paper addresses the aforementioned problems, termed as semantic collapse, in both the text and video embedding spaces. We first introduce Text Correlation Preservation Learning, which preserves the semantic relationships encoded by the foundation model across text queries. To address collapse in video embeddings, we propose Cross-Branch Video Alignment (CBVA), a contrastive alignment method that disentangles hierarchical video representations across temporal scales. Subsequently, we introduce order-preserving token merging and adaptive CBVA to enhance alignment by producing video segments that are internally coherent yet mutually distinctive. Extensive experiments on PRVR benchmarks demonstrate that our framework effectively prevents semantic collapse and substantially improves retrieval accuracy.


Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression

Neural Information Processing Systems

With the rise of the fine-tuned-pretrained paradigm, storing numerous fine-tuned models for multi-tasking creates significant storage overhead. Delta compression alleviates this by storing only the pretrained model and the highly compressed delta weights (the differences between fine-tuned and pretrained model weights). However, existing methods fail to maintain both high compression and performance, and often rely on data. To address these challenges, we propose UltraDelta, the first data-free delta compression pipeline that achieves both ultra-high compression and strong performance. UltraDelta is designed to minimize redundancy, maximize information, and stabilize performance across inter-layer, intra-layer, and global dimensions, using three key components: (1) Variance-Based Mixed Sparsity Allocation assigns sparsity based on variance, giving lower sparsity to high-variance layers to preserve inter-layer information.


FSEO: Few-Shot Evolutionary Optimization via Meta Learning for Expensive Multi-Objective Optimization

Neural Information Processing Systems

Meta-learning has been demonstrated to be useful to improve the sampling efficiency of Bayesian optimization (BO) and surrogate-assisted evolutionary algorithms (SAEAs) when solving expensive optimization problems (EOPs). Existing studies mainly focus on either combinations of existing meta-learning modeling methods with optimization algorithms, or the development of meta-learning acquisition functions for specific meta BO. However, the meta-learning models used in the literature are not designed for optimization purposes, and the generalization ability of meta-learning acquisition functions is limited. In this work, we develop a novel architecture of meta-learning model for optimization purposes and propose a generalized few-shot evolutionary optimization (FSEO) framework to solve EOPs. We focus on the scenario of expensive multi-objective EOPs (EMOPs) in the context of few-shot optimization as there are few studies on it and its high requirement on surrogate modeling performance. The surrogates in FSEO framework combines neural network with Gaussian Processes (GPs), their network parameters and some parameters of GPs represent task-independent experience and are meta-learned across related optimization tasks, the remaining GPs parameters are task-specific parameters that represent unique features of the target task. We demonstrate that FSEO is able to improve the sampling efficiency of existing SAEAs on EMOPs.


Minimizing False-Positive Attributions in Explanations of Non-Linear Models

Neural Information Processing Systems

Suppressor variables can influence model predictions without being dependent on the target outcome, and they pose a significant challenge for Explainable AI (XAI) methods. These variables may cause false-positive feature attributions, undermining the utility of explanations. Although effective remedies exist for linear models, their extension to non-linear models and instance-based explanations has remained limited. We introduce PatternLocal, a novel XAI technique that addresses this gap. PatternLocal begins with a locally linear surrogate, e.g., LIME, KernelSHAP, or gradient-based methods, and transforms the resulting discriminative model weights into a generative representation, thereby suppressing the influence of suppressor variables while preserving local fidelity. In extensive hyperparameter optimization on the XAI-TRIS benchmark, PatternLocal consistently outperformed other XAI methods and reduced false-positive attributions when explaining non-linear tasks, thereby enabling more reliable and actionable insights. We further evaluate PatternLocal on an EEG motor imagery dataset, demonstrating physiologically plausible explanations.


ASingle-Loop Gradient Algorithm for Pessimistic Bilevel Optimization via Smooth Approximation

Neural Information Processing Systems

Bilevel optimization has garnered significant attention in the machine learning community recently, particularly regarding the development of efficient numerical methods. While substantial progress has been made in developing efficient algorithms for optimistic bilevel optimization, the study of methods for solving Pessimistic Bilevel Optimization (PBO) remains relatively less explored, especially the design of fully first-order, single-loop gradient-based algorithms. This paper aims to bridge this research gap. We first propose a novel smooth approximation to the PBO problem, using penalization and regularization techniques. Building upon this approximation, we then propose SiPBA (Single-loop Pessimistic Bilevel Algorithm), a new gradient-based method specifically designed for PBO which avoids second-order derivative information or inner-loop iterations for subproblem solving. We provide theoretical validation for the proposed smooth approximation scheme and establish theoretical convergence for the algorithm SiPBA. Numerical experiments on synthetic examples and practical applications demonstrate the effectiveness and efficiency of SiPBA.


Minimal Semantic Sufficiency Meets Unsupervised Domain Generalization

Neural Information Processing Systems

The generalization ability of deep learning has been extensively studied in supervised settings, yet it remains less explored in unsupervised scenarios. Recently, the Unsupervised Domain Generalization (UDG) task has been proposed to enhance the generalization of models trained with prevalent unsupervised learning techniques, such as Self-Supervised Learning (SSL). UDG confronts the challenge of distinguishing semantics from variations without category labels. Although some recent methods have employed domain labels to tackle this issue, such domain labels are often unavailable in real-world contexts. In this paper, we address these limitations by formalizing UDG as the task of learning a Minimal Sufficient Semantic Representation: a representation that (i) preserves all semantic information shared across augmented views (sufficiency), and (ii) maximally removes information irrelevant to semantics (minimality). We theoretically ground these objectives from the perspective of information theory, demonstrating that optimizing representations to achieve sufficiency and minimality directly reduces out-of-distribution risk. Practically, we implement this optimization through Minimal-Sufficient UDG (MSUDG), a learnable model by integrating (a) an InfoNCE-based objective to achieve sufficiency; (b) two complementary components to promote minimality: a novel semantic-variation disentanglement loss and a reconstruction-based mechanism for capturing adequate variation. Empirically, MS-UDG sets a new state-of-the-art on popular unsupervised domain-generalization benchmarks, consistently outperforming existing SSL and UDG methods, without category or domain labels during representation learning.