AITopics | Genre

Collaborating Authors

Genre

HypergraphEmbeddingSuperposition PrincipleUpdatedEmbeddingattractiondamping potential contournodes with a hyperedgenoiseuncertaintyrepulsion

Neural Information Processing SystemsJun-23-2026, 12:16:06 GMT

We introduce a novel hypergraph message passing framework inspired by interacting particle systems, where hyperedges act as fields inducing shared node dynamics. By incorporating attraction, repulsion, and Allen-Cahn forcing terms, particles of varying classes and features achieve class-dependent equilibrium, enabling separability through the particle-driven message passing. We investigate both first-order and secondorder particle system equations for modeling these dynamics, which mitigate over-smoothing and heterophily thus can capture complete interactions. The more stable second-order system permits deeper message passing. Furthermore, we enhance deterministic message passing with stochastic element to account for interaction uncertainties. We prove theoretically that our approach mitigates oversmoothing by maintaining a positive lower bound on the hypergraph Dirichlet energy during propagation and thus to enable hypergraph message passing to go deep. Empirically, our models demonstrate competitive performance on diverse real-world hypergraph node classification tasks, excelling on both homophilic and heterophilic datasets. Source code is available at the link.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Set Smoothness Unlocks Clarke Hyper-stationarity in Bilevel Optimization

Neural Information Processing SystemsJun-23-2026, 12:16:02 GMT

Solving bilevel optimization (BLO) problems to global optimality is generally intractable. A common surrogate is to compute a hyper-stationary point--a stationary point of the hyper-objective function obtained by minimizing or maximizing the upper-level objective over the lower-level solution set. Existing methods, however, either provide weak notions of stationarity or require restrictive assumptions to guarantee the smoothness of hyper-objective functions. In this paper, we eliminate these impractical assumptions and show that strong (Clarke) hyper-stationarity remains computable even when the hyper-objective is nonsmooth. Our key ingredient is a new structural property, called set smoothness, which captures the variational dependence of the lower-level solution set on the upper-level variable. We prove that this property holds for a broad class of BLO problems and ensures weak convexity (resp.

artificial intelligence, machine learning, optimization, (17 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Leveraging semantic similarity for experimentation with AI-generated treatments

Neural Information Processing SystemsJun-23-2026, 12:15:55 GMT

Large Language Models (LLMs) enable a new form of digital experimentation where treatments combine human and model-generated content in increasingly sophisticated ways. The main methodological challenge in this setting is representing these high-dimensional treatments without losing their semantic meaning or rendering analysis intractable. Here we address this problem by focusing on learning low-dimensional representations that capture the underlying structure of such treatments. These representations enable downstream applications such as guiding generative models to produce meaningful treatment variants and facilitating adaptive assignment in online experiments. We propose double kernel representation learning, which models the causal effect through the inner product of kernel-based representations of treatments and user covariates. We develop an alternating-minimization algorithm that learns these representations efficiently from data and provide convergence guarantees under a low-rank factor model. As an application of this framework, we introduce an adaptive design strategy for online experimentation and demonstrate the method's effectiveness through numerical experiments.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry: Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.64)

Add feedback

Hankel Singular Value Regularization for Highly Compressible State Space Models

Neural Information Processing SystemsJun-23-2026, 12:15:44 GMT

Deep neural networks using state space models as layers are well suited for longrange sequence tasks but can be challenging to compress after training. We use that regularizing the sum of Hankel singular values of state space models leads to a fast decay of these singular values and thus to compressible models. To make the proposed Hankel singular value regularization scalable, we develop an algorithm to efficiently compute the Hankel singular values during training iterations by exploiting the specific block-diagonal structure of the system matrices that we use in our state space model parametrization. Experiments on Long Range Arena benchmarks demonstrate that the regularized state space layers are up to 10 more compressible than standard state space layers while maintaining high accuracy.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos

Neural Information Processing SystemsJun-23-2026, 12:09:13 GMT

Composed Video Retrieval (CoVR) retrieves a target video given a query video and a modification text describing the intended change. Existing CoVR benchmarks emphasize appearance shifts or coarse event changes and therefore do not test the ability to capture subtle, fast-paced temporal differences. We introduce TFCoVR, the first large-scale benchmark dedicated to temporally fine-grained CoVR. TF-CoVR focuses on gymnastics and diving, and provides 180K triplets drawn from FineGym and FineDiving datasets. Previous CoVR benchmarks, focusing on temporal aspect, link each query to a single target segment taken from the same video, limiting practical usefulness. In TF-CoVR, we instead construct each pair by prompting an LLM with the label differences between clips drawn from different videos; every pair is thus associated with multiple valid target videos (3.9 on average), reflecting real-world tasks such as sports-highlight generation. To model these temporal dynamics, we propose TF-CoVR-Base, a concise two-stage training framework: (i) pre-train a video encoder on fine-grained action classification to obtain temporally discriminative embeddings; (ii) align the composed query with candidate videos using contrastive learning. We conduct the first comprehensive study of image, video, and general multimodal embedding (GME) models on temporally fine-grained composed retrieval in both zero-shot and fine-tuning regimes. On TF-CoVR, TF-CoVR-Base improves zero-shot mAP@50 from 5.92 (LanguageBind) to 7.51, and after fine-tuning raises the state-of-the-art from 19.83 to 27.22.

large language model, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country: Europe (0.67)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones

Neural Information Processing SystemsJun-23-2026, 12:08:16 GMT

In this study, we investigate the underlying mechanisms behind the persistence of these errors across LMs of varying sizes (124M-7B) to both understand and mitigate the errors. Our study reveals that LMs rely on a number of components (attention heads and FF neurons) that independently make their own predictions. While some components reliably predict correct answers across a generalized range of inputs (i.e., implementing "sound mechanisms"), others are less reliable and introduce noise by promoting incorrect tokens (i.e., implementing "faulty mechanisms"). Errors occur when the faulty mechanisms overshadow the sound ones and dominantly affect the predictions. Motivated by this insight, we introduce RASTEER, a steering method to systematically identify and increase the contribution of reliable components for improving model performance. RASTEER substantially improves performance on balanced parentheses tasks, boosting accuracy of some models from 0% to around 100%, without impairing the models' general coding ability. We further demonstrate its broader applicability in arithmetic reasoning tasks, achieving performance gains of up to around 20%.1

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Improving Target Sound Extraction via Disentangled Codec Representations with Privileged Knowledge Distillation

Neural Information Processing SystemsJun-23-2026, 12:08:03 GMT

Target sound extraction aims to isolate target sound sources from an input mixture using a target clue to identify the sounds of interest. To address the challenge posed by the wide variety of sounds, recent work has introduced privileged knowledge distillation (PKD), which utilizes privileged information (PI) about the target sound, available only during training. While PKD has shown promise, existing approaches often suffer from overfitting of the teacher model for the overly rich PI and ineffective knowledge transfer to the student model. In this paper, we propose Disentangled Codec Knowledge Distillation (DCKD) to mitigate these issues by regulating the amount and the flow of target sound information within the teacher model. We begin by extracting a compressed representation of the target sound using a neural audio codec to regulate the amount of PI. Disentangled representation learning is then applied to remove class information and extract fine-grained temporal information as PI. Subsequently, an n-hot vector as the class information and the class-independent PI are used to condition the early and later layers of the teacher model, respectively, forming a regulated coarse-to-fine target information flow. The resulting representation is transferred to the student model through feature-level knowledge distillation. Experimental results show that DCKD consistently improves existing methods across model architectures under the multi-target selection condition.

information, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Probing Neural Combinatorial Optimization Models

Neural Information Processing SystemsJun-23-2026, 12:07:30 GMT

Neural combinatorial optimization (NCO) has achieved remarkable performance, yet its learned model representations and decision rationale remain a black box. This impedes both academic research and practical deployment, since researchers and stakeholders require deeper insights into NCO models. In this paper, we take the first critical step towards interpreting NCO models by investigating their representations through various probing tasks. Moreover, we introduce a novel probing tool named Coefficient Significance Probing (CS-Probing) to enable deeper analysis of NCO representations by examining the coefficients and statistical significance during probing. Extensive experiments and analysis reveal that NCO models encode low-level information essential for solution construction, while capturing high-level knowledge to facilitate better decisions. Using CS-Probing, we find that prevalent NCO models impose varying inductive biases on their learned representations, uncover direct evidence related to model generalization, and identify key embedding dimensions associated with specific knowledge. These insights can be potentially translated into practice, for example, with minor code modifications, we improve the generalization of the analyzed model. Our work represents a first systematic attempt to interpret black-box NCO models, showcasing probing as a promising tool for analyzing their internal mechanisms and revealing insights for the NCO community. The source code is publicly available 2.

artificial intelligence, machine learning, node, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)

Industry: Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.85)

Add feedback

Towards Robust Zero-Shot Reinforcement Learning

Neural Information Processing SystemsJun-23-2026, 12:07:19 GMT

The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused by out-of-distribution (OOD) actions during offline learning sometimes lead to biased representations, ultimately resulting in suboptimal performance. To address these issues, we propose Behavior-REgularizEd Zero-shot RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality. BREEZE introduces behavioral regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm.

large language model, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks

Neural Information Processing SystemsJun-23-2026, 12:06:34 GMT

Signal Temporal Logic (STL) is a powerful specification language for describing complex temporal behaviors of continuous signals, making it well-suited for highlevel robotic task descriptions. However, generating executable plans for STL tasks is challenging, as it requires consideration of the coupling between the task specification and the system dynamics. Existing approaches either follow a modelbased setting that explicitly requires knowledge of the system dynamics or adopt a task-oriented data-driven approach to learn plans for specific tasks. In this work, we address the problem of generating executable STL plans for systems with unknown dynamics. We propose a hierarchical planning framework that enables zero-shot generalization to new STL tasks by leveraging only task-agnostic trajectory data during offline training. The framework consists of three key components: (i) decomposing the STL specification into several progresses and time constraints, (ii) searching for timed waypoints that satisfy all progresses under time constraints, and (iii) generating trajectory segments using a pre-trained diffusion model and stitching them into complete trajectories. We formally prove that our method guarantees STL satisfaction, and simulation results demonstrate its effectiveness in generating dynamically feasible trajectories across diverse long-horizon STL tasks.

large language model, machine learning, trajectory, (20 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: