AITopics | Europe

Don't Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation

Neural Information Processing Systems, Jun-22-2026, 22:43:13 GMT

While diffusion language models (DLMs) enable fine-grained refinement, their practical controllability remains fragile. We identify and formally characterize a central failure mode, update-forgetting, in which uniform, context-agnostic updates induce token-level fluctuations across timesteps, erasing earlier semantic edits and disrupting the cumulative refinement process, thereby degrading fluency and coherence. As this failure originates in uniform, context-agnostic updates, effective control demands explicit token ordering. We propose Token Timestep Allocation (TTA-DIFFUSION), which realizes soft, semantic token ordering via per-token timestep schedules: critical tokens are frozen early, while uncertain tokens receive continued refinement. This timestep-based ordering can be instantiated as either a fixed policy or an adaptive policy driven by task signals, thereby supporting a broad spectrum of refinement strategies. Because it operates purely at inference time, it applies uniformly across various DLMs and naturally extends to diverse supervision sources. Empirically, TTA-DIFFUSION improves controllability and fluency: on sentiment control, it yields >20% higher accuracy and nearly halves perplexity using <1/5 the steps; in detoxification, it lowers maximum toxicity (12.2 vs. 14.5) and perplexity (26.0 vs. 32.0). Together, these results demonstrate that softened ordering via timestep allocation is the critical lever for mitigating update-forgetting and achieving stable and controllable diffusion text generation.

artificial intelligence, natural language, text classification, (17 more...)

Country:

Asia (0.46)
Europe (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (0.46)
Education (0.46)
Banking & Finance (0.46)
Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.45)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.45)

Learning Chern Numbers of Multiband Topological Insulators with Gauge Equivariant Neural Networks

Neural Information Processing Systems, Jun-22-2026, 22:28:44 GMT

Equivariant network architectures are a well-established tool for predicting invariant or equivariant quantities. However, almost all learning problems considered in this context feature a global symmetry, i.e. each point of the underlying space is transformed with the same group element, as opposed to a local gauge symmetry, where each point is transformed with a different group element, exponentially enlarging the size of the symmetry group. We use gauge equivariant networks to predict topological invariants (Chern numbers) of multiband topological insulators for the first time. The gauge symmetry of the network guarantees that the predicted quantity is a topological invariant. A major technical challenge is that the relevant gauge equivariant networks are plagued by instabilities in their training, severely limiting their usefulness. In particular, for larger gauge groups the instabilities make training impossible. We resolve this problem by introducing a novel gauge equivariant normalization layer which stabilizes the training. Furthermore, we prove a universal approximation theorem for our model. We train on samples with trivial Chern number only but show that our model generalizes to samples with non-trivial Chern number and provide various ablations of our setup.
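For readers unfamiliar with the predicted quantity, the following is a minimal sketch of how a Chern number can be computed numerically and why it is unaffected by a local gauge transformation. It is not the paper's network. It applies the standard lattice (Fukui-Hatsugai-Suzuki) plaquette formula to a single band of the two-band Qi-Wu-Zhang model, both chosen here as assumptions for illustration, whereas the paper addresses the multiband case. The function names, the grid size, and the mass parameter `m` are hypothetical.

```python
import numpy as np

# Pauli matrices used to build the (assumed) two-band Qi-Wu-Zhang model.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def qwz_hamiltonian(kx, ky, m):
    """Bloch Hamiltonian H(k) of the Qi-Wu-Zhang model with mass m."""
    return np.sin(kx) * SX + np.sin(ky) * SY + (m + np.cos(kx) + np.cos(ky)) * SZ


def lower_band_states(m, n=24, rng=None):
    """Lower-band eigenvectors on an n x n grid of the Brillouin zone.

    If rng is given, every eigenvector is multiplied by an independent
    random phase, i.e. a local U(1) gauge transformation.
    """
    ks = 2 * np.pi * np.arange(n) / n
    states = np.empty((n, n, 2), dtype=complex)
    for i, kx in enumerate(ks):
        for j, ky in enumerate(ks):
            _, vecs = np.linalg.eigh(qwz_hamiltonian(kx, ky, m))
            v = vecs[:, 0]  # eigh sorts eigenvalues ascending: column 0 is the lower band
            if rng is not None:
                v = v * np.exp(1j * rng.uniform(0.0, 2 * np.pi))
            states[i, j] = v
    return states


def chern_number(states):
    """Sum of plaquette Berry phases divided by 2*pi (periodic grid)."""
    # Overlaps with the neighbouring grid point in the x and y directions.
    link_x = np.sum(states.conj() * np.roll(states, -1, axis=0), axis=-1)
    link_y = np.sum(states.conj() * np.roll(states, -1, axis=1), axis=-1)
    link_x = link_x / np.abs(link_x)
    link_y = link_y / np.abs(link_y)
    # Product of links around each plaquette; local phases cancel in this product.
    plaquette = (
        link_x
        * np.roll(link_y, -1, axis=0)
        * np.roll(link_x, -1, axis=1).conj()
        * link_y.conj()
    )
    return float(np.sum(np.angle(plaquette)) / (2 * np.pi))
```

Calling `chern_number(lower_band_states(1.0))` and `chern_number(lower_band_states(1.0, rng=np.random.default_rng(0)))` should agree up to floating-point error, because the random phases cancel around every plaquette. For this model the result has magnitude 1 for 0 < |m| < 2 and is 0 for |m| > 2.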

artificial intelligence, chern number, machine learning, (19 more...)

Country:

Europe > Sweden (0.28)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Information Technology (0.67)
Education (0.48)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Generalization Bounds for Rank-sparse Neural Networks

Neural Information Processing Systems, Jun-22-2026, 22:28:37 GMT

It has been recently observed in much of the literature that neural networks exhibit a bottleneck rank property: for larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be of approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the "bottleneck rank", which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten p quasi-norm of the neural network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks which exploit the approximate low rank structure of the weight matrices if present. The final results rely on the Schatten p quasi-norms of the weight matrices: for small p, the bounds exhibit a sample complexity $\widetilde{O}(WrL^2)$, where W and L are the width and depth of the neural network respectively and where r is the rank of the weight matrices. As p increases, the bound behaves more like a norm-based bound instead.
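As a reminder of the quantity the bounds are stated in, the Schatten p quasi-norm of a matrix A with singular values σ_i is (Σ_i σ_i^p)^(1/p). For p = 2 it is the Frobenius norm, for p = 1 the nuclear norm, and for 0 < p < 1 it is only a quasi-norm; as p tends to 0, Σ_i σ_i^p tends to the rank of A. The sketch below computes it with NumPy. The function name is hypothetical and the snippet illustrates the definition only, not the paper's bounds.

```python
import numpy as np


def schatten_quasi_norm(matrix, p):
    """Schatten p (quasi-)norm: the l_p (quasi-)norm of the singular values, p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(singular_values ** p) ** (1.0 / p))


# Example matrix; any real matrix works.
A = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0]])

frobenius = schatten_quasi_norm(A, 2.0)   # matches np.linalg.norm(A, "fro")
nuclear = schatten_quasi_norm(A, 1.0)     # matches np.linalg.norm(A, "nuc")
quasi = schatten_quasi_norm(A, 0.5)       # quasi-norm regime, 0 < p < 1
```

Smaller p weights the number of non-negligible singular values more heavily than their size, which is why the small-p regime tracks rank while larger p behaves like an ordinary norm.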

artificial intelligence, machine learning, neural network, (17 more...)

Country: Europe > United Kingdom (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

A Gradient Guided Diffusion Framework for Chance Constrained Programming

Neural Information Processing Systems, Jun-22-2026, 22:27:13 GMT

Chance constrained programming (CCP) is a powerful framework for addressing optimization problems under uncertainty. In this paper, we introduce a novel Gradient-Guided Diffusion-based Optimization framework, termed GGDOpt, which tackles CCP through three key innovations.
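For context, a chance constraint asks that a constraint g(x, ξ) ≤ 0 holds with probability at least 1 - α over the random parameter ξ, rather than for every ξ. The sketch below checks such a constraint by Monte Carlo sampling. It illustrates the problem class only and is not GGDOpt; the constraint `shortfall`, the standard normal distribution of ξ, and all function names are assumptions made for this example.

```python
import numpy as np


def shortfall(x, xi):
    """Hypothetical constraint g(x, xi) = xi - x: capacity x must cover demand xi."""
    return xi - x


def chance_constraint_holds(g, x, xi_samples, alpha):
    """Estimate P(g(x, xi) <= 0) from samples and compare it with 1 - alpha."""
    prob = float(np.mean(g(x, xi_samples) <= 0.0))
    return prob, prob >= 1.0 - alpha


# Assumed uncertainty model: xi ~ N(0, 1), drawn with a fixed seed.
rng = np.random.default_rng(0)
xi = rng.standard_normal(100_000)

prob_high, ok_high = chance_constraint_holds(shortfall, 3.0, xi, alpha=0.05)
prob_low, ok_low = chance_constraint_holds(shortfall, 0.0, xi, alpha=0.05)
```

With these assumptions a decision of x = 3.0 satisfies the 95% requirement, while x = 0.0 covers demand only about half the time and fails it.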

artificial intelligence, diffusion model, machine learning, (17 more...)

Country:

Asia > China (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Collapsing Taylor Mode Automatic Differentiation

Neural Information Processing Systems, Jun-22-2026, 22:17:29 GMT

Computing partial differential equation (PDE) operators via nested backpropagation is expensive, yet popular, and severely restricts their utility for scientific machine learning. Recent advances, like the forward Laplacian and randomized Taylor mode automatic differentiation (AD), propose forward schemes to address this. We introduce an optimization technique for Taylor mode that "collapses" derivatives by rewriting the computational graph, and demonstrate how to apply it to general linear PDE operators and to randomized Taylor mode. The modifications simply require propagating a sum up the computational graph, which could, or should, be done by a machine learning compiler, without exposing complexity to users. We implement our collapsing procedure and evaluate it on popular PDE operators, confirming it accelerates Taylor mode and outperforms nested backpropagation.

artificial intelligence, deep learning, machine learning, (20 more...)

Country:

Europe (0.67)
North America > Canada (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

Neural Information Processing Systems, Jun-22-2026, 22:16:53 GMT

Accuracy remains a standard metric for evaluating AI systems, but it offers limited insight into how models arrive at their solutions. In this work, we introduce a benchmark based on brainteasers written in long narrative form to probe more deeply into the types of reasoning strategies that models employ. Brainteasers are well-suited for this goal because they can be solved with multiple approaches, such as a few-step solution that uses a creative insight or a longer solution that uses more brute force. We investigate large language models (LLMs) across multiple layers of reasoning, focusing not only on correctness but also on the quality and creativity of their solutions. We investigate many aspects of the reasoning process: (1) semantic parsing of the brainteasers into precise mathematical competition-style formats; (2) self-correcting solutions based on ground-truth solutions; (3) producing step-by-step sketches of solutions; and (4) making use of hints. We find that LLMs are in many cases able to find creative, insightful solutions to brainteasers, suggesting that they capture some of the capacities needed to solve novel problems in creative ways. Nonetheless, there also remain situations where they rely on brute force, despite the availability of more efficient, creative solutions, highlighting a potential direction for improving LLM reasoning.

large language model, machine learning, natural language, (21 more...)

Country:

Asia (0.67)
Europe > Austria (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

AI Testing Should Account for Sophisticated Strategic Behaviour

Neural Information Processing Systems, Jun-22-2026, 22:13:48 GMT

This position paper argues for two claims regarding AI testing and evaluation. First, to remain informative about deployment behaviour, evaluations need to account for the possibility that AI systems understand their circumstances and reason strategically. Second, game-theoretic analysis can inform evaluation design by formalising and scrutinising the reasoning in evaluation-based safety cases. Drawing on examples from existing AI systems, a review of relevant research, and formal strategic analysis of a stylised evaluation scenario, we present evidence for these claims and motivate several research directions.

large language model, machine learning, natural language, (20 more...)

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.69)

Industry:

Leisure & Entertainment > Games (0.94)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching

Neural Information Processing Systems, Jun-22-2026, 22:12:34 GMT

Instruction fine-tuning is crucial in NLP tasks, enhancing pretrained models' instruction-following capabilities and task-specific performance. However, obtaining high-quality fine-tuning data for large models is challenging due to data collection difficulties and high production costs. To address this, we propose MASTER, a novel data augmentation method that enriches original data through interactions among multiple agents with varying cognitive levels. We simulate three pedagogically grounded teaching scenarios, leveraging multi-agent conversations to generate high-quality teacher-student interaction data. Utilizing MASTER, we construct BOOST-QA, a fine-tuning dataset augmented from existing datasets like Orca-Math-200k, ProcQA, and OpenHermes2.5. Experiments show that models fine-tuned with BOOST-QA perform excellently across multiple benchmarks, demonstrating strong multitask generalization. Notably, MASTER significantly improves models' reasoning abilities in complex tasks, providing valuable insights for future research. Our code is publicly available at https://github.com/Toyhom/MASTER.

artificial intelligence, large language model, natural language, (17 more...)

Country:

Asia (1.00)
Europe (0.93)
North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.93)

Industry:

Information Technology (0.92)
Education > Educational Setting (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Incentivizing Desirable Effort Profiles in Strategic Classification: The Role of Causality & Uncertainty

Neural Information Processing Systems, Jun-22-2026, 22:12:17 GMT

We study strategic classification in binary decision-making settings where agents can modify their features in order to improve their classification outcomes. Importantly, our work considers the causal structure across different features, acknowledging that effort in one feature may affect other features. The main goal of our work is to understand when and how much agent effort is invested towards desirable features, and how this is influenced by the deployed classifier, the causal structure of the agent's features, their ability to modify them, and the information available to the agent about the classifier and the feature causal graph. We characterize conditions under which agents with full information about the causal structure and the principal's classifier align with the principal's goals of incentivizing effort mostly in "desirable" features, and identify cases where designing such classifiers (from the principal's side) is still tractable despite general non-convexity. Under incomplete information, we show that uncertainty leads agents to prioritize features with high expected impact and low variance, which may often be misaligned with the principal's goals. Finally, using numerical experiments based on a cardiovascular disease risk study, we illustrate how to incentivize desirable modifications even under uncertainty.

artificial intelligence, machine learning, optimization problem, (18 more...)

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Banking & Finance (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

d7a2222b8d41014e060cfeb0995501d0-Paper-Conference.pdf

Neural Information Processing Systems, Jun-22-2026, 22:12:10 GMT

How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically founded solution to this problem: to train Self-Proving models that prove the correctness of their output to a verification algorithm V via an Interactive Proof. Self-Proving models satisfy that, with high probability over an input sampled from a given distribution, the model generates a correct output and successfully proves its correctness to V. The soundness property of V guarantees that, for every input, no model can convince V of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while all incorrect outputs (of any model) are detected by V. We devise and analyze two generic methods for learning Self-Proving models: Transcript Learning (TL), which relies on access to transcripts of accepting interactions, and Reinforcement Learning from Verifier Feedback (RLVF), which trains a model by emulating interactions with the verifier.
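To make the roles of the prover and the verifier V concrete, here is a minimal toy sketch, assuming a one-message proof for the greatest common divisor: the prover outputs a claimed gcd d together with Bézout coefficients (a, b), and the verifier accepts only if d divides both inputs and a*x + b*y = d. The task, the function names, and the single-round protocol are assumptions chosen for illustration. The sketch shows only the soundness idea (no proof makes an incorrect output pass), not the paper's training methods.

```python
def extended_gcd(x, y):
    """Return (d, a, b) with d = gcd(x, y) and a*x + b*y = d, for positive integers."""
    old_r, r = x, y
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_a, a = a, old_a - q * a
        old_b, b = b, old_b - q * b
    return old_r, old_a, old_b


def honest_prover(x, y):
    """Stand-in for a Self-Proving model: an output plus a proof of its correctness."""
    d, a, b = extended_gcd(x, y)
    return d, (a, b)


def verifier(x, y, claimed, proof):
    """Accept only if `claimed` is a common divisor AND an integer combination of x, y.

    Any common divisor divides a*x + b*y, so a value passing both checks
    must be the greatest common divisor; wrong claims cannot be accepted.
    """
    a, b = proof
    return (
        claimed > 0
        and x % claimed == 0
        and y % claimed == 0
        and a * x + b * y == claimed
    )


x, y = 84, 36
output, proof = honest_prover(x, y)
accepted = verifier(x, y, output, proof)

# A cheating prover claiming 6 cannot find coefficients that pass the check.
cheat_accepted = any(
    verifier(x, y, 6, (a, b)) for a in range(-20, 21) for b in range(-20, 21)
)
```

The honest output is accepted, while the incorrect claim is rejected for every proof tried: 84a + 36b is always a multiple of 12, so it can never equal 6.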

artificial intelligence, machine learning, (17 more...)

Country:

North America > United States (0.70)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)