AITopics | Europe

Integral Imprecise Probability Metrics

Neural Information Processing Systems | Jun-22-2026, 18:59:49 GMT

Quantifying differences between probability distributions is fundamental to statistics and machine learning, primarily for comparing statistical uncertainty. In contrast, epistemic uncertainty--due to incomplete knowledge--requires richer representations than those offered by classical probability. Imprecise probability (IP) theory offers such models, capturing ambiguity and partial belief. This has driven growing interest in imprecise probabilistic machine learning (IPML), where inference and decision-making rely on broader uncertainty models--highlighting the need for metrics beyond classical probability. This work introduces the Integral Imprecise Probability Metric (IIPM) framework, a Choquet integral-based generalisation of classical Integral Probability Metrics (IPMs) to the setting of capacities--a broad class of IP models encompassing many existing ones, including lower probabilities, probability intervals, belief functions, and more. Theoretically, we establish conditions under which IIPM serves as a valid metric and metrises a form of weak convergence of capacities. Practically, IIPM not only enables comparison across different IP models but also supports the quantification of epistemic uncertainty (EU) within a single IP model. In particular, by comparing an IP model with its conjugate, IIPM gives rise to a new class of EU measures--Maximum Mean Imprecisions (MMIs)--which satisfy key axiomatic properties proposed in the uncertainty quantification literature. We validate MMI through selective classification experiments, demonstrating strong empirical performance against established EU measures, and outperforming them when classical methods struggle to scale to a large number of classes.
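As an illustration of the Choquet-integral construction the abstract refers to, the following is a minimal Python sketch on a finite outcome space. It is not the paper's implementation: the function names (`choquet_integral`, `lower_envelope`, `conjugate`), the toy distributions, and the test function are all assumptions made for this example, and the lower probability is built as the lower envelope of a small hand-picked set of distributions.

```python
def choquet_integral(f, capacity):
    """Choquet integral of a non-negative function on a finite outcome space.

    f        : dict mapping each outcome to a non-negative value
    capacity : callable taking a frozenset of outcomes and returning a number in [0, 1]
    """
    # Sort outcomes by function value, largest first.
    outcomes = sorted(f, key=f.get, reverse=True)
    total = 0.0
    for i, x in enumerate(outcomes):
        # Upper level set: the i+1 outcomes with the largest values.
        level_set = frozenset(outcomes[: i + 1])
        next_value = f[outcomes[i + 1]] if i + 1 < len(outcomes) else 0.0
        total += (f[x] - next_value) * capacity(level_set)
    return total


def lower_envelope(distributions):
    """Lower probability: the minimum event probability over a finite set of distributions."""
    def capacity(event):
        return min(sum(p[x] for x in event) for p in distributions)
    return capacity


def conjugate(capacity, universe):
    """Conjugate (dual) capacity: 1 minus the capacity of the complement."""
    universe = frozenset(universe)

    def dual(event):
        return 1.0 - capacity(universe - frozenset(event))
    return dual


if __name__ == "__main__":
    # Toy credal set (assumed for illustration only).
    p1 = {"a": 0.5, "b": 0.3, "c": 0.2}
    p2 = {"a": 0.2, "b": 0.3, "c": 0.5}
    f = {"a": 1.0, "b": 0.5, "c": 0.0}

    lower = lower_envelope([p1, p2])
    upper = conjugate(lower, f)

    low = choquet_integral(f, lower)
    high = choquet_integral(f, upper)
    print(f"lower integral: {low:.3f}, upper integral: {high:.3f}, gap: {high - low:.3f}")
```

For a precise probability, the Choquet integral reduces to the ordinary expectation. The gap between the integrals under a lower probability and its conjugate gives a rough feel for the kind of model-versus-conjugate comparison the abstract describes, although the paper's MMI definition is not reproduced here.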

artificial intelligence, information management, machine learning, (18 more...)

Country:

North America > United States (0.45)
Europe > United Kingdom (0.27)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Information Management (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Performative Validity of Recourse Explanations

Neural Information Processing Systems | Jun-22-2026, 18:51:25 GMT

When applicants get rejected by a high-stakes algorithmic decision system, recourse explanations provide actionable suggestions for applicants on how to change their input features to get a positive evaluation. A crucial yet overlooked phenomenon is that recourse explanations are performative: When many applicants act according to their recommendations, their collective behavior may shift the data distribution and, once the model is refitted, also the decision boundary. Consequently, the recourse algorithm may render its own recommendations invalid, such that applicants who make the effort of implementing their recommendations may be rejected again when they reapply. In this work, we formally characterize the conditions under which recourse explanations remain valid under their own performative effects. In particular, we prove that recourse actions may become invalid if they are influenced by or if they intervene on non-causal variables. Based on this analysis, we caution against the use of standard counterfactual explanation and causal recourse methods, and instead advocate for recourse methods that recommend actions exclusively on causal variables.

intervention, machine learning, natural language, (18 more...)

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

State Size Independent Statistical Error Bound for Discrete Diffusion Models

Neural Information Processing Systems | Jun-22-2026, 18:39:47 GMT

Diffusion models operating in discrete state spaces have emerged as powerful approaches, demonstrating remarkable efficacy across diverse domains, including reasoning tasks and molecular design. Despite their promising applications, the theoretical foundations of these models remain substantially underdeveloped, with the existing literature predominantly focusing on continuous-state diffusion models. A critical gap persists in the theoretical understanding of discrete diffusion modeling: the absence of a rigorous framework for quantifying estimation error with finite data. Consequently, the fundamental question of how precisely one can reconstruct the true underlying distribution from a limited training set remains unresolved. In this work, we analyze the estimation error induced by score estimation in discrete diffusion models. One of the main difficulties in the analysis is that the cardinality of the state space can be exponentially large with respect to its dimension, so a naive approach yields an intractable error bound. To overcome this difficulty, we exploit the property that the state space can be smoothly embedded in a continuous Euclidean space, which enables us to derive a cardinality-independent bound that is more practical in real applications. In particular, we consider a setting where the state space is structured as a hypercube graph, and another where the induced graph Laplacian can be asymptotically well approximated by the ordinary Laplacian defined on the continuous space, and then derive bounds that are independent of the state space size.

artificial intelligence, diffusion model, machine learning, (17 more...)

Country:

Europe (0.46)
North America (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

MMaDA: Multimodal Large Diffusion Language Models

Neural Information Processing Systems | Jun-22-2026, 18:39:29 GMT

We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. The approach is distinguished by three key innovations: (i) MMaDA adopts a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components. This architecture ensures seamless integration and processing across different data types.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Country: Europe (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

SIFusion: A Unified Fusion Framework for Multi-granularity Arctic Sea Ice Forecasting

Neural Information Processing Systems | Jun-22-2026, 18:30:08 GMT

Arctic sea ice plays a vital role in the global climate and has a major impact on both polar ecosystems and coastal communities. In the last few years, multiple deep-learning-based pan-Arctic sea ice concentration (SIC) forecasting methods have emerged and shown superior performance over physics-based dynamical models. However, previous methods forecast SIC at a fixed temporal granularity, e.g.

data mining, machine learning, sifusion, (21 more...)

Country: Europe (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression

Neural Information Processing Systems | Jun-22-2026, 18:28:20 GMT

With the development of large language models (LLMs), efficient inference through Key-Value (KV) cache compression has attracted considerable attention, especially for long-context generation. To compress the KV cache, recent methods identify critical KV tokens through static modeling of attention scores. However, these methods often struggle to accurately determine critical tokens because they neglect the temporal patterns in attention scores, resulting in a noticeable degradation in LLM performance. To address this challenge, we propose AttentionPredictor, which is the first learning-based method to directly predict attention patterns for KV cache compression and critical token identification. Specifically, AttentionPredictor learns a lightweight, unified convolution model to dynamically capture spatiotemporal patterns and predict the next-token attention scores. An appealing feature of AttentionPredictor is that it accurately predicts the attention score and shares the unified prediction model, which consumes negligible memory, among all transformer layers. Moreover, we propose a cross-token critical cache prefetching framework that hides the token estimation time overhead to accelerate the decoding stage. By retaining most of the attention information, AttentionPredictor achieves 13× KV cache compression and 5.6× speedup in a cache offloading scenario with comparable LLM performance, significantly outperforming the state of the art.
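To make the idea of keeping only critical tokens under a cache budget concrete, here is a small Python sketch. It assumes the next-token attention scores have already been predicted by some model; the learned convolutional predictor and the prefetching framework described in the abstract are not reproduced. The names `select_critical_tokens` and `retained_attention` and the toy scores are hypothetical.

```python
def select_critical_tokens(scores, budget):
    """Return the positions of the `budget` highest-scoring tokens, in sequence order.

    scores : list of (predicted) attention scores, one per cached token
    budget : number of KV entries that may be kept
    """
    if budget <= 0:
        return []
    # Rank positions by score, highest first, then keep the top `budget`.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore sequence order so the kept entries can be gathered in one pass.
    return sorted(ranked[:budget])


def retained_attention(scores, kept):
    """Fraction of the total attention mass that falls on the kept positions."""
    total = sum(scores)
    if total == 0:
        return 0.0
    return sum(scores[i] for i in kept) / total


if __name__ == "__main__":
    # Toy scores over five cached tokens (assumed for illustration only).
    predicted = [0.05, 0.40, 0.05, 0.30, 0.20]
    kept = select_critical_tokens(predicted, budget=2)
    ratio = len(predicted) / len(kept)
    print(f"kept positions: {kept}, compression: {ratio:.1f}x, "
          f"retained attention: {retained_attention(predicted, kept):.2f}")
```

The trade-off shown here is the one the abstract points to: a smaller budget gives a higher compression ratio but retains less of the attention mass, so the quality of the predicted scores determines how much can be dropped safely.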

large language model, machine learning, natural language, (18 more...)

Country:

Asia (0.46)
Europe > Austria > Vienna (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning to cluster neuronal function

Neural Information Processing Systems | Jun-22-2026, 18:11:39 GMT

Deep neural networks trained to predict neural activity from visual input and behaviour have shown great potential to serve as digital twins of the visual cortex. Per-neuron embeddings derived from these models could potentially be used to map the functional landscape or identify cell types. However, state-of-the-art predictive models of mouse V1 do not generate functional embeddings that exhibit clear clustering patterns corresponding to cell types. This raises the question of whether the lack of clustered structure is due to limitations of current models or is a true feature of the functional organization of mouse V1. In this work, we introduce DECEMber (Deep Embedding Clustering via Expectation Maximization-based refinement), an explicit inductive bias in predictive models that enhances clustering by adding an auxiliary t-distribution-inspired loss function that enforces structured organization among per-neuron embeddings.

artificial intelligence, machine learning, neuron, (19 more...)

Country:

North America > United States (0.93)
Europe > Germany > Lower Saxony > Gottingen (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training

Neural Information Processing Systems | Jun-22-2026, 18:03:35 GMT

Large language models are trained with tokenizers, and the resulting token distribution is highly imbalanced: a few words dominate the stream while most occur rarely. Recent practice favors ever-larger vocabularies, but it is unclear where the benefit comes from. To this end, we perform a controlled study that scales the vocabulary of the language model from 24K to 196K while holding data, computation, and optimization unchanged. We begin by quantifying the complexity of tokenized text, formalized via Kolmogorov complexity, and show that larger vocabularies reduce this complexity. Above 24K, every common word is already tokenized as a single token, so enlarging the vocabulary only deepens the relative token-frequency imbalance. Word-level loss decomposition shows that larger vocabularies reduce cross-entropy loss almost exclusively by lowering uncertainty on the 2,500 most frequent words, even though loss on the rare tail rises. Because the same frequent words cover roughly 75% of tokens in downstream benchmarks, this training advantage transfers intact. We further show that enlarging model parameters with a fixed vocabulary yields the same frequent-word benefit. Our results recast "bigger vocabularies help" as "lowering complexity of tokenized text helps," offering a simple, principled knob for tokenizer-model co-design and clarifying the loss dynamics that govern language model scaling in pre-training.
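The frequency-imbalance claim can be checked on any corpus by measuring how much of the text the most frequent words cover. The sketch below is a minimal Python illustration of that measurement, not the paper's analysis code: the function name `top_k_coverage`, the whitespace splitting, and the toy sentence are assumptions made for this example.

```python
from collections import Counter


def top_k_coverage(words, k):
    """Fraction of all word occurrences accounted for by the k most frequent words."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(k) returns the k (word, count) pairs with the highest counts.
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


if __name__ == "__main__":
    # Toy corpus (assumed for illustration only); a real study would use a
    # tokenized benchmark and a much larger k, such as the 2,500 words the
    # abstract mentions.
    text = "the cat sat on the mat and the dog sat on the rug"
    words = text.split()
    for k in (1, 2, 3):
        print(f"top-{k} words cover {top_k_coverage(words, k):.0%} of the text")
```

A coverage curve that rises steeply for small k is the signature of the imbalance the abstract describes: a short list of frequent words accounts for most of the text, so loss improvements on those words dominate the average.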

large language model, machine learning, natural language, (18 more...)

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Vision Transformers with Self-Distilled Registers

Neural Information Processing Systems | Jun-22-2026, 18:01:58 GMT

Vision Transformers (ViTs) have emerged as the dominant architecture for visual processing tasks, demonstrating excellent scalability with increased training data and model size. However, recent work has identified the emergence of artifact tokens in ViTs that are incongruous with local semantics. These anomalous tokens degrade ViT performance in tasks that require fine-grained localization or structural coherence. An effective mitigation of this issue is the addition of register tokens to ViTs, which implicitly "absorb" the artifact term during training. Given the availability of existing large-scale pre-trained ViTs, in this paper we seek to add register tokens to existing models without retraining them from scratch, which is infeasible considering their size. Specifically, we propose Post Hoc Registers (PH-Reg), an efficient self-distillation method that integrates registers into an existing ViT without requiring additional labeled data or full retraining.

artificial intelligence, machine learning, transformer, (17 more...)

Country: Europe (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.68)

How Will the New Prime Minister Be Chosen--and How Soon Could They Be in Downing Street?

TIME - Tech | Jun-22-2026, 18:00:45 GMT

artificial intelligence, instant alert, open follow modal personalized content, (13 more...)

Country: Europe > United Kingdom (1.00)

Industry: Government > Regional Government > Europe Government > United Kingdom Government (0.52)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.43)
