AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

Neural Information Processing SystemsJun-17-2026, 18:17:42 GMT

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods adopt Slot Attention or its variants to iteratively aggregate objects' super-pixels into a fixed set of query feature vectors, termed slots. However, their reliance on a static slot count leads to an object being represented as multiple parts when the number of objects varies. We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. MetaSlot (i) maintains a codebook that holds prototypes of objects in a dataset by vector-quantizing the resulting slot representations; (ii) removes duplicate slots from the traditionally aggregated slots by quantizing them with the codebook; and (iii) injects progressively weaker noise into the Slot Attention iterations to accelerate and stabilize the aggregation. MetaSlot is a general Slot Attention variant that can be seamlessly integrated into existing OCL architectures. Across multiple public datasets and tasks-including object discovery and recognition-models equipped with MetaSlot achieve significant performance gains and markedly interpretable slot representations, compared with existing Slot Attention variants.

machine learning, metaslot, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

Add feedback

MAPEstimation with Denoisers: Convergence Rates and Guarantees

Neural Information Processing SystemsJun-17-2026, 17:24:01 GMT

Denoiser models have become powerful tools for inverse problems, enabling the use of pretrained networks to approximate the score of a smoothed prior distribution. These models are often used in heuristic iterative schemes aimed at solving Maximum a Posteriori (MAP) optimisation problems, where the proximal operator of the negative log-prior plays a central role. In practice, this operator is intractable, and practitioners plug in a pretrained denoiser as a surrogate--despite the lack of general theoretical justification for this substitution. In this work, we show that a simple algorithm, closely related to several used in practice, provably converges to the proximal operator under a log-concavity assumption on the prior p. We show that this algorithm can be interpreted as a gradient descent on smoothed proximal objectives. Our analysis thus provides a theoretical foundation for a class of empirically successful but previously heuristic methods.

artificial intelligence, inequality, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Attention-based clustering

Neural Information Processing SystemsJun-17-2026, 17:05:10 GMT

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids. This phenomenon highlights the ability of attention-based layers to capture underlying distributional structure. We further examine an attention layer with key, query, and value matrices fixed to the identity, and show that, even without any trainable parameters, it can perform in-context quantization, revealing the surprising capacity of transformer-based methods to adapt dynamically to input-specific distributions.

artificial intelligence, centroid, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Towards Understanding Transformers in Learning Random Walks

Neural Information Processing SystemsJun-17-2026, 17:05:01 GMT

Transformers have proven highly effective across various applications, especially in handling sequential data such as natural languages and time series. However, transformer models often lack clear interpretability, and the success of transformers has not been well understood in theory. In this paper, we study the capability and interpretability of transformers in learning a family of classic statistical models, namely random walks on circles. We theoretically demonstrate that, after training with gradient descent, a one-layer transformer model can achieve optimal accuracy in predicting random walks. Importantly, our analysis reveals that the trained model is interpretable: the trained softmax attention serves as a token selector, focusing on the direct parent state; subsequently, the value matrix executes a onestep probability transition to predict the location of the next state based on this parent state. We also show that certain edge cases not covered by our theory are indeed failure cases, demonstrating that our theoretical conditions are tight. By investigating these success and failure cases, it is revealed that gradient descent with small initialization may fail or struggle to converge to a good solution in certain simple tasks even beyond random walks. Experiments are conducted to support our theoretical findings.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics

Neural Information Processing SystemsJun-17-2026, 16:54:45 GMT

Many emerging applications--such as adversarial training, AI alignment, and robust optimization--can be framed as zero-sum games between neural nets, with von Neumann-Nash equilibria (NE) capturing the desirable system behavior. While such games often involve non-convex non-concave objectives, empirical evidence shows that simple gradient methods frequently converge, suggesting a hidden geometric structure. In this paper, we provide a theoretical framework that explains this phenomenon through the lens of hidden convexity and overparameterization. We identify sufficient conditions--spanning initialization, training dynamics, and network width--that guarantee global convergence to a NE in a broad class of non-convex min-max games. To our knowledge, this is the first such result for games that involve two-layer neural networks. Technically, our approach is twofold: (a) we derive a novel path-length bound for the alternating gradient descent-ascent scheme in min-max games; and (b) we show that the reduction from a hidden convex-concave geometry to two-sided Polyak-Łojasiewicz (PL) min-max condition hold with high probability under overparameterization, using tools from random matrix theory.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Europe (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (0.92)
Government (0.92)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Add feedback

On the rankability of visual embeddings

Neural Information Processing SystemsJun-17-2026, 16:42:08 GMT

We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term rank axes. We define a model as rankable for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe > Germany (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing

Neural Information Processing SystemsJun-17-2026, 16:32:34 GMT

Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model's performance. While prior works have demonstrated the benefits of specific heuristic retraining schemes, the question of how to optimally combine the model's predictions and the provided labels remains largely open.

machine learning, natural language, noise model, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback

Out-of-Distribution Generalized Graph Anomaly Detection with Homophily-aware Environment Mixup

Neural Information Processing SystemsJun-17-2026, 15:59:45 GMT

Graph anomaly detection (GAD) is widely prevalent in scenarios such as financial fraud detection, anti-money laundering and social bot detecion. However, structural distribution shifts are commonly observed in real-world GAD data due to selection bias, resulting in reduced homophily. Existing GAD methods tend to rely on homophilic shortcuts when trained on high-homophily structures, limiting their ability to generalize well to data with low homophily under structural distribution shifts. In this study, we propose to handle structural distribution shifts by generating novel environments characterized by diverse homophilic structures and utilizing invariant patterns, i.e., features and structures with the capability of stable prediction across structural distribution shifts, which face two challenges: (1) How to discover invariant patterns from entangled features and structures, as structures are sensitive to varying homophilic distributions.

data mining, distribution shift, machine learning, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Industry:

Information Technology (0.93)
Law Enforcement & Public Safety > Fraud (0.54)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Multi-View Oriented GPLVM: Expressiveness and Efficiency

Neural Information Processing SystemsJun-17-2026, 15:51:56 GMT

The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral density and the kernel function. By modeling the spectral density with a bivariate Gaussian mixture, we then derive a generic and expressive kernel termed Next-Gen Spectral Mixture (NG-SM) for MV-GPLVMs. To address the inherent computational inefficiency of the NG-SM kernel, we design a new form of random Fourier feature approximation. Combined with a tailored reparameterization trick, this approximation enables scalable variational inference for both the model and the unified latent representations. Numerical evaluations across a diverse range of multi-view datasets demonstrate that our proposed method consistently outperforms state-of-the-art models in learning meaningful latent representations.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)
Research Report > Promising Solution (0.65)

Industry:

Information Technology (0.67)
Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

ASingle-Swap Local Search Algorithm for k-means of Lines

Neural Information Processing SystemsJun-17-2026, 15:42:05 GMT

Clustering is a fundamental problem that has been extensively studied over past few decades, with most research focusing on point-based clustering such as kmeans, k-median, and k-center. However, numerous real-world applications, such as motion analysis, computer vision, and missing data analysis, require clustering over structured data, including lines, time series and affine subspaces (flats), where traditional point-based clustering algorithms often fall short. In this paper, we study the k-means of lines problem, where the input is a set L of lines in Rd, and the goal is to find k centers C in Rd such that the sum of squared distances from each line in L to its nearest center in C is minimized. The local search algorithm is a well-established strategy for point-based k-means clustering, known for its efficiency and provable approximation guarantees. However, extending local search algorithm to the k-means of lines problem is nontrivial, as the capture relation used in point-based clustering does not generalize to the line setting.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback