






RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning

Neural Information Processing Systems

Reinforcement Learning from Human Feedback (RLHF) has recently surged in popularity, particularly for aligning large language models and other AI systems with human intentions. At its core, RLHF can be viewed as a specialized instance of Preference-based Reinforcement Learning (PbRL), where the preferences specifically originate from human judgments rather than arbitrary evaluators. Despite this connection, most existing approaches in both RLHF and PbRL primarily focus on optimizing a mean reward objective, neglecting scenarios that necessitate risk-awareness, such as AI safety, healthcare, and autonomous driving. These scenarios often operate under a one-episode-reward setting, which makes conventional risk-sensitive objectives inapplicable.
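To make the distinction between a mean-reward objective and a risk-aware one concrete, here is a minimal, illustrative sketch (not taken from the paper): it contrasts the average episode return with a Conditional Value-at-Risk (CVaR) style objective over per-episode returns. The function names and the alpha parameter are assumptions chosen for illustration only.

```python
import numpy as np

def mean_objective(episode_returns):
    # risk-neutral objective: average return across episodes
    return float(np.mean(episode_returns))

def cvar_objective(episode_returns, alpha=0.1):
    # risk-aware objective (illustrative): mean of the worst alpha-fraction of episode returns
    returns = np.sort(np.asarray(episode_returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return float(returns[:k].mean())

# two policies with the same mean return but very different tail risk
policy_a = [1.0, 1.0, 1.0, 1.0]    # steady returns
policy_b = [2.0, 2.0, 2.0, -2.0]   # occasional catastrophic episode
print(mean_objective(policy_a), mean_objective(policy_b))   # 1.0 vs 1.0
print(cvar_objective(policy_a), cvar_objective(policy_b))   # 1.0 vs -2.0
```

Under a mean objective the two policies look identical, while the risk-aware objective penalizes the policy with the catastrophic tail, which is the behavior one wants in safety-critical settings such as healthcare or autonomous driving.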





e92381dba235a8309f08ce46376189a9-Supplemental-Conference.pdf

Neural Information Processing Systems

We use the symmetrized cosine similarity loss from SimSiam.

Model details. For CIFAR10, we use the pretrained StyleGAN available at the official website of StyleGAN-Ada [31]. We also experimented with the model with the best Inception score but did not observe a significant difference in results.

Linear classification. The quality of the pretrained representations is evaluated by training a supervised linear classifier on frozen representations on the training set, and then testing it on the validation set.
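For reference, below is a minimal PyTorch-style sketch of the symmetrized negative cosine similarity loss from SimSiam. The tensor names (p1, p2 for predictor outputs, z1, z2 for projector outputs of the two augmented views) are illustrative assumptions, not identifiers from this paper's code.

```python
import torch.nn.functional as F

def neg_cosine(p, z):
    # negative cosine similarity with stop-gradient on the target branch, as in SimSiam
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def symmetrized_simsiam_loss(p1, z1, p2, z2):
    # p1, p2: predictor outputs for the two augmented views
    # z1, z2: projector outputs for the two augmented views
    # symmetrize by averaging the loss over both view orderings
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```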