Variational Method
The Description Length of Deep Learning models
Deep learning models often have more parameters than observations, and still perform well. This is sometimes described as a paradox. In this work, we show experimentally that despite their huge number of parameters, deep neural networks can compress the data losslessly even when taking the cost of encoding the parameters into account. Such a compression viewpoint originally motivated the use of variational methods in neural networks. However, we show that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. Better encoding methods, imported from the Minimum Description Length (MDL) toolbox, yield much better compression values on deep networks.
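To make the compression claim concrete, here is a minimal sketch of the variational (bits-back) description-length bound the abstract refers to: the code length of the data is bounded by the expected negative log-likelihood under a weight posterior q, plus KL(q ‖ p) for a prior p. The toy linear model, dataset, and hyperparameters below are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of the variational description-length bound, in bits:
#   L(D) <= E_{q(w)}[-log p(D | w)] + KL(q(w) || p(w))
# All sizes and constants here are illustrative assumptions.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy dataset: 256 random 10-d inputs with binary labels.
X = torch.randn(256, 10)
y = (X.sum(dim=1) > 0).long()

# Factorized Gaussian posterior q(w) = N(mu, sigma^2) over the weights
# of a linear classifier; prior p(w) = N(0, 1).
mu = torch.zeros(10, 2, requires_grad=True)
rho = torch.full((10, 2), -3.0, requires_grad=True)  # sigma = softplus(rho)

opt = torch.optim.Adam([mu, rho], lr=0.05)
for step in range(500):
    sigma = F.softplus(rho)
    w = mu + sigma * torch.randn_like(mu)               # reparameterized sample
    nll = F.cross_entropy(X @ w, y, reduction="sum")    # 1-sample E_q[-log p(D|w)]
    # KL(N(mu, sigma^2) || N(0, 1)), summed over all weights.
    kl = (-torch.log(sigma) + 0.5 * (sigma**2 + mu**2) - 0.5).sum()
    loss = nll + kl                                     # variational bound, in nats
    opt.zero_grad(); loss.backward(); opt.step()

bits = loss.item() / math.log(2)
print(f"description length bound: {bits:.1f} bits "
      f"({bits / len(X):.3f} bits per example)")
```

The paper's point, in these terms, is that for deep networks this bound (the quantity printed above) turns out to be surprisingly loose compared to other MDL encodings.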
Transfer of Value Functions via Variational Methods
We consider the problem of transferring value functions in reinforcement learning. We propose an approach that uses the given source tasks to learn a prior distribution over optimal value functions and provides an efficient variational approximation of the corresponding posterior in a new target task. We show our approach to be general, in the sense that it can be combined with complex parametric function approximators and distribution models, and we provide two practical algorithms based on Gaussians and Gaussian mixtures. We analyze them theoretically, deriving finite-sample guarantees, and provide a comprehensive empirical evaluation in four different domains.
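As a rough illustration of the approach described above (not the authors' algorithm verbatim): fit a Gaussian prior over value-function weights from solved source tasks, then optimize a variational posterior on target-task transitions against a TD loss plus a KL term to that prior. The linear value function, the synthetic transitions, and all constants below are assumptions made for the sketch.

```python
# Hedged sketch: Gaussian prior from source tasks + variational posterior
# on the target task. All data below is synthetic and purely illustrative.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, gamma = 8, 0.99

# Pretend these are optimal weights recovered from 5 source tasks.
source_ws = torch.randn(5, d) + 1.0
prior_mu, prior_sigma = source_ws.mean(0), source_ws.std(0) + 1e-3

# Synthetic target-task transitions (s, r, s') for a linear V-function.
s = torch.randn(64, d)
s_next = s + 0.1 * torch.randn(64, d)
r = s @ torch.randn(d)

# Variational posterior q(w) = N(mu, sigma^2), initialized at the prior.
mu = prior_mu.clone().requires_grad_(True)
rho = torch.log(torch.expm1(prior_sigma)).requires_grad_(True)  # softplus inverse

opt = torch.optim.Adam([mu, rho], lr=0.01)
for step in range(300):
    sigma = F.softplus(rho)
    w = mu + sigma * torch.randn_like(mu)                 # one posterior sample
    td = (s @ w) - (r + gamma * (s_next @ w)).detach()    # semi-gradient TD(0)
    # KL(q || prior) for two diagonal Gaussians.
    kl = (torch.log(prior_sigma / sigma)
          + (sigma**2 + (mu - prior_mu)**2) / (2 * prior_sigma**2) - 0.5).sum()
    loss = (td**2).mean() + kl / len(s)                   # ELBO-style objective
    opt.zero_grad(); loss.backward(); opt.step()

print("posterior mean of first 3 weights:", mu.detach()[:3])
```

The Gaussian-mixture variant in the paper would replace the single-Gaussian prior here with a mixture fit to the source-task weights.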
The paper introduces a probabilistic model for networks which assigns each node in the network to multiple, overlapping latent communities. Inference is done using a stochastic variational method, and the experimental evaluations are performed on very large networks. The first thing I note is that you do not cite Morup et al. (2010), "Infinite multiple membership relational modelling for complex networks", which was in fact the first work to perform inference for a latent feature relational model on large datasets -- in effect rendering your statement on lines 067-068 ("... these innovations allow the first ...") incorrect. This is a rather serious oversight: their paper not only performs large-scale inference, but does so with an MCMC method, which usually produces more accurate results than variational methods. I believe the strongest contribution of this paper is the application of stochastic variational inference to a relational data model.
Analysis of Variational Sparse Autoencoders
Sparse Autoencoders (SAEs) have emerged as a promising approach for interpreting neural network representations by learning sparse, human-interpretable features from dense activations. We investigate whether incorporating variational methods into SAE architectures can improve feature organization and interpretability. We introduce the Variational Sparse Autoencoder (vSAE), which replaces deterministic ReLU gating with stochastic sampling from learned Gaussian posteriors and incorporates KL divergence regularization toward a standard normal prior. Our hypothesis is that this probabilistic sampling creates dispersive pressure, causing features to organize more coherently in the latent space while avoiding overlap. We evaluate a TopK vSAE against a standard TopK SAE on Pythia-70M transformer residual stream activations using comprehensive benchmarks, including SAE Bench, individual feature interpretability analysis, and global latent space visualization through t-SNE. The vSAE underperforms the standard SAE across core evaluation metrics, though it excels on feature independence and ablation metrics. The KL divergence term creates excessive regularization pressure that substantially reduces the fraction of living features, which explains the observed performance degradation. While vSAE features demonstrate improved robustness, the model exhibits many more dead features than the baseline. Our findings suggest that naive application of variational methods to SAEs does not improve feature organization or interpretability.
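The architecture described above can be sketched as follows: a TopK autoencoder whose latent code is sampled from per-feature Gaussian posteriors and regularized toward N(0, 1) by a KL term. The layer sizes, the KL weight, and the class name `VariationalTopKSAE` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the vSAE idea: TopK gating applied to latents
# sampled from learned Gaussian posteriors, with a KL penalty to N(0, 1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalTopKSAE(nn.Module):
    def __init__(self, d_model=512, d_latent=4096, k=32, kl_weight=1e-3):
        super().__init__()
        self.enc_mu = nn.Linear(d_model, d_latent)
        self.enc_logvar = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)
        self.k, self.kl_weight = k, kl_weight

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        # TopK gating: keep the k largest sampled activations per example.
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        x_hat = self.dec(z_sparse)
        recon = F.mse_loss(x_hat, x)
        # KL(q(z|x) || N(0, I)), averaged over the batch.
        kl = -0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(-1).mean()
        return x_hat, recon + self.kl_weight * kl

sae = VariationalTopKSAE()
acts = torch.randn(16, 512)   # stand-in for residual-stream activations
x_hat, loss = sae(acts)
loss.backward()
print(x_hat.shape, float(loss))
```

The dead-feature failure mode reported in the abstract corresponds, in this sketch, to the KL term pushing many posteriors toward the prior so that their samples rarely survive the TopK selection.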