Estimating the Probability of Sampling a Trained Neural Network at Random
arXiv.org Artificial Intelligence
... mass, under a Gaussian or uniform prior, of a region in neural network parameter space corresponding to a particular behavior, such as achieving test loss below some threshold. When the prior is uniform, this problem is equivalent to measuring the volume of a region. We show empirically and theoretically that existing algorithms for estimating volumes in parameter space underestimate the true volume by millions of orders of magnitude. We find that this error can be dramatically reduced, but not entirely eliminated, with an importance sampling method using gradient information that is already provided by popular optimizers. The negative logarithm of ...

They evaluate simple gradient-free learning algorithms, such as the "Guess & Check" optimizer, which randomly samples parameters until it stumbles upon a network that achieves training loss under some threshold, and find that these methods have similar generalization behavior to gradient descent, at least on the very simple tasks they tested. Teney et al. (2024) find that randomly initialized networks represent very simple functions, which would explain the simplicity bias of deep learning if SGD behaves similarly to Guess & Check. Additionally, Mingard et al. (2021) provide evidence that SGD may be an approximate Bayesian sampler, where the prior distribution over functions is equal to the distribution over functions represented by randomly initialized networks.
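To make the quantities in the text concrete, the following is a minimal sketch, not the paper's algorithm. It runs a "Guess & Check" loop on a toy problem, then makes a naive Monte Carlo estimate of the log probability mass of the low-loss region under a Gaussian prior. The two-parameter linear model, the data, the threshold of 0.05, and all function names are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy task (not from the paper): fit y = x on five points.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = list(XS)


def loss(params):
    """Mean squared error of the linear model y = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def sample_prior(rng, sigma=1.0):
    """Draw (w, b) from an isotropic Gaussian prior N(0, sigma^2 I)."""
    return (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def guess_and_check(rng, threshold, max_tries=100_000):
    """Sample parameters from the prior until the loss is below `threshold`.

    Returns (params, number_of_tries), or (None, max_tries) on failure.
    """
    for tries in range(1, max_tries + 1):
        params = sample_prior(rng)
        if loss(params) < threshold:
            return params, tries
    return None, max_tries


def estimate_log_mass(rng, threshold, n_samples=20_000):
    """Naive Monte Carlo estimate of log P(loss < threshold) under the prior."""
    hits = sum(loss(sample_prior(rng)) < threshold for _ in range(n_samples))
    return math.log(hits / n_samples) if hits else float("-inf")


rng = random.Random(0)  # fixed seed so the sketch is reproducible
params, tries = guess_and_check(rng, threshold=0.05)
log_mass = estimate_log_mass(rng, threshold=0.05)
print("found params:", params, "after", tries, "tries")
print("estimated negative log mass:", -log_mass)
```

Direct sampling only works here because the toy region holds roughly a couple of percent of the prior mass. For a realistic network the hit rate would be far too small to observe by sampling, which is the regime where estimators like the importance sampling method mentioned above become relevant.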
Jan-30-2025