Estimating the Probability of Sampling a Trained Neural Network at Random

Jan-30-2025–arXiv.org Artificial Intelligence

[...] mass, under a Gaussian or uniform prior, of a region in neural network parameter space corresponding to a particular behavior, such as achieving test loss below some threshold. When the prior is uniform, this problem is equivalent to measuring the volume of a region. We show empirically and theoretically that existing algorithms for estimating volumes in parameter space underestimate the true volume by millions of orders of magnitude. We find that this error can be dramatically reduced, but not entirely eliminated, with an importance sampling method using gradient information that is already provided by popular optimizers. The negative logarithm of [...]

They evaluate simple gradient-free learning algorithms, such as the "Guess & Check" optimizer, which randomly samples parameters until it stumbles upon a network that achieves training loss under some threshold, and find that these methods have similar generalization behavior to gradient descent, at least on the very simple tasks they tested. Teney et al. (2024) find that randomly initialized networks represent very simple functions, which would explain the simplicity bias of deep learning if SGD behaves similarly to Guess & Check. Additionally, Mingard et al. (2021) provide evidence that SGD may be an approximate Bayesian sampler, where the prior distribution over functions is equal to the distribution over functions represented by randomly initialized networks.
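To make the Guess & Check idea concrete, here is a minimal sketch on a toy problem. The dataset, the two-parameter linear classifier, and the function names (`training_loss`, `sample_prior`, `guess_and_check`, `estimate_log10_mass`) are all assumptions chosen for illustration and do not come from the paper. The second function estimates the prior probability mass of the low-loss region by naive Monte Carlo, which is not the importance sampling method the abstract describes. Naive sampling is only workable here because the toy region covers a large share of the prior.

```python
import math
import random

# Hypothetical toy training set: two positive and two negative points in 2-D.
DATA = [((1.0, 2.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.5), 0), ((-2.0, 0.3), 0)]


def training_loss(w):
    """Fraction of training points misclassified by the rule w . x > 0."""
    errors = 0
    for (x0, x1), label in DATA:
        pred = 1 if w[0] * x0 + w[1] * x1 > 0 else 0
        errors += pred != label
    return errors / len(DATA)


def sample_prior(rng, sigma=1.0):
    """Draw one parameter vector from an isotropic Gaussian prior."""
    return (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def guess_and_check(rng, threshold=0.0, max_tries=10_000):
    """Sample parameters from the prior until the training loss is at or below threshold."""
    for tries in range(1, max_tries + 1):
        w = sample_prior(rng)
        if training_loss(w) <= threshold:
            return w, tries
    return None, max_tries


def estimate_log10_mass(rng, threshold=0.0, n_samples=20_000):
    """Naive Monte Carlo estimate of log10 of the prior mass of the low-loss region."""
    hits = sum(training_loss(sample_prior(rng)) <= threshold for _ in range(n_samples))
    return math.log10(hits / n_samples) if hits else float("-inf")


if __name__ == "__main__":
    w, tries = guess_and_check(random.Random(0))
    print("accepted parameters:", w, "after", tries, "tries")
    print("estimated log10 mass:", estimate_log10_mass(random.Random(1)))
```

The hit fraction in `estimate_log10_mass` is also the per-sample success probability of `guess_and_check`, which is why the two quantities are linked. If the region were small enough that no sample ever landed in it, this estimator would return negative infinity, so regions of very small mass need a different estimator.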

artificial intelligence, machine learning, neighborhood


Country:
- North America > Canada
  - Ontario > Toronto (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.46)
  - Machine Learning
    - Statistical Learning > Gradient Descent (0.35)
    - Neural Networks > Deep Learning (0.34)
