- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- North America > United States > New York > Rensselaer County > Troy (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Provable benefits of score matching
Score matching is an alternative to maximum likelihood (ML) for estimating a probability distribution parametrized up to a constant of proportionality. By fitting the "score" of the distribution, it sidesteps the need to compute this constant of proportionality, which is often intractable. While score matching and its variants are popular in practice, its benefits and tradeoffs relative to maximum likelihood, both computational and statistical, are not well understood theoretically. In this work, we give the first example of a natural exponential family of distributions for which the score matching loss is computationally efficient to optimize and has statistical efficiency comparable to ML, while the ML loss is intractable to optimize using a gradient-based method. The family consists of exponentials of polynomials of fixed degree, and our result can be viewed as a continuous analogue of recent developments in the discrete setting. Precisely, we show: (1) designing a zeroth-order or first-order oracle for optimizing the maximum likelihood loss is NP-hard.
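As a minimal sketch of why the score matching loss can be cheap to optimize for this kind of family, consider a one-dimensional exponential of a polynomial, p_θ(x) ∝ exp(θ_1 x + θ_2 x^2 + ... + θ_d x^d), restricted to one dimension here for brevity. Hyvärinen's score matching objective, E[½ (∂_x log p_θ(x))² + ∂²_x log p_θ(x)], never touches the normalizing constant Z(θ), and for an exponential family it is a quadratic in θ, so its empirical minimizer is a single linear solve. The function names are hypothetical, and this is an illustration of the general technique, not the paper's estimator or analysis. For the density to be normalizable, the degree should be even with a negative leading coefficient.

```python
import numpy as np


def _poly_derivative_features(x: np.ndarray, degree: int):
    """First and second derivatives of the sufficient statistics T_k(x) = x**k, k = 1..degree."""
    x = np.asarray(x, dtype=float)[:, None]   # shape (n, 1)
    ks = np.arange(1, degree + 1)             # shape (degree,)
    d_t = ks * x ** (ks - 1)                  # T_k'(x) = k x^(k-1)
    # T_k''(x) = k (k-1) x^(k-2); the exponent is clipped at 0 because the
    # k = 1 term has coefficient 0 anyway (this avoids 0.0 ** -1 = inf).
    d2_t = ks * (ks - 1) * x ** np.maximum(ks - 2, 0)
    return d_t, d2_t


def score_matching_loss(theta: np.ndarray, x: np.ndarray) -> float:
    """Empirical Hyvarinen loss: mean of 0.5 * score(x)**2 + d/dx score(x)."""
    d_t, d2_t = _poly_derivative_features(x, len(theta))
    score = d_t @ theta          # d/dx log p_theta(x); log Z(theta) drops out
    score_deriv = d2_t @ theta   # d^2/dx^2 log p_theta(x)
    return float(np.mean(0.5 * score**2 + score_deriv))


def score_matching_fit(x: np.ndarray, degree: int) -> np.ndarray:
    """Closed-form minimizer of the empirical loss, which is quadratic in theta."""
    d_t, d2_t = _poly_derivative_features(x, degree)
    a = d_t.T @ d_t / d_t.shape[0]   # empirical E[T'(x) T'(x)^T]
    b = d2_t.mean(axis=0)            # empirical E[T''(x)]
    # Loss = 0.5 * theta^T A theta + b^T theta, so the minimizer solves A theta = -b.
    return np.linalg.solve(a, -b)
```

For example, samples from N(1, 2²) have log-density x/4 - x²/8 up to a constant, so `score_matching_fit(samples, 2)` should return θ close to (0.25, -0.125). No sampling or evaluation of Z(θ) is needed at any point, which is the contrast with ML that the abstract draws.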
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Oceania > Australia > New South Wales > Sydney (0.05)
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
A. Scaling Laws
While the results presented in the main text of the paper show scaling averaged across cortex, we can also examine scaling on a per-voxel basis. Increasing the size of semantic models appears to be most beneficial for predicting amodal, post-auditory cognitive areas such as prefrontal cortex.
Figure B.1: Performance of audio encoding models, averaged across all voxels in auditory cortex.
Figure B.2: Performance of HuBERT models, averaged across voxels in cortex.
Figure D.1: Long Context Artifact. An example of a long context artifact effect as measured on an …
Figure E.2: Histogram showing the slopes of voxelwise scaling laws for two OPT model sizes, shown …
Flatmaps presented in the main text used only one subject, S3.
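The per-voxel analysis can be sketched as fitting one scaling curve per voxel. Assuming, as a simplification not stated in the text, that encoding performance is linear in the logarithm of model size, each voxel's slope comes from an ordinary least-squares fit, and a histogram of those slopes is the kind of summary the Figure E.2 caption describes. The function name `voxelwise_scaling_slopes` and the (models × voxels) array layout are hypothetical.

```python
import numpy as np


def voxelwise_scaling_slopes(model_sizes: np.ndarray, performance: np.ndarray) -> np.ndarray:
    """Per-voxel slope of encoding performance against log10(model size).

    model_sizes: shape (m,), parameter counts of m models (hypothetical layout).
    performance: shape (m, v), encoding performance of each model at each of v voxels.
    Returns an array of shape (v,) with one slope per voxel.
    Assumes a log-linear scaling law: performance ~ intercept + slope * log10(size).
    """
    log_size = np.log10(np.asarray(model_sizes, dtype=float))
    design = np.column_stack([np.ones_like(log_size), log_size])  # intercept, log-size
    # One least-squares solve fits every voxel at once (one column of `performance` each).
    coef, *_ = np.linalg.lstsq(design, np.asarray(performance, dtype=float), rcond=None)
    return coef[1]  # row 0 holds intercepts, row 1 holds slopes
```

Given the returned slopes, `np.histogram(slopes, bins=50)` summarizes their distribution across voxels; comparing such histograms for different model families is one way to see where added model size helps most.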
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- North America > United States > New York > Rensselaer County > Troy (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Oceania > Australia > New South Wales > Sydney (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)