extrapolation
Scalable Lévy Process Priors for Spectral Kernel Learning
Gaussian processes are rich distributions over functions, with generalization properties determined by a kernel function. When used for long-range extrapolation, predictions are particularly sensitive to the choice of kernel parameters. It is therefore critical to account for kernel uncertainty in our predictive distributions. We propose a distribution over kernels formed by modelling a spectral mixture density with a Lévy process. The resulting distribution has support for all stationary covariances, including the popular RBF, periodic, and Matérn kernels, combined with inductive biases that enable automatic and data-efficient learning, long-range extrapolation, and state-of-the-art predictive performance. The proposed model also presents an approach to spectral regularization, as the Lévy process introduces a sparsity-inducing prior over mixture components, allowing automatic selection of model order and pruning of extraneous components. We exploit the algebraic structure of the proposed process for O(n) training and O(1) predictions. We perform extrapolations with reasonable uncertainty estimates on several benchmarks, and show that the proposed model can recover flexible ground-truth covariances and is robust to errors in initialization.
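For concreteness, the spectral mixture construction referenced above has a simple closed form: placing a Gaussian mixture over the spectral density yields, via Bochner's theorem, the stationary kernel k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q). Below is a minimal NumPy sketch of that kernel; the component weights, spectral means, and spectral variances are illustrative assumptions, not values from the paper, which places a Lévy process prior over these components rather than fixing them.

    import numpy as np

    def spectral_mixture_kernel(x1, x2, w, mu, v):
        """1-D spectral mixture kernel:
        k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q).
        w, mu, v hold mixture weights, spectral means, and spectral variances.
        """
        tau = x1[:, None] - x2[None, :]  # pairwise lags
        k = np.zeros_like(tau)
        for wq, mq, vq in zip(w, mu, v):
            k += wq * np.exp(-2 * np.pi**2 * tau**2 * vq) * np.cos(2 * np.pi * tau * mq)
        return k

    # Illustrative hyperparameters (assumptions): two spectral components.
    w = np.array([1.0, 0.5])
    mu = np.array([0.1, 0.5])
    v = np.array([0.01, 0.05])
    x = np.linspace(0, 10, 100)
    K = spectral_mixture_kernel(x, x, w, mu, v)  # 100x100 covariance matrix

Under the sparsity-inducing Lévy prior described in the abstract, extraneous components have weights driven toward zero, which in this parameterization simply removes their terms from the sum.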
Empirical Gaussian Processes
Jihao Andreas Lin, Sebastian Ament, Louis C. Tiao, David Eriksson, Maximilian Balandat, Eytan Bakshy
Gaussian processes (GPs) are powerful and widely used probabilistic regression models, but their effectiveness in practice is often limited by the choice of kernel function. This kernel function is typically handcrafted from a small set of standard functions, a process that requires expert knowledge, results in limited adaptivity to data, and imposes strong assumptions on the hypothesis space. We study Empirical GPs, a principled framework for constructing flexible, data-driven GP priors that overcome these limitations. Rather than relying on standard parametric kernels, we estimate the mean and covariance functions empirically from a corpus of historical observations, enabling the prior to reflect rich, non-trivial covariance structures present in the data. Theoretically, we show that the resulting model converges to the GP that is closest (in the KL-divergence sense) to the true data-generating process. Practically, we formulate the problem of learning the GP prior from independent datasets as likelihood estimation and derive an Expectation-Maximization algorithm with closed-form updates, allowing the model to handle heterogeneous observation locations across datasets. We demonstrate that Empirical GPs achieve competitive performance on learning curve extrapolation and time series forecasting benchmarks.
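As a concrete, deliberately simplified illustration of the idea, the sketch below estimates the prior mean and covariance as sample moments over a corpus of curves that share a common observation grid, then conditions on a few observations from a new curve using standard GP posterior formulas. The common-grid assumption, the synthetic corpus, and all constants are illustrative assumptions; the paper's EM algorithm is precisely what removes the common-grid restriction.

    import numpy as np

    rng = np.random.default_rng(0)
    grid = np.linspace(0, 1, 50)

    # Synthetic stand-in for a corpus of 200 historical curves on a shared grid.
    corpus = np.array([
        np.sin(2 * np.pi * (grid + rng.uniform())) + 0.1 * rng.normal(size=grid.size)
        for _ in range(200)
    ])

    mean = corpus.mean(axis=0)              # empirical prior mean
    cov = np.cov(corpus, rowvar=False)      # empirical prior covariance (50x50)

    # Condition on noisy observations of a new curve at a few grid indices.
    obs = np.array([5, 20, 40])
    y = np.sin(2 * np.pi * grid[obs]) + 0.1 * rng.normal(size=obs.size)
    noise = 0.1**2

    K_oo = cov[np.ix_(obs, obs)] + noise * np.eye(obs.size)
    K_xo = cov[:, obs]
    post_mean = mean + K_xo @ np.linalg.solve(K_oo, y - mean[obs])
    post_cov = cov - K_xo @ np.linalg.solve(K_oo, K_xo.T)

Once the empirical moments are in place, prediction is ordinary GP conditioning; all of the modelling flexibility lives in the corpus-driven prior.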
Neural Arithmetic Logic Units
Andrew Trask, Felix Hill, Scott E. Reed, Jack Rae, Chris Dyer, Phil Blunsom
Specifically, one frequently observes failures when quantities that lie outside the numerical range used during training are encountered at test time, even when the target function is simple (e.g., it depends only on aggregating counts or linear extrapolation). This failure pattern indicates that the learned behavior is better characterized by memorization than by systematic abstraction.
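To make the failure mode concrete, here is a small self-contained demonstration (a toy setup of our own, not the paper's experiments): an MLP with bounded tanh hidden units is fit to the identity function on [-5, 5] and then queried far outside that range, where its output necessarily saturates.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, size=(256, 1))   # training inputs confined to [-5, 5]
    y = x.copy()                            # target: f(x) = x

    W1 = rng.normal(scale=0.5, size=(1, 32)); b1 = np.zeros(32)
    W2 = rng.normal(scale=0.5, size=(32, 1)); b2 = np.zeros(1)

    lr = 1e-2
    for _ in range(5000):                   # plain gradient descent on MSE
        h = np.tanh(x @ W1 + b1)            # bounded hidden activations
        pred = h @ W2 + b2
        g = 2 * (pred - y) / len(x)         # d(MSE)/d(pred)
        gW2, gb2 = h.T @ g, g.sum(0)
        gh = (g @ W2.T) * (1 - h**2)        # backprop through tanh
        gW1, gb1 = x.T @ gh, gh.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    for xq in (1.0, 4.0, 20.0, 100.0):
        h = np.tanh(np.array([[xq]]) @ W1 + b1)
        print(xq, float((h @ W2 + b2)[0, 0]))  # near-correct in range; flat far outside

Because every hidden unit is bounded, the network's output is bounded regardless of input magnitude, so f(x) = x cannot be represented outside a fixed range; the NAC/NALU units proposed in this paper address this by representing numbers with linear activations and gated arithmetic operations.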