exponential family model
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Denmark (0.04)
Reviews: Sampled Softmax with Random Fourier Features
The paper considers classification problems with a very large number of classes, where it becomes expensive to evaluate the log-partition function for each instance in the training sample. The main idea is to approximate the log-partition function by sampling a small number of scores corresponding to negative labels (labels different from the one assigned to the training sample). The model is given in Eq. (1), where the score for the i-th class is the inner product between a representation h of an instance and a parameter vector c_i representing the class. I will therefore retain my scores and recommend this paper for acceptance. I kindly ask the authors to incorporate all the promised changes into the camera-ready version.
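For illustration, here is a minimal sketch of the approximation the review describes, assuming a plain uniform proposal over negative labels (the paper's actual proposal, built from Random Fourier Features, is not reproduced here; all names and sizes below are illustrative):

```python
import numpy as np

def sampled_softmax_loss(h, C, y, num_neg, rng):
    """Approximate -s_y + log sum_j exp(s_j), with s_j = <h, c_j>,
    by estimating the partition function from sampled negatives."""
    N = C.shape[0]
    # sample negatives uniformly from the N - 1 labels other than y
    negatives = rng.choice(np.delete(np.arange(N), y), size=num_neg, replace=False)
    s_y = h @ C[y]
    s_neg = C[negatives] @ h
    # importance-weighted estimate: sum over negatives ~ (N - 1) * mean(exp(s_neg))
    Z_hat = np.exp(s_y) + (N - 1) * np.mean(np.exp(s_neg))
    return -s_y + np.log(Z_hat)

rng = np.random.default_rng(0)
C = rng.normal(size=(1000, 16))   # class vectors c_i
h = rng.normal(size=16)           # instance representation h
print(sampled_softmax_loss(h, C, y=3, num_neg=32, rng=rng))
```

Under the uniform proposal the partition-function estimate is unbiased, but the plugged-in log-partition estimate is biased by Jensen's inequality; better proposal distributions, which are the paper's focus, reduce that bias.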
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > California > Yolo County > Davis (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Distributionally Robust Optimisation with Bayesian Ambiguity Sets
Dellaporta, Charita, O'Hara, Patrick, Damoulas, Theodoros
Decision making under uncertainty is challenging since the data-generating process (DGP) is often unknown. Bayesian inference proceeds by estimating the DGP through posterior beliefs about the model's parameters. However, minimising the expected risk under these posterior beliefs can lead to sub-optimal decisions due to model uncertainty or limited, noisy observations. To address this, we introduce Distributionally Robust Optimisation with Bayesian Ambiguity Sets (DRO-BAS) which hedges against uncertainty in the model by optimising the worst-case risk over a posterior-informed ambiguity set. We show that our method admits a closed-form dual representation for many exponential family members and showcase its improved out-of-sample robustness against existing Bayesian DRO methodology in the Newsvendor problem.
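The paper's dual representation is specific to its posterior-informed Bayesian ambiguity sets. Purely as an illustration of how a worst-case risk over an ambiguity set collapses to a one-dimensional dual problem, the sketch below uses the classical Kullback-Leibler-ball DRO dual over an empirical loss distribution (a standard result, not the DRO-BAS dual; the newsvendor-style numbers are invented):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_dro_worst_case(losses, eps):
    """Worst-case expected loss over {Q : KL(Q || P) <= eps}, with P the
    empirical distribution of `losses`, via the classical dual
        inf_{lam > 0}  lam * eps + lam * log E_P[exp(loss / lam)]."""
    losses = np.asarray(losses, dtype=float)

    def dual(log_lam):
        lam = np.exp(log_lam)  # parameterise by log(lam) so lam > 0
        log_mean_exp = np.logaddexp.reduce(losses / lam) - np.log(losses.size)
        return lam * eps + lam * log_mean_exp

    res = minimize_scalar(dual, bounds=(-10.0, 10.0), method="bounded")
    return res.fun

# newsvendor-style losses under draws from a (made-up) posterior predictive
rng = np.random.default_rng(1)
demand = rng.gamma(shape=5.0, scale=2.0, size=2000)
order = 10.0
losses = np.maximum(demand - order, 0) + 0.5 * np.maximum(order - demand, 0)
print(kl_dro_worst_case(losses, eps=0.1))   # >= the plain mean of the losses
```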
Annealing Between Distributions by Averaging Moments
Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and the intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families defined by averaging the moments of the initial and target distributions. We analyze the asymptotic performance of both the geometric and moment averaging paths and derive an asymptotically optimal piecewise linear schedule. AIS with moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models.
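A toy instance of the moment-averaging construction, assuming one-dimensional Gaussian endpoints (the paper treats general exponential families): each intermediate distribution matches a convex combination of the endpoints' expected sufficient statistics E[x] and E[x^2], rather than a geometric average of the densities.

```python
import numpy as np

def moment_averaged_path(mu0, var0, mu1, var1, betas):
    """Intermediate Gaussians whose first two moments are convex
    combinations of the endpoint moments (moment averaging)."""
    path = []
    for b in betas:
        m1 = (1 - b) * mu0 + b * mu1                          # E[x]
        m2 = (1 - b) * (var0 + mu0**2) + b * (var1 + mu1**2)  # E[x^2]
        path.append((m1, m2 - m1**2))                         # (mean, variance)
    return path

# geometric averaging would instead interpolate the natural parameters
# (precision, precision * mean); the two paths differ away from the endpoints
print(moment_averaged_path(0.0, 1.0, 5.0, 0.5, np.linspace(0.0, 1.0, 5)))
```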
- North America > Canada > Ontario > Toronto (0.29)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Jordan (0.04)
Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking
The Bayesian online change point detection (BOCPD) algorithm provides an efficient way to do exact inference when the parameters of an underlying model may suddenly change over time. BOCPD requires computation of the underlying model's posterior predictives, which can only be computed online in O(1) time and memory for exponential family models. We develop variational approximations to the posterior on change point times (formulated as run lengths) for efficient inference when the underlying model is not in the exponential family, and does not have tractable posterior predictive distributions. In doing so, we develop improvements to online variational inference. We apply our methodology to a tracking problem using radar data with a signal-to-noise feature that is Rice distributed. We also develop a variational method for inferring the parameters of the (non-exponential family) Rice distribution.
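For contrast with the paper's variational treatment, here is a minimal sketch of the exact BOCPD run-length recursion in a conjugate case (Gaussian data, unknown mean, known variance), where the posterior predictive needed at each step is available in closed form; the constant hazard rate and hyperparameters below are illustrative:

```python
import numpy as np
from scipy.stats import norm

def bocpd_gaussian(xs, hazard=0.01, mu0=0.0, var0=1.0, varx=1.0):
    """Run-length posterior p(r_t | x_{1:t}) for Gaussian data with an
    unknown mean (conjugate Normal prior) and known variance varx."""
    log_r = np.array([0.0])                    # p(r_0 = 0) = 1
    mus, vs = np.array([mu0]), np.array([var0])
    history = []
    for x in xs:
        # closed-form posterior predictive under each run length; this
        # O(1)-per-run-length update is what conjugacy buys us
        pred = norm.logpdf(x, mus, np.sqrt(vs + varx))
        grow = log_r + pred + np.log1p(-hazard)                   # no change
        cp = np.logaddexp.reduce(log_r + pred) + np.log(hazard)   # change
        log_r = np.concatenate(([cp], grow))
        log_r -= np.logaddexp.reduce(log_r)                       # normalise
        # conjugate update of each run's Normal posterior over the mean
        prec = 1.0 / vs + 1.0 / varx
        mus = np.concatenate(([mu0], (mus / vs + x / varx) / prec))
        vs = np.concatenate(([var0], 1.0 / prec))
        history.append(np.exp(log_r))
    return history
```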
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- North America > United States > New York (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Stabilizing the Maximal Entropy Moment Method for Rarefied Gas Dynamics at Single-Precision
Zheng, Candi, Yang, Wang, Chen, Shiyi
Developing extended hydrodynamic equations valid for both dense and rarefied gases remains a great challenge. A systematic solution to this challenge is the moment method, which describes both dense and rarefied gas behavior using moments of the gas molecules' velocity distribution. Among moment methods, the maximal entropy moment method (MEM), which utilizes velocity distributions of maximized entropy, stands out for its well-posedness and stability. However, finding such distributions requires solving an ill-conditioned and computationally demanding optimization problem. This problem causes numerical overflow and breakdown when the numerical precision is insufficient, especially for flows such as high-speed shock waves, and it prevents modern GPUs from accelerating the optimization with their enormous single-precision floating-point compute power. This paper aims to stabilize MEM, making it practical for simulating very strong normal shock waves on modern GPUs at single precision. We propose gauge transformations for MEM that make the optimization less ill-conditioned, and we tackle numerical overflow and breakdown by adopting the canonical form of the distribution and a modified Newton optimization method. With these techniques, we achieve a single-precision GPU simulation of a Mach 10 shock wave with a 35-moment MEM, surpassing previous double-precision results at Mach 4. Moreover, we argue that an over-refined spatial mesh degrades both the accuracy and stability of MEM. Overall, this paper makes the maximal entropy moment method practical for simulating very strong normal shock waves on modern GPUs at single precision, with a significant stability improvement over previous methods.
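The core optimization the abstract describes can be illustrated in one dimension: fit a maximum-entropy density on a bounded velocity grid whose moments match given targets, via Newton's method on the convex dual. The sketch below is a toy version (it is not the paper's 3D, 35-moment solver, and its fixed Gaussian reference exponent only loosely mimics the gauge/canonical-form ideas); note that the Hessian is a moment covariance matrix, whose ill-conditioning is exactly the difficulty the paper targets.

```python
import numpy as np

def maxent_1d(target_moments, grid, iters=60):
    """Toy 1D maximal-entropy fit: find lam so that
    p(v) ~ exp(-v^2/2 + sum_k lam_k v^k) matches E[v^k], k = 1..K,
    on a bounded velocity grid, via Newton's method on the convex dual."""
    K = len(target_moments)
    phi = np.vstack([grid ** (k + 1) for k in range(K)])   # v, v^2, ..., v^K
    lam = np.zeros(K)
    base = -0.5 * grid ** 2     # Gaussian reference keeps the exponent tame
    for _ in range(iters):
        w = np.exp(base + lam @ phi)                       # unnormalised density
        Z = np.trapz(w, grid)
        m = np.trapz(phi * w, grid, axis=1) / Z            # current moments
        cov = (np.trapz(phi[:, None, :] * phi[None, :, :] * w, grid, axis=2) / Z
               - np.outer(m, m))
        # the dual's Hessian is this moment covariance; its growing condition
        # number with K is the instability the paper attacks
        lam += np.linalg.solve(cov + 1e-12 * np.eye(K), target_moments - m)
        if np.linalg.norm(target_moments - m) < 1e-10:
            break
    return lam

grid = np.linspace(-8.0, 8.0, 801)
# near-Gaussian targets (E[v], E[v^2], E[v^3], E[v^4]); Newton from lam = 0
print(maxent_1d(np.array([0.1, 1.0, 0.05, 3.1]), grid))
```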
- Asia > China (0.28)
- North America > United States > New York > New York County > New York City (0.14)
- Europe (0.14)
A Constant-per-Iteration Likelihood Ratio Test for Online Changepoint Detection for Exponential Family Models
Ward, Kes, Romano, Gaetano, Eckley, Idris, Fearnhead, Paul
Online changepoint detection algorithms that are based on likelihood-ratio tests have been shown to have excellent statistical properties. However, a simple online implementation is computationally infeasible as, at time $T$, it involves considering $O(T)$ possible locations for the change. Recently, the FOCuS algorithm has been introduced for detecting changes in mean in Gaussian data that decreases the per-iteration cost to $O(\log T)$. This is possible by using pruning ideas, which reduce the set of changepoint locations that need to be considered at time $T$ to approximately $\log T$. We show that if one wishes to perform the likelihood ratio test for a different one-parameter exponential family model, then exactly the same pruning rule can be used, and again one need only consider approximately $\log T$ locations at iteration $T$. Furthermore, we show how we can adaptively perform the maximisation step of the algorithm so that we need only maximise the test statistic over a small subset of these possible locations. Empirical results show that the resulting online algorithm, which can detect changes under a wide range of models, has a constant-per-iteration cost on average.
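A minimal sketch of the pruning idea, assuming the simplest setting (one-sided test for an increase in mean, Gaussian data with known pre-change mean 0 and variance 1): the candidates that survive pruning are vertices of the lower convex hull of the cusum path, so they can be maintained with amortised O(1) work per observation, leaving roughly $\log T$ locations to maximise over. The paper's exponential-family generalisation and its adaptive maximisation step are not reproduced here.

```python
import numpy as np

def focus_up(xs, threshold):
    """One-sided FOCuS-style LR test for an increase in mean (pre-change
    mean 0, variance 1). Surviving candidates are hull vertices of the
    cusum path; stat_tau = ((S_t - S_tau)^+)^2 / (2 (t - tau))."""
    cands = [(0, 0.0)]                     # (tau, S_tau)
    t, S = 0, 0.0
    for x in xs:
        t += 1
        S += x
        # prune: pop candidates that break the increasing-slope condition
        while len(cands) >= 2:
            (t1, s1), (t2, s2) = cands[-2], cands[-1]
            if (s2 - s1) * (t - t2) >= (S - s2) * (t2 - t1):
                cands.pop()                # (t2, s2) can never be optimal
            else:
                break
        cands.append((t, S))
        stat = max(max(S - s, 0.0) ** 2 / (2.0 * (t - tau))
                   for tau, s in cands if tau < t)
        if stat > threshold:
            return t, stat                 # detection time and statistic
    return None, 0.0

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.0, 1.0, 100)])
print(focus_up(data, threshold=10.0))      # should flag soon after t = 500
```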