The continuous Bernoulli: fixing a pervasive error in variational autoencoders
Variational autoencoders (VAE) have quickly become a central tool in machine learning, applicable to a broad range of data types and latent variable models. By far the most common first step, taken by seminal papers and by core software libraries alike, is to model MNIST data using a deep network parameterizing a Bernoulli likelihood. This practice contains what appears to be, and is often set aside as, a minor inconvenience: the pixel data is [0,1] valued, not {0,1} valued as the Bernoulli likelihood requires. Here we show that, far from being a triviality or a nuisance that is convenient to ignore, this error has profound importance for VAE, both qualitative and quantitative. We introduce and fully characterize a new [0,1]-supported, single-parameter distribution: the continuous Bernoulli, which patches this pervasive bug in VAE. This distribution is no mere nitpick; it produces meaningful performance improvements across a range of metrics and datasets, including sharper image samples, and suggests a broader class of performant VAE.
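For reference, the continuous Bernoulli density the abstract introduces keeps the Bernoulli's functional form, p(x | lam) proportional to lam^x (1 - lam)^(1 - x), but supports x in [0, 1], which requires the normalizing constant C(lam) = 2 arctanh(1 - 2 lam) / (1 - 2 lam) for lam != 1/2, with C(1/2) = 2. A minimal NumPy sketch (function names here are illustrative, not from the authors' code):

```python
import numpy as np

def cb_log_norm_const(lam):
    """Log normalizing constant C(lam) of the continuous Bernoulli.

    C(lam) = 2 * arctanh(1 - 2*lam) / (1 - 2*lam) for lam != 0.5,
    with the limit C(0.5) = 2.
    """
    lam = np.asarray(lam, dtype=float)
    near_half = np.abs(lam - 0.5) < 1e-6
    # Substitute a safe value where lam ~ 0.5 to avoid 0/0; result is
    # overwritten with the limit value 2 below.
    safe = np.where(near_half, 0.25, lam)
    c = 2.0 * np.arctanh(1.0 - 2.0 * safe) / (1.0 - 2.0 * safe)
    c = np.where(near_half, 2.0, c)
    return np.log(c)

def cb_log_pdf(x, lam):
    """Log density of the continuous Bernoulli on [0, 1]:
    p(x | lam) = C(lam) * lam**x * (1 - lam)**(1 - x).
    """
    return (cb_log_norm_const(lam)
            + x * np.log(lam) + (1.0 - x) * np.log(1.0 - lam))
```

In a VAE, this log density would replace the Bernoulli cross-entropy term in the reconstruction part of the ELBO; the extra log C(lam) term is exactly what the common practice drops.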
Reviews: The continuous Bernoulli: fixing a pervasive error in variational autoencoders
I have read the author response and am strengthening my confidence (since after the response and seeing other reviews, I believe I have understood everything correctly). I loved the analysis of the "concentration of mass at the extrema" between the CB and beta that the authors provided in their response. It is exactly this kind of careful study and how it relates to what you saw in your experiments with MNIST (and why it matters specifically for the particular characteristics of the dataset) that make me love a paper like this. I hope that the authors add that analysis to the supplementary material at least. It almost sounds like your supplement could even be a mini paper on such a study that's interesting in its own right (though please don't write another paper on it). I find the paper very inspiring.
This paper generated an incredible amount of discussion among the reviewers, with many "pros": -- The paper identifies a bad practice that many others have dealt with carelessly in the past. It asks: "if we assume, as others before us, that we may treat the data as binary, are the bad implications negligible?" It shows that the answer is very much no, by exploring the shape of the normalizing constants and laying out a logical, scientifically exposited train of thought that precisely characterizes the source of the resulting error. Adding experiments with new architectures would not give meaningful insight, since the architecture is a largely independent choice. The reviewers ask the authors to carefully address this question, and variants of it, in their final version: "If a Gaussian likelihood has a support mismatch, why not simply truncate the Gaussian to (0, 1)?"
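The truncated-Gaussian alternative the meta-review raises is simple to state: keep the Gaussian log density but subtract the log of the probability mass the Gaussian assigns to the truncation interval. A minimal stdlib sketch of that renormalization (illustrative only, not an implementation from the paper):

```python
import math

def truncated_gaussian_logpdf(x, mu, sigma, lo=0.0, hi=1.0):
    """Log density of a Gaussian N(mu, sigma^2) truncated to [lo, hi].

    Equal to the untruncated Gaussian log density minus the log of the
    mass the Gaussian places on [lo, hi] (the renormalizing constant).
    """
    def phi(z):
        # Standard normal CDF via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    log_gauss = (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
                 - 0.5 * ((x - mu) / sigma) ** 2)
    mass = phi((hi - mu) / sigma) - phi((lo - mu) / sigma)
    return log_gauss - math.log(mass)
```

Like the continuous Bernoulli, this yields a properly normalized [0, 1]-supported likelihood; the two differ in shape, and in particular in how they concentrate mass near the extremes 0 and 1.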
Loaiza-Ganem, Gabriel, Cunningham, John P.