probability divergence
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- (2 more...)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- (2 more...)
Sample Complexity of Probability Divergences under Group Symmetry
Chen, Ziyu, Katsoulakis, Markos A., Rey-Bellet, Luc, Zhu, Wei
We rigorously quantify the improvement in the sample complexity of variational divergence estimations for group-invariant distributions. In the cases of the Wasserstein-1 metric and the Lipschitz-regularized $\alpha$-divergences, the reduction of sample complexity is proportional to an ambient-dimension-dependent power of the group size. For the maximum mean discrepancy (MMD), the improvement of sample complexity is more nuanced, as it depends on not only the group size but also the choice of kernel. Numerical simulations verify our theories.
- North America > United States > New York > New York County > New York City (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (3 more...)
Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics
Kato, Masahiro, Imaizumi, Masaaki, Minami, Kentaro
This paper provides a unified perspective for the Kullback-Leibler (KL)-divergence and the integral probability metrics (IPMs) from the perspective of maximum likelihood density-ratio estimation (DRE). Both the KL-divergence and the IPMs are widely used in various fields in applications such as generative modeling. However, a unified understanding of these concepts has still been unexplored. In this paper, we show that the KL-divergence and the IPMs can be represented as maximal likelihoods differing only by sampling schemes, and use this result to derive a unified form of the IPMs and a relaxed estimation method. To develop the estimation problem, we construct an unconstrained maximum likelihood estimator to perform DRE with a stratified sampling scheme. We further propose a novel class of probability divergences, called the Density Ratio Metrics (DRMs), that interpolates the KL-divergence and the IPMs. In addition to these findings, we also introduce some applications of the DRMs, such as DRE and generative adversarial networks. In experiments, we validate the effectiveness of our proposed methods.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)
Subadditivity of Probability Divergences on Bayes-Nets with Applications to Time Series GANs
Ding, Mucong, Daskalakis, Constantinos, Feizi, Soheil
GANs for time series data often use sliding windows or self-attention to capture underlying time dependencies. While these techniques have no clear theoretical justification, they are successful in significantly reducing the discriminator size, speeding up the training process, and improving the generation quality. In this paper, we provide both theoretical foundations and a practical framework of GANs for high-dimensional distributions with conditional independence structure captured by a Bayesian network, such as time series data. We prove that several probability divergences satisfy subadditivity properties with respect to the neighborhoods of the Bayes-net graph, providing an upper bound on the distance between two Bayes-nets by the sum of (local) distances between their marginals on every neighborhood of the graph. This leads to our proposed Subadditive GAN framework that uses a set of simple discriminators on the neighborhoods of the Bayes-net, rather than a giant discriminator on the entire network, providing significant statistical and computational benefits. We show that several probability distances including Jensen-Shannon, Total Variation, and Wasserstein, have subadditivity or generalized subadditivity. Moreover, we prove that Integral Probability Metrics (IPMs), which encompass commonly-used loss functions in GANs, also enjoy a notion of subadditivity under some mild conditions. Furthermore, we prove that nearly all f-divergences satisfy local subadditivity in which subadditivity holds when the distributions are relatively close. Our experiments on synthetic as well as real-world datasets verify the proposed theory and the benefits of subadditive GANs.
- South America > Argentina > Patagonia > Tierra del Fuego Province > Estado Nacional (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Africa > Cameroon > Gulf of Guinea (0.04)
GAN-QP: A Novel GAN Framework without Gradient Vanishing and Lipschitz Constraint
We know SGAN may have a risk of gradient vanishing. A significant improvement is WGAN, with the help of 1-Lipschitz constraint on discriminator to prevent from gradient vanishing. Is there any GAN having no gradient vanishing and no 1-Lipschitz constraint on discriminator? We do find one, called GAN-QP. To construct a new framework of Generative Adversarial Network (GAN) usually includes three steps: 1. choose a probability divergence; 2. convert it into a dual form; 3. play a min-max game. In this articles, we demonstrate that the first step is not necessary. We can analyse the property of divergence and even construct new divergence in dual space directly. As a reward, we obtain a simpler alternative of WGAN: GAN-QP. We demonstrate that GAN-QP have a better performance than WGAN in theory and practice.
- Workflow (0.54)
- Research Report (0.50)