
Collaborating Authors

 Robert C. Williamson


f-GANs in an Information Geometric Nutshell

Neural Information Processing Systems

Nowozin et al. showed how to extend the GAN principle to all f-divergences. The approach is elegant but falls short of a full description of the supervised game, and says little about the key player, the generator: for example, what does the generator actually converge to if solving the GAN game means convergence in some space of parameters? How does that provide hints on the generator's design, and how does it compare to the flourishing but almost exclusively experimental literature on the subject? In this paper, we unveil a broad class of distributions for which such convergence happens, namely deformed exponential families, a wide superset of exponential families. We show that current deep architectures are able to factorize a very large number of such densities using an especially compact design, hence displaying the power of deep architectures and their concinnity in the f-GAN game. This result holds given a sufficient condition on activation functions, which turns out to be satisfied by popular choices. The key to our results is a variational generalization of an old theorem that relates the KL divergence between regular exponential families to divergences between their natural parameters. We complete this picture with additional results and experimental insights on how these results may be used to ground further improvements of GAN architectures, via (i) a principled design of the activation functions in the generator and (ii) an explicit integration of the link function of proper composite losses in the discriminator.
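The "old theorem" referenced above can be stated compactly. The following is a brief reminder in standard notation, not drawn verbatim from the paper: for a regular exponential family with natural parameter theta, sufficient statistic phi and cumulant (log-partition) function C,

\[
p_\theta(x) = \exp\big(\langle \theta, \varphi(x)\rangle - C(\theta)\big)\, h(x),
\qquad
\mathrm{KL}\big(p_{\theta_1} \,\|\, p_{\theta_2}\big)
  = C(\theta_2) - C(\theta_1) - \big\langle \theta_2 - \theta_1,\, \nabla C(\theta_1)\big\rangle
  = D_C(\theta_2, \theta_1),
\]

i.e. the KL divergence between two members of the family equals the Bregman divergence D_C of the cumulant, evaluated on their natural parameters. The paper's contribution described above is a variational generalization of this identity to deformed exponential families.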


Constant Regret, Generalized Mixability, and Mirror Descent

Neural Information Processing Systems

We consider the setting of prediction with expert advice: a learner makes predictions by aggregating those of a group of experts. Under this setting, and for the right choice of loss function and "mixing" algorithm, the learner can achieve constant regret regardless of the number of prediction rounds. For example, constant regret can be achieved for mixable losses using the aggregating algorithm. The Generalized Aggregating Algorithm (GAA) is a family of algorithms parameterized by convex functions on simplices (entropies), which reduces to the aggregating algorithm when the Shannon entropy S is used. For a given entropy Φ, losses for which constant regret is possible using the GAA are called Φ-mixable. Which losses are Φ-mixable was previously left as an open question. We fully characterize Φ-mixability and answer other open questions posed by [6]. We show that the Shannon entropy S is fundamental when it comes to mixability: any Φ-mixable loss is necessarily S-mixable, and the lowest worst-case regret of the GAA is achieved using the Shannon entropy. Finally, by leveraging the connection between the mirror descent algorithm and the update step of the GAA, we suggest a new adaptive generalized aggregating algorithm and analyze its performance in terms of its regret bound.
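As a concrete reference point for the special case the GAA reduces to, here is a minimal Python sketch (not code from the paper) of the classical aggregating algorithm with exponentially updated weights for the binary log loss, which is mixable with learning rate eta = 1. The interface (expert probability forecasts and binary outcomes) is an assumption made purely for illustration.

import numpy as np

def aggregating_algorithm_log_loss(expert_probs, outcomes, eta=1.0):
    """Exponential-weights aggregating algorithm for the binary log loss.

    expert_probs: (T, K) array; expert_probs[t, k] is expert k's predicted
                  probability that outcome y_t = 1 at round t (kept in (0, 1)).
    outcomes:     length-T array of observed outcomes in {0, 1}.
    Returns the learner's per-round predictions and its cumulative log loss.
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    outcomes = np.asarray(outcomes)
    T, K = expert_probs.shape
    log_w = np.zeros(K)                      # log-weights; uniform prior over experts
    preds, cum_loss = [], 0.0
    for t in range(T):
        w = np.exp(log_w - log_w.max())      # normalize in log-space for stability
        w /= w.sum()
        # For log loss (mixable with eta = 1) the weighted mixture of the
        # experts' probabilities is a valid substitution (prediction) step.
        p = float(w @ expert_probs[t])
        preds.append(p)
        y = int(outcomes[t])
        cum_loss += -np.log(p if y == 1 else 1.0 - p)
        # Aggregating-algorithm update: each expert's weight is multiplied
        # by exp(-eta * its incurred loss).
        expert_losses = -np.log(np.where(y == 1, expert_probs[t], 1.0 - expert_probs[t]))
        log_w -= eta * expert_losses
    return np.array(preds), cum_loss

With eta = 1 and K experts this guarantees cumulative loss at most ln K above that of the best expert, independent of the number of rounds T, which is the constant-regret phenomenon the abstract refers to.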


From Stochastic Mixability to Fast Rates

Neural Information Processing Systems

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution P; it returns a hypothesis f, chosen from a fixed class F, with small loss l. In the parametric setting, depending upon (l, F, P), ERM can have slow (1/√n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n. There exist several results that give sufficient conditions for fast rates in terms of joint properties of l, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical setting of prediction with expert advice, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss l (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper gives a direct proof of fast rates for ERM in terms of stochastic mixability of (l, F, P), and in so doing provides new insight into the fast-rates phenomenon.
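For readers who want the central condition spelled out, one common formulation of stochastic mixability (stated here in generic notation; the paper's own statement may differ in details such as whether f* must be the risk minimizer over F) is that (l, F, P) is eta-stochastically mixable, for some eta > 0, if

\[
\exists f^{*} \in \mathcal{F} \;\; \forall f \in \mathcal{F} : \quad
\mathbb{E}_{Z \sim P}\!\left[\exp\!\big(\eta\,(\ell(f^{*}, Z) - \ell(f, Z))\big)\right] \le 1 .
\]

As the abstract notes, this condition reduces to classical mixability of the loss in a special case, and the paper's direct proof shows that it suffices for fast rates for ERM.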


