Collaborating Authors

 Vesseron, Nina


Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures

arXiv.org Machine Learning

A common approach to generative modeling is to split model-fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an alternative route that ties sampling and mapping together. We find inspiration in moment measures, a result that states that for any measure $\rho$ supported on a compact convex set of $\mathbb{R}^d$, there exists a unique convex potential $u$ such that $\rho=\nabla u\,\sharp\,e^{-u}$. While this does seem to effectively tie sampling (from the log-concave distribution $e^{-u}$) to action (pushing particles through $\nabla u$), we observe on simple examples (e.g., Gaussians or 1D distributions) that this choice is ill-suited for practical tasks. We study an alternative factorization, where $\rho$ is factorized as $\nabla w^*\,\sharp\,e^{-w}$, with $w^*$ the convex conjugate of $w$. We call this approach conjugate moment measures, and show far more intuitive results on these examples. Because $\nabla w^*$ is the Monge map between the log-concave distribution $e^{-w}$ and $\rho$, we rely on optimal transport solvers to propose an algorithm that recovers $w$ from samples of $\rho$, parameterizing $w$ as an input-convex neural network.
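To make the pipeline concrete, here is a minimal sketch (not the authors' released code) of the ingredients the abstract describes: a convex potential $w$ parameterized as a small input-convex neural network, noise drawn from the log-concave density $e^{-w}$ (here with unadjusted Langevin steps), and generation by pushing that noise through $\nabla w^*$, evaluated pointwise as $\arg\max_x \langle x, z\rangle - w(x)$ via gradient ascent. Layer sizes, step sizes and iteration counts are illustrative assumptions.

```python
# Minimal sketch, assuming PyTorch; not the authors' implementation.
import torch
import torch.nn.functional as F

class ICNN(torch.nn.Module):
    """Input-convex network: convex in x because hidden-to-hidden weights are
    clamped to be non-negative and activations are convex and non-decreasing."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.Wx0 = torch.nn.Linear(dim, hidden)
        self.Wx1 = torch.nn.Linear(dim, hidden)
        self.Wz1 = torch.nn.Parameter(torch.rand(hidden, hidden) * 0.1)  # kept >= 0
        self.out = torch.nn.Parameter(torch.rand(hidden) * 0.1)          # kept >= 0

    def forward(self, x):
        z = F.softplus(self.Wx0(x))
        z = F.softplus(self.Wx1(x) + z @ self.Wz1.clamp(min=0).T)
        return z @ self.out.clamp(min=0) + 0.5 * (x ** 2).sum(-1)  # strongly convex

def sample_exp_minus_w(w, n, dim, steps=200, eta=1e-2):
    """Unadjusted Langevin dynamics targeting the log-concave density e^{-w}."""
    x = torch.randn(n, dim)
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(w(x).sum(), x)[0]
        x = x - eta * grad + (2 * eta) ** 0.5 * torch.randn_like(x)
    return x.detach()

def grad_w_star(w, z, steps=200, lr=1e-1):
    """nabla w*(z) = argmax_x <x, z> - w(x), found here by gradient ascent."""
    x = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (w(x) - (x * z).sum(-1)).sum()   # minimize w(x) - <x, z>
        loss.backward()
        opt.step()
    return x.detach()

if __name__ == "__main__":
    dim = 2
    w = ICNN(dim)                                # in practice, fitted to samples of rho
    z = sample_exp_minus_w(w, n=512, dim=dim)    # noise ~ e^{-w}
    x = grad_w_star(w, z)                        # generated samples ~ nabla w* # e^{-w}
    print(x.shape)
```

In the paper, $w$ is fitted from samples of $\rho$ with optimal transport solvers; the sketch only illustrates how a single convex potential provides both the noise distribution and the map.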


On a Neural Implementation of Brenier's Polar Factorization

arXiv.org Machine Learning

In 1991, Brenier proved a theorem that generalizes the polar decomposition for square matrices -- factored as PSD $\times$ unitary -- to any vector field $F:\mathbb{R}^d\rightarrow \mathbb{R}^d$. The theorem, known as the polar factorization theorem, states that any field $F$ can be recovered as the composition of the gradient of a convex function $u$ with a measure-preserving map $M$, namely $F=\nabla u \circ M$. We propose a practical implementation of this far-reaching theoretical result, and explore possible uses within machine learning. The theorem is closely related to optimal transport (OT) theory, and we borrow from recent advances in the field of neural optimal transport to parameterize the potential $u$ as an input convex neural network. The map $M$ can be either evaluated pointwise using $u^*$, the convex conjugate of $u$, through the identity $M=\nabla u^* \circ F$, or learned as an auxiliary network. Because $M$ is, in general, not injective, we consider the additional task of estimating the ill-posed inverse map that can approximate the pre-image measure $M^{-1}$ using a stochastic generator. We illustrate possible applications of Brenier's polar factorization to non-convex optimization problems, as well as sampling of densities that are not log-concave.
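The pointwise identity $M=\nabla u^* \circ F$ can be checked on a toy case. The sketch below is an illustration, not the paper's neural implementation: it replaces the learned input-convex network with a closed-form convex potential $u(x)=\tfrac12 x^\top A x$, for which $\nabla u^*(y)=A^{-1}y$ is exact, and verifies $F = \nabla u \circ M$ at a sample point. The vector field $F$ is an arbitrary illustrative choice.

```python
# Toy numerical check of M = nabla u* o F, assuming a quadratic convex potential.
import numpy as np

rng = np.random.default_rng(0)
d = 3
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)                      # positive definite -> u strictly convex

def F(x):
    """An arbitrary (non-gradient) vector field R^d -> R^d used for illustration."""
    return np.tanh(x) + np.array([x[1], -x[0], x[2] ** 2])

grad_u = lambda x: A @ x                         # nabla u
grad_u_star = lambda y: np.linalg.solve(A, y)    # nabla u* = (nabla u)^{-1}

x = rng.standard_normal(d)
M_x = grad_u_star(F(x))                          # M(x) = nabla u*(F(x))
print(np.allclose(grad_u(M_x), F(x)))            # F(x) = nabla u(M(x)) -> True
```

The paper's contribution is to learn $u$ (and hence $M$) so that $M$ is additionally measure-preserving; the snippet only illustrates the compositional identity.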


Deep Neural Networks Are Congestion Games: From Loss Landscape to Wardrop Equilibrium and Beyond

arXiv.org Machine Learning

Since the very inception of the machine learning (ML) field, ML researchers have constantly drawn inspiration from other areas of science, both to develop novel approaches and to better understand existing ones. One notable example is the longstanding, fruitful relationship between ML and game theory (GT), which has manifested itself in novel insights regarding the analysis of such different learning settings as reinforcement learning [Peshkin et al., 2000, Hu and Wellman, 2003, Claus and Boutilier, 1998], boosting [Freund and Schapire, 1996] and adversarial classification [Liu and Chawla, 2009, Brückner and Scheffer, 2011, Dritsoula et al., 2017], to name a few. While the interplay between ML and GT in the above-mentioned cases is natural, i.e., reinforcement learning is a game played between the agent and the environment, boosting is a repeated game with rewards and adversarial learning can be seen as a traditional minimax game, very few works have studied the connection between deep neural networks (DNNs) and GT despite the omnipresence of the former in the ML field. Indeed, in recent years, deep learning has imposed itself as the state-of-the-art ML method in many real-world tasks, such as computer vision or natural language processing, to name a few [Goodfellow et al., 2016]. While achieving impressive performance in practice, training DNNs requires optimizing a non-convex non-concave objective function, even in the case of linear activation functions, and can potentially lead to local minima that are arbitrarily far from the global minimum. This, however, is not the typical behaviour observed in practice, as several works [Dauphin et al., 2014, Goodfellow and Vinyals, 2015] showed empirically that even when training state-of-the-art convolutional or fully-connected feedforward neural networks, one does not converge to suboptimal local minima. Such mysterious behaviour has made studying the loss surface of DNNs and characterizing their local minima a topic of high scientific importance for the ML community. In this paper, we propose a novel approach for analyzing DNNs' behaviour by modelling them as congestion games, a popular class of games first studied by Rosenthal [1973] in the context of traffic routing.
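For readers unfamiliar with the game-theoretic side, the toy sketch below (an illustration of the underlying concept, not the paper's construction) computes the Wardrop equilibrium of the classic two-link Pigou routing network by minimizing the Beckmann potential $\sum_e \int_0^{f_e} \ell_e(t)\,dt$; all numbers are illustrative.

```python
# Wardrop equilibrium of Pigou's two-link congestion game, via the Beckmann potential.
import numpy as np

l1 = lambda f: 1.0        # constant-latency link
l2 = lambda f: f          # congested link: latency grows with its flow

f2 = np.linspace(0.0, 1.0, 10001)                 # flow on link 2; link 1 carries 1 - f2
potential = (1.0 - f2) * 1.0 + 0.5 * f2 ** 2      # int_0^{1-f2} 1 dt + int_0^{f2} t dt
f2_eq = float(f2[np.argmin(potential)])

# At equilibrium all traffic uses link 2 and both links have latency 1,
# so no infinitesimal user can improve by switching routes.
print(f"equilibrium flow on link 2: {f2_eq:.3f}")                              # -> 1.000
print(f"link latencies at equilibrium: {l1(1 - f2_eq):.2f}, {l2(f2_eq):.2f}")  # -> 1.00, 1.00
```

The paper's contribution is to cast DNN training in this language, relating loss landscapes to such equilibria; the snippet only illustrates what a Wardrop equilibrium is in the simplest routing setting.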