Courty, Nicolas
Turning Normalizing Flows into Monge Maps with Geodesic Gaussian Preserving Flows
Morel, Guillaume, Drumetz, Lucas, Benaïchouche, Simon, Courty, Nicolas, Rousseau, François
Normalizing Flows (NF) are powerful likelihood-based generative models that are able to trade off between expressivity and tractability to model complex densities. A now well-established research avenue leverages optimal transport (OT) and looks for Monge maps, i.e., models with minimal transport effort between the source and target distributions. This paper introduces a method based on Brenier's polar factorization theorem to transform any trained NF into a more OT-efficient version without changing the final density. We do so by learning a rearrangement of the source (Gaussian) distribution that minimizes the OT cost between the source and the final density. The Gaussian-preserving transformation is implemented by constructing high-dimensional divergence-free functions, and the path leading to the estimated Monge map is further constrained to lie on a geodesic in the space of volume-preserving diffeomorphisms, thanks to Euler's equations. The proposed method leads to smooth flows with reduced OT costs for several existing models without affecting model performance. The code is available at https://github.com/morel-g/GPFlow.
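The Gaussian-preserving constraint can be made concrete with a standard identity: a velocity field v leaves a density p invariant when div(p v) = 0. For the standard Gaussian, any linear field v(x) = A x with A skew-symmetric satisfies this, since div(p v) = p (tr(A) - x^T A x) = 0. The minimal NumPy sketch below checks the identity by finite differences. The helper names are hypothetical, and this rotation-type field is only the simplest possible example, not the paper's construction of high-dimensional divergence-free functions.

```python
import numpy as np

def gaussian_density(x):
    """Standard normal density N(0, I) in R^d at a single point x."""
    d = x.shape[0]
    return np.exp(-0.5 * x @ x) / (2 * np.pi) ** (d / 2)

def linear_velocity(A):
    """Hypothetical helper: the linear velocity field v(x) = A x."""
    return lambda x: A @ x

def div_of_flux(v, x, h=1e-5):
    """Central finite-difference estimate of div(p v) at x, with p = N(0, I)."""
    d = x.shape[0]
    total = 0.0
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        total += (gaussian_density(x + e) * v(x + e)[i]
                  - gaussian_density(x - e) * v(x - e)[i]) / (2 * h)
    return total

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
A = M - M.T                                        # skew-symmetric: tr(A) = 0 and x^T A x = 0
x = rng.normal(size=3)
print(div_of_flux(linear_velocity(A), x))          # ~0: N(0, I) is preserved
print(div_of_flux(linear_velocity(np.eye(3)), x))  # a dilation: generally nonzero
```

Integrating such a field moves samples around (here it rotates them) while leaving the Gaussian law unchanged, which is the kind of source rearrangement the abstract describes.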
Unbalanced CO-Optimal Transport
Tran, Quang Huy, Janati, Hicham, Courty, Nicolas, Flamary, Rémi, Redko, Ievgen, Demetci, Pinar, Singh, Ritambhara
Optimal transport (OT) compares probability distributions by computing a meaningful alignment between their samples. CO-optimal transport (COOT) takes this comparison further by inferring an alignment between features as well. While this approach leads to better alignments and generalizes both OT and Gromov-Wasserstein distances, we provide a theoretical result showing that it is sensitive to outliers, which are omnipresent in real-world data. This prompts us to propose unbalanced COOT, for which we prove robustness to noise in the compared datasets. To the best of our knowledge, this is the first such result for OT methods in incomparable spaces. With this result in hand, we provide empirical evidence of this robustness on the challenging tasks of heterogeneous domain adaptation, with and without varying class proportions, and of simultaneous alignment of samples and features across single-cell measurements.
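To make the joint sample/feature alignment concrete, here is a minimal NumPy sketch of balanced COOT with a squared loss, solved by block coordinate descent: with the feature coupling fixed, the sample coupling solves an ordinary OT problem, and vice versa. Assumptions: uniform weights, entropic (Sinkhorn) inner solves in place of exact OT, and a fixed number of outer iterations; the function names are hypothetical. Block coordinate descent only reaches a local solution, and the unbalanced variant proposed in the paper would additionally relax the marginal constraints, which this sketch does not do.

```python
import numpy as np

def sinkhorn(a, b, M, eps=0.05, n_iter=500):
    """Entropic OT plan between histograms a and b; eps is relative to the largest cost."""
    K = np.exp(-M / (eps * max(M.max(), 1e-12)))
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def coot(X, Y, n_outer=20, eps=0.05):
    """Block coordinate descent for squared-loss COOT between X (n x d) and Y (m x e)."""
    n, d = X.shape
    m, e = Y.shape
    ws, wt = np.full(n, 1 / n), np.full(m, 1 / m)   # sample weights
    fs, ft = np.full(d, 1 / d), np.full(e, 1 / e)   # feature weights
    pi_f = np.outer(fs, ft)                          # start from the independent feature coupling
    for _ in range(n_outer):
        # Sample cost: Ms[i, j] = sum_{k,l} (X[i,k] - Y[j,l])**2 * pi_f[k, l]
        Ms = ((X**2) @ pi_f.sum(1))[:, None] + ((Y**2) @ pi_f.sum(0))[None, :] - 2 * X @ pi_f @ Y.T
        pi_s = sinkhorn(ws, wt, Ms, eps)
        # Feature cost: Mf[k, l] = sum_{i,j} (X[i,k] - Y[j,l])**2 * pi_s[i, j]
        Mf = ((X**2).T @ pi_s.sum(1))[:, None] + ((Y**2).T @ pi_s.sum(0))[None, :] - 2 * X.T @ pi_s @ Y
        pi_f = sinkhorn(fs, ft, Mf, eps)
    cost = float(np.sum(Mf * pi_f))                  # COOT objective at the final couplings
    return pi_s, pi_f, cost

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y = X[rng.permutation(6)][:, rng.permutation(4)]    # same table, rows and columns shuffled
pi_s, pi_f, cost = coot(X, Y)
print(np.round(pi_s, 2))
print(cost)
```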
Spherical Sliced-Wasserstein
Bonet, Clément, Berg, Paul, Courty, Nicolas, Septier, François, Drumetz, Lucas, Pham, Minh-Tan
Many variants of the Wasserstein distance have been introduced to reduce its computational burden. In particular, the Sliced-Wasserstein distance (SW), which leverages one-dimensional projections for which a closed-form solution of the Wasserstein distance is available, has received a lot of interest. Yet it is restricted to data living in Euclidean spaces, whereas the Wasserstein distance has recently been studied and used on manifolds. We focus more specifically on the sphere, for which we define a novel SW discrepancy, called spherical Sliced-Wasserstein, making a first step towards defining SW discrepancies on manifolds. Our construction is notably based on closed-form solutions of the Wasserstein distance on the circle, together with a new spherical Radon transform. Along with efficient algorithms and the corresponding implementations, we illustrate its properties in several machine learning use cases where spherical representations of data are at stake: sampling on the sphere, density estimation on real Earth data, and hyperspherical auto-encoders.
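The one-dimensional closed form that SW relies on is simply sorting: for two empirical measures with the same number of equally weighted points, the optimal 1D coupling matches sorted values. A minimal NumPy sketch of the Euclidean SW estimator built on it follows; it averages over random directions drawn uniformly on the unit sphere (a Monte Carlo estimate). The function names, the equal-size assumption and the number of projections are illustrative choices. The spherical variant of the paper, which uses the closed form on the circle and a spherical Radon transform, is not shown.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p^p between two 1D empirical measures with equally many, equally weighted atoms."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)) ** p)   # optimal 1D coupling matches sorted values

def sliced_wasserstein(X, Y, n_proj=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p between point clouds X and Y in R^d (same number of points)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions on the unit sphere
    vals = [wasserstein_1d(X @ t, Y @ t, p) for t in theta]
    return np.mean(vals) ** (1 / p)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
print(sliced_wasserstein(X, X + np.array([3.0, 0.0])))     # close to 3 / sqrt(2)
```

Translating a cloud by c shifts every projection by the constant c·θ, so SW_2 between X and X + c is the square root of the average of (c·θ)², which equals |c| / sqrt(d) in expectation over directions.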
Metric Learning-enhanced Optimal Transport for Biochemical Regression Domain Adaptation
Wu, Fang, Courty, Nicolas, Qiang, Zhang, Cui, Jiyu, Li, Ziqing
Generalizing knowledge beyond source domains is a crucial prerequisite for many biomedical applications such as drug design and molecular property prediction. To meet this challenge, researchers have used optimal transport (OT) to align representations between the source and target domains. Yet existing OT algorithms are mainly designed for classification tasks. In this paper, we therefore consider regression tasks in the unsupervised and semi-supervised settings. To exploit continuous labels, we propose novel metrics to measure domain distances and introduce a posterior variance regularizer on the transport plan. Moreover, while computationally appealing, OT computed on mini-batches suffers from ambiguous decision boundaries and biased local data distributions. To address these issues, we propose to couple OT with metric learning to obtain more robust boundaries and reduce bias. Specifically, we present a dynamic hierarchical triplet loss to describe the global data distribution, in which the cluster centroids are progressively adjusted across consecutive iterations. We evaluate our method on both unsupervised and semi-supervised learning tasks in biochemistry. Experiments show that the proposed method significantly outperforms state-of-the-art baselines across various benchmark datasets of small molecules and material crystals.
Subspace Detours Meet Gromov-Wasserstein
Bonet, Clément, Courty, Nicolas, Septier, François, Drumetz, Lucas
In the context of optimal transport methods, the subspace detour approach was recently presented by Muzellec and Cuturi (2019). It consists of building a nearly optimal transport plan in the space of measures from an optimal transport plan in a wisely chosen subspace onto which the original measures are projected. The contribution of this paper is to extend this category of methods to the Gromov-Wasserstein problem, a particular type of transport distance that involves the inner geometry of the compared distributions. After deriving the associated formalism and properties, we discuss a specific cost for which we can show connections with the Knothe-Rosenblatt rearrangement. Finally, we give an experimental illustration on a shape matching problem.
Unbalanced minibatch Optimal Transport; applications to Domain Adaptation
Fatras, Kilian, Séjourné, Thibault, Courty, Nicolas, Flamary, Rémi
Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large-scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e., minibatches. While computationally appealing, this strategy has limits, which we highlight in this paper, arguing that it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
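The minibatch strategy combined with unbalanced OT can be sketched in NumPy as follows. The inner solver is entropic OT with KL-penalised marginals, computed with generalized Sinkhorn scaling iterations in the log domain; the exponent rho / (rho + eps) is what allows mass to be created or destroyed, and as rho grows the iterations approach balanced Sinkhorn. Assumptions: uniform weights, squared Euclidean cost, fixed iteration counts, and a minibatch estimate that averages only the transport term <C, P> (the KL penalties are dropped for brevity). The function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def _logsumexp(Z, axis):
    """Numerically stable log-sum-exp along one axis."""
    zmax = Z.max(axis=axis, keepdims=True)
    return (zmax + np.log(np.exp(Z - zmax).sum(axis=axis, keepdims=True))).squeeze(axis)

def unbalanced_sinkhorn(a, b, C, eps=0.1, rho=1.0, n_iter=1000):
    """Entropic OT with KL marginal penalties of strength rho (log-domain scaling)."""
    fi = rho / (rho + eps)                     # exponent < 1 relaxes the marginal constraints
    f, g = np.zeros_like(a), np.zeros_like(b)  # dual potentials, u = exp(f/eps), v = exp(g/eps)
    for _ in range(n_iter):
        f = fi * eps * (np.log(a) - _logsumexp((g[None, :] - C) / eps, axis=1))
        g = fi * eps * (np.log(b) - _logsumexp((f[:, None] - C) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - C) / eps)

def minibatch_uot(X, Y, batch=8, n_batches=20, seed=0, **kw):
    """Average transport term <C, P> of unbalanced plans over random minibatch pairs."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_batches):
        xb = X[rng.choice(len(X), batch, replace=False)]
        yb = Y[rng.choice(len(Y), batch, replace=False)]
        C = ((xb[:, None, :] - yb[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
        w = np.full(batch, 1 / batch)
        total += np.sum(C * unbalanced_sinkhorn(w, w, C, **kw))
    return total / n_batches

x = np.array([0.0, 0.1, 0.2, 0.3])
y = np.array([0.05, 0.15, 0.25, 3.0])          # the last target point is an outlier
C = (x[:, None] - y[None, :]) ** 2
P = unbalanced_sinkhorn(np.full(4, 0.25), np.full(4, 0.25), C)
print(P[:, -1].sum())                          # mass sent to the outlier, far below its nominal 0.25
```

A balanced plan would be forced to send exactly 0.25 of the mass to the outlier; the KL relaxation lets the plan largely ignore it, which is the robustness the abstract refers to.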
Learning to Generate Wasserstein Barycenters
Lacombe, Julien, Digne, Julie, Courty, Nicolas, Bonneel, Nicolas
Optimal transport is a notoriously difficult problem to solve numerically, with current approaches often remaining intractable for very large-scale applications such as those encountered in machine learning. Computing Wasserstein barycenters, i.e., finding measures in between given input measures in the optimal transport sense, is even more computationally demanding, as it requires solving an optimization problem involving optimal transport distances. By training a deep convolutional neural network, we speed up the computation of Wasserstein barycenters by a factor of 60 over the fastest state-of-the-art approach on the GPU, resulting in computation times of milliseconds on 512×512 regular grids. We show that our network, trained on Wasserstein barycenters of pairs of measures, generalizes well to the problem of finding Wasserstein barycenters of more than two measures. We demonstrate the efficiency of our approach for computing barycenters of sketches and transferring colors between multiple images.
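The network in the paper targets 2D grids, where barycenters have no closed form. In one dimension, by contrast, the W2 barycenter has a classical closed form: its quantile function is the weighted average of the input quantile functions. The short NumPy sketch below uses it to show what "in between in the optimal transport sense" means. The helper is hypothetical and assumes samples of equal size with uniform weights; it is background illustration, not the paper's method.

```python
import numpy as np

def barycenter_1d(samples, weights):
    """W2 barycenter of 1D empirical measures (equal sizes, uniform atoms) by averaging sorted samples."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)  # empirical quantile functions
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ sorted_samples                                            # weighted average of quantiles

rng = np.random.default_rng(0)
a = rng.normal(-2.0, 0.5, size=1000)
b = rng.normal(2.0, 0.5, size=1000)
mid = barycenter_1d([a, b], [0.5, 0.5])
print(mid.mean(), mid.std())   # one bump near 0 with spread near 0.5, not a bimodal mixture
```

The barycenter of two well-separated bumps is a single bump halfway between them, whereas the plain average of the densities would be bimodal.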
Minibatch optimal transport distances; analysis and applications
Fatras, Kilian, Zine, Younes, Majewski, Szymon, Flamary, Rémi, Gribonval, Rémi, Courty, Nicolas
Optimal transport distances have become a classic tool to compare probability distributions and have found many applications in machine learning. Yet, despite recent algorithmic developments, their complexity prevents their direct use on large-scale datasets. To overcome this challenge, a common workaround is to compute these distances on minibatches, i.e., to average the outcome of several smaller optimal transport problems. In this paper, we propose an extended analysis of this practice, whose effects were previously studied only in restricted cases. We first consider a large variety of optimal transport kernels. We notably argue that the minibatch strategy comes with appealing properties, such as unbiased estimators, gradients and a concentration bound around the expectation, but also with limits: the minibatch OT is not a distance. To recover some of the lost distance axioms, we introduce a debiased minibatch OT function and study its statistical and optimization properties. Along with this theoretical analysis, we also conduct empirical experiments on gradient flows, generative adversarial networks (GANs) and color transfer that highlight the practical interest of this strategy.
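The minibatch practice, and why its result is not a distance, can be illustrated with a small NumPy sketch. Assumptions: squared Euclidean cost, uniform weights, equal minibatch sizes, and exact OT solved by brute force over all m! matchings, which is only feasible for tiny m (a real implementation would use a linear assignment or network simplex solver). The debiased variant subtracts the self-comparison terms in the spirit of Sinkhorn divergences; it is an illustrative choice, and the paper's exact definition may differ. All function names are hypothetical.

```python
import numpy as np
from itertools import permutations

def exact_ot_uniform(xb, yb):
    """Exact W2^2 between two uniform point sets of equal size m, brute force over m! matchings."""
    m = len(xb)
    C = ((xb[:, None, :] - yb[None, :, :]) ** 2).sum(-1)
    perms = np.array(list(permutations(range(m))))       # every candidate matching
    costs = C[np.arange(m)[None, :], perms].sum(axis=1)   # total cost of each matching
    return costs.min() / m

def minibatch_ot(X, Y, m=5, k=100, seed=0):
    """Average exact OT cost over k pairs of random minibatches of size m."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(k):
        xb = X[rng.choice(len(X), m, replace=False)]
        yb = Y[rng.choice(len(Y), m, replace=False)]
        vals.append(exact_ot_uniform(xb, yb))
    return float(np.mean(vals))

def debiased_minibatch_ot(X, Y, **kw):
    """Subtract the self-comparison terms, which are generally nonzero for minibatch OT."""
    return minibatch_ot(X, Y, **kw) - 0.5 * (minibatch_ot(X, X, **kw) + minibatch_ot(Y, Y, **kw))

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
print(minibatch_ot(X, X))                # positive: independent minibatches of X rarely coincide
print(exact_ot_uniform(X[:5], X[:5]))    # 0.0: a batch compared with itself
```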
Contextual Semantic Interpretability
Marcos, Diego, Fong, Ruth, Lobry, Sylvain, Flamary, Rémi, Courty, Nicolas, Tuia, Devis
Convolutional neural networks (CNN) are known to learn an image representation that captures concepts relevant to the task, but they do so implicitly, which hampers model interpretability. However, one could argue that such a representation is hidden in the neurons and can be made explicit by teaching the model to recognize semantically interpretable attributes that are present in the scene. We call such an intermediate layer a semantic bottleneck. Once the attributes are learned, they can be recombined to reach the final decision and provide both an accurate prediction and an explicit reasoning behind the CNN decision. In this paper, we look into semantic bottlenecks that capture context: we want attributes to form groups of a few meaningful elements and to participate jointly in the final decision. We use a two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups, allowing them to contribute differently to the final output depending on the context. We test our contextual semantic interpretable bottleneck (CSIB) on the task of landscape scenicness estimation and train the semantic interpretable bottleneck using an auxiliary database (SUN Attributes). Our model yields predictions as accurate as a non-interpretable baseline when applied to a real-world test set of Flickr images, while providing clear and interpretable explanations for each prediction.
Representation Transfer by Optimal Transport
Li, Xuhong, Grandvalet, Yves, Flamary, Rémi, Courty, Nicolas, Dou, Dejing
Deep learning currently provides the best representations of complex objects for a wide variety of tasks. However, learning these representations is an expensive process that requires very large training samples and significant computing resources. Thankfully, sharing these representations is a common practice, making it possible to solve new tasks with relatively little training data and few computing resources; the transfer of representations is nowadays an essential ingredient in numerous real-world applications of deep learning. Transferring representations commonly relies on the parameterized form of the features making up the representation, as encoded by the computational graph of these features. In this paper, we propose to use a novel non-parametric metric between representations. It is based on a functional view of features and takes into account certain invariances of representations, such as the permutation of their features, by relying on optimal transport. This distance is used as a regularization term promoting similarity between two representations. We show the relevance of this approach in two representation transfer settings, where the representation of a trained reference model is transferred to another one, either for solving a new related task (inductive transfer learning) or for distilling knowledge into a simpler model (model compression).
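The permutation invariance can be illustrated with a toy version of such a metric: each feature is represented by its activations on a shared set of inputs (the functional view), and two representations are compared by optimally matching their features. This NumPy sketch assumes squared-distance costs, uniform weights over features, equal feature counts, and an exact matching found by brute force (tiny feature counts only). It is a hypothetical illustration, not the paper's metric or regularizer.

```python
import numpy as np
from itertools import permutations

def feature_ot_distance(A, B):
    """OT between two representations viewed as sets of features.

    A, B: activations of shape (n_inputs, n_features) computed on the same inputs;
    each column is one feature seen as a function on those inputs.
    """
    d = A.shape[1]
    C = ((A[:, :, None] - B[:, None, :]) ** 2).mean(axis=0)   # C[k, l]: feature k of A vs feature l of B
    perms = np.array(list(permutations(range(d))))             # brute-force exact matching
    return C[np.arange(d)[None, :], perms].sum(axis=1).min() / d

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 4))
B = A[:, [2, 0, 3, 1]]                # the same features in a different order
print(feature_ot_distance(A, B))      # 0.0: invariant to the order of features
print(np.mean((A - B) ** 2))          # an index-wise comparison is not
```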