Moreau-Yosida $f$-divergences

Terjék, Dávid

arXiv.org Machine Learning 

… central to many machine learning algorithms, with Lipschitz-constrained variants recently gaining attention. Inspired by this, we generalize the so-called tight variational representation of $f$-divergences in the case of probability measures on compact metric spaces to be taken over the space of Lipschitz functions vanishing at an arbitrary base point, characterize functions achieving the supremum in the variational representation, propose a practical algorithm to calculate the tight convex conjugate of $f$-divergences compatible with automatic differentiation frameworks, define the Moreau-Yosida approximation of $f$-divergences with respect to the Wasserstein-1 metric, and derive the corresponding variational formulas, providing a generalization of a number of recent results, novel special cases of interest …

Another is the family of optimal transport distances (Villani, 2008), including the Wasserstein-1 metric. In general, variational representations are suprema of integral formulas taken over sets of functions, such as the Donsker-Varadhan formula (Donsker & Varadhan, 1976) for the Kullback-Leibler divergence or the Kantorovich-Rubinstein formula (Villani, 2008) for the Wasserstein-1 metric. Informally speaking, one can implement such a formula (Nowozin et al., 2016; Arjovsky et al., 2017) by constructing a real-valued neural network that takes samples from the two probability measures as inputs and is trained to maximize the integral formula in order to approximate the supremum, yielding a learned proxy for the actual divergence between the two probability measures. Implementing the Kantorovich-Rubinstein formula in this way involves restricting the Lipschitz constant of the neural network (Gulrajani et al., 2017; Petzka et al., 2018; Miyato et al., 2018), which effectively stabilizes the approximation procedure.
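To illustrate the "train a critic to maximize the integral formula" idea, here is a minimal sketch. It is not the method of this paper, which concerns the tight representation and its Moreau-Yosida approximation. It uses the Donsker-Varadhan formula $\mathrm{KL}(P\|Q) = \sup_T \mathbb{E}_P[T] - \log \mathbb{E}_Q[e^{T}]$ and the Kantorovich-Rubinstein formula $W_1(P,Q) = \sup_{\mathrm{Lip}(f)\le 1} \mathbb{E}_P[f] - \mathbb{E}_Q[f]$. Several simplifying assumptions are made. The measures are $P = \mathcal{N}(\mu, 1)$ and $Q = \mathcal{N}(0, 1)$, for which $\mathrm{KL}(P\|Q) = \mu^2/2$ and $W_1(P,Q) = |\mu|$. The neural network is replaced by a one-parameter linear critic $T(x) = a x$, and it is trained by plain gradient ascent. All function names are hypothetical.

```python
import numpy as np


def dv_objective(a, xp, xq):
    """Empirical Donsker-Varadhan objective for the linear critic T(x) = a * x."""
    # E_P[T] estimated by the sample mean over draws from P.
    mean_p = a * xp.mean()
    # log E_Q[exp(T)] computed with a max-shift for numerical stability.
    t_q = a * xq
    m = t_q.max()
    log_mean_q = m + np.log(np.mean(np.exp(t_q - m)))
    return mean_p - log_mean_q


def dv_gradient(a, xp, xq):
    """Derivative of dv_objective with respect to the critic parameter a."""
    t_q = a * xq
    w = np.exp(t_q - t_q.max())
    w /= w.sum()  # softmax weights over the Q samples
    return xp.mean() - np.dot(w, xq)


def estimate_kl(mu, n=200_000, steps=200, lr=0.1, seed=0):
    """Approximate KL(N(mu,1) || N(0,1)) by maximizing the DV objective over a."""
    rng = np.random.default_rng(seed)
    xp = rng.normal(mu, 1.0, size=n)   # samples from P = N(mu, 1)
    xq = rng.normal(0.0, 1.0, size=n)  # samples from Q = N(0, 1)
    a = 0.0
    for _ in range(steps):
        a += lr * dv_gradient(a, xp, xq)  # gradient ascent on the objective
    return dv_objective(a, xp, xq), a


def kr_lower_bound(f, xp, xq):
    """Kantorovich-Rubinstein objective E_P[f] - E_Q[f] for a given critic f.

    For any 1-Lipschitz f this is (up to sampling error) a lower bound on W_1.
    """
    return np.mean(f(xp)) - np.mean(f(xq))


if __name__ == "__main__":
    est, a = estimate_kl(1.0)
    print(f"DV estimate {est:.3f} (closed form 0.5), learned slope a = {a:.3f}")
    rng = np.random.default_rng(1)
    xp, xq = rng.normal(1.0, 1.0, 200_000), rng.normal(0.0, 1.0, 200_000)
    # The identity is 1-Lipschitz and attains W_1 for a pure location shift.
    print(f"KR bound {kr_lower_bound(lambda x: x, xp, xq):.3f} (closed form 1.0)")
```

For Gaussians with equal variance, the log-density ratio is linear in $x$, so the linear family already contains a maximizer, and the learned slope should approach $\mu$. With a richer critic class, such as a neural network, the same loop applies. For the Kantorovich-Rubinstein objective, that class must then be kept (approximately) 1-Lipschitz, which is where the constraints cited above come in.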
