Goto

Collaborating Authors

 invertible function



Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax

Neural Information Processing Systems

The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. We propose a modular and more flexible family of reparameterizable distributions where Gaussian noise is transformed into a one-hot approximation through an invertible function. This invertible function is composed of a modified softmax and can incorporate diverse transformations that serve different specific purposes. For example, the stick-breaking procedure allows us to extend the reparameterization trick to distributions with countably infinite support, thus enabling the use of our distribution along nonparametric models, or normalizing flows let us increase the flexibility of the distribution. Our construction enjoys theoretical advantages over the Gumbel-Softmax, such as closed form KL, and significantly outperforms it in a variety of experiments.


Reviews: Adaptive Density Estimation for Generative Models

Neural Information Processing Systems

Summary: The authors propose a hybrid method that combines VAEs with adversarial training and flow based models. In particular, they derive an explicit density function p(x) where the likelihood can be evaluated, the corresponding components p(x z) are more flexible than the standard VAE that utilizes diagonal Gaussians, and the generated samples have better quality than a standard VAE. The basic idea of the proposed model is that the VAE is defined between a latent space and an intermediate representation space, and then, the representation space is connected with the data space through an invertible non-linear flow. In general, I think the paper is quite well written, but on the same time I believe that there is a lot of compressed information, and the consequence is that in some parts it is not even clear what the authors want to say (see Clarity comments). The proposed idea of the paper seems quite interesting, but on the same time I have some doubts (see Quality comments).


Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax

Neural Information Processing Systems

The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. We propose a modular and more flexible family of reparameterizable distributions where Gaussian noise is transformed into a one-hot approximation through an invertible function. This invertible function is composed of a modified softmax and can incorporate diverse transformations that serve different specific purposes. For example, the stick-breaking procedure allows us to extend the reparameterization trick to distributions with countably infinite support, thus enabling the use of our distribution along nonparametric models, or normalizing flows let us increase the flexibility of the distribution.


Knowledge Graph Embedding by Normalizing Flows

arXiv.org Artificial Intelligence

A key to knowledge graph embedding (KGE) is to choose a proper representation space, e.g., point-wise Euclidean space and complex vector space. In this paper, we propose a unified perspective of embedding and introduce uncertainty into KGE from the view of group theory. Our model can incorporate existing models (i.e., generality), ensure the computation is tractable (i.e., efficiency) and enjoy the expressive power of complex random variables (i.e., expressiveness). The core idea is that we embed entities/relations as elements of a symmetric group, i.e., permutations of a set. Permutations of different sets can reflect different properties of embedding. And the group operation of symmetric groups is easy to compute. In specific, we show that the embedding of many existing models, point vectors, can be seen as elements of a symmetric group. To reflect uncertainty, we first embed entities/relations as permutations of a set of random variables. A permutation can transform a simple random variable into a complex random variable for greater expressiveness, called a normalizing flow. We then define scoring functions by measuring the similarity of two normalizing flows, namely NFE. We construct several instantiating models and prove that they are able to learn logical rules. Experimental results demonstrate the effectiveness of introducing uncertainty and our model. The code is available at https://github.com/changyi7231/NFE.


Minimax Analysis for Inverse Risk in Nonparametric Planer Invertible Regression

arXiv.org Machine Learning

We study a minimax risk of estimating inverse functions on a plane, while keeping an estimator is also invertible. Learning invertibility from data and exploiting an invertible estimator are used in many domains, such as statistics, econometrics, and machine learning. Although the consistency and universality of invertible estimators have been well investigated, analysis on the efficiency of these methods is still under development. In this study, we study a minimax risk for estimating invertible bi-Lipschitz functions on a square in a $2$-dimensional plane. We first introduce an inverse $L^2$-risk to evaluate an estimator which preserves invertibility. Then, we derive lower and upper rates for a minimax inverse risk by exploiting a representation of invertible functions using level-sets. To obtain an upper bound, we develop an estimator asymptotically almost everywhere invertible, whose risk attains the derived minimax lower rate up to logarithmic factors. The derived minimax rate corresponds to that of the non-invertible bi-Lipschitz function, which rejects the expectation of whether invertibility improves the minimax rate, similar to other shape constraints.


A Formal Approach to Explainability

arXiv.org Machine Learning

We regard explanations as a blending of the input sample and the model's output and offer a few definitions that capture various desired properties of the function that generates t hese explanations. We study the links between these properties a nd between explanation-generating functions and intermedia te representations of learned models and are able to show, for example, that if the activations of a given layer are consist ent with an explanation, then so do all other subsequent layers. In addition, we study the intersection and union of explanatio ns as a way to construct new explanations.