Reviews: Discrete Flows: Invertible Generative Models of Discrete Data

Neural Information Processing Systems

Originality: This paper is the first demonstration of flow-based models on discrete data. As such, the work is fairly novel. The flow-based modeling community has been wondering for some time how to model discrete data, and this paper provides an answer to this question. That being said, the main technical contribution amounts to using a modulo operator (Eq. …). I view this simplicity as a benefit of the approach, but some may view it as a simple extension of existing techniques.
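To make the "modulo operator" point concrete, here is a minimal sketch of the kind of invertible discrete transformation the review refers to: an affine map over the integers mod K, which is a bijection on {0, …, K−1} whenever the scale is coprime with K. The vocabulary size `K` and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

K = 10  # vocabulary size (assumed for illustration)

def forward(x, mu, sigma):
    # Affine map mod K; invertible when gcd(sigma, K) == 1.
    return (mu + sigma * x) % K

def inverse(y, mu, sigma):
    # Multiplicative inverse of sigma modulo K (Python 3.8+ three-arg pow).
    sigma_inv = pow(int(sigma), -1, K)
    return ((y - mu) * sigma_inv) % K

x = np.arange(K)
y = forward(x, mu=3, sigma=7)
assert np.array_equal(inverse(y, mu=3, sigma=7), x)  # round-trips exactly
```

Because the map is a permutation of the vocabulary, the change-of-variables formula for discrete flows needs no Jacobian term, which is the source of the simplicity the review mentions.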


Reviews: The Thermodynamic Variational Objective

Neural Information Processing Systems

The paper connects variational inference with thermodynamic integration, so that the data log-likelihood can be formulated as a 1D integral of the instantaneous ELBO over the unit interval. By applying a left Riemann sum, the TVO, a novel lower bound on the marginal log-likelihood, is derived; the traditional variational ELBO is recovered when only one partition is used. The authors then design an importance-sampling-based gradient estimator to optimize the objective, and compare with other methods on both discrete and continuous deep generative models. Originality and significance: the formulation of the TVO is an interesting idea. Optimization methods better than the importance-sampling-based approach are worth exploring further.
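The left-Riemann-sum structure described above can be sketched on a toy integrand. Since the instantaneous ELBO is non-decreasing in β, the left Riemann sum lower-bounds the integral, and one partition evaluates the integrand at β = 0 (the standard ELBO). The function `f` below is a stand-in for the instantaneous ELBO curve, not anything from the paper.

```python
import numpy as np

def left_riemann(f, num_partitions):
    # Left Riemann sum of f over [0, 1] with equally spaced partitions.
    betas = np.linspace(0.0, 1.0, num_partitions, endpoint=False)
    return float(np.mean(f(betas)))  # (1/K) * sum_k f(beta_k)

f = lambda b: b ** 2          # toy non-decreasing integrand
true_integral = 1.0 / 3.0     # plays the role of log p(x)

bounds = [left_riemann(f, k) for k in (1, 2, 10, 100)]
# K = 1 evaluates f(0), the analogue of the standard ELBO;
# more partitions tighten the lower bound toward the integral.
```

Each bound stays below the true integral and tightens as the partition refines, mirroring how the TVO interpolates between the ELBO and the marginal log-likelihood.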


Reviews: Dual Path Networks

Neural Information Processing Systems

The authors propose a new network architecture that combines ResNets and DenseNets. They introduce a very informative theoretical formulation that can express ResNets, DenseNets, and their proposed architecture.
Pros:
(+) The paper is well written, with both theoretical and empirical results.
(+) The authors provide useful analysis and statistics.
(+) The impact of DPNs is shown on a variety of computer vision tasks.
(+) The performance of DPNs on the presented vision tasks is compelling.
Cons:
(-) Optional results on MS COCO would make the paper even stronger.
Network engineering is an important field, and it is important that it is done correctly, with analysis and many in-depth experiments. The impact of new architectures comes through their generalization capabilities. This paper does a good job on all of the above.
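As a rough illustration of the ResNet/DenseNet combination, the sketch below keeps two states per block: a residual path updated by addition and a dense path grown by concatenation. The split of the block output into an update and an increment, and all shapes, are illustrative assumptions rather than the paper's exact layer design.

```python
import numpy as np

def dual_path_block(res_state, dense_state, f):
    # f maps the concatenated state to new features; its output is split
    # into a residual update and a dense increment.
    out = f(np.concatenate([res_state, dense_state], axis=-1))
    width = res_state.shape[-1]
    res_update, dense_increment = out[..., :width], out[..., width:]
    new_res = res_state + res_update                                      # ResNet-style addition
    new_dense = np.concatenate([dense_state, dense_increment], axis=-1)   # DenseNet-style concat
    return new_res, new_dense

rng = np.random.default_rng(0)
h_res = rng.standard_normal((2, 8))    # residual path: fixed width
h_dense = rng.standard_normal((2, 4))  # dense path: grows each block
f = lambda h: np.tanh(h @ rng.standard_normal((h.shape[-1], 12)))
h_res, h_dense = dual_path_block(h_res, h_dense, f)
```

After one block the residual path keeps its width while the dense path widens, which is the feature-reuse/feature-exploration trade-off the dual-path idea exploits.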


Reviews: Online Learning for Multivariate Hawkes Processes

Neural Information Processing Systems

This paper describes an algorithm for optimization of Hawkes process parameters in online settings, where a non-parametric form of the kernel is learned. The paper presents a gradient approach to optimization, with theoretical analysis thereof. In particular, the authors provide: a regret bound; justification for the simplification steps (discretization of time, and truncation of the time over which previous posts influence a new post); an approach to a tractable projection of the solution (a step in the algorithm); and a time-complexity analysis. The paper is very well written, which is very helpful given how mathematically involved it is. I found it to tackle an important problem (online learning is important for large-scale datasets, and non-parametricity is a very reasonable setting when it is hard to specify a reasonable kernel form a priori).
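The two simplifications the review mentions, time discretization and truncation of past influence, can be sketched in a multivariate Hawkes intensity where each kernel φ_ij is stored as per-bin values. The shapes, bin width, and window length below are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def intensity(t, events, mu, kernel_vals, dt, window):
    # events: list of (time, dim) pairs; mu: base rates, shape (D,).
    # kernel_vals[i, j, b]: discretized kernel phi_ij on bins of width dt,
    # truncated so events older than `window` have no influence.
    lam = mu.copy()
    for s, j in events:
        lag = t - s
        if 0 < lag <= window:
            b = min(int(lag / dt), kernel_vals.shape[2] - 1)
            lam += kernel_vals[:, j, b]
    return lam

mu = np.array([0.1, 0.2])
kernel_vals = np.full((2, 2, 2), 0.05)  # toy piecewise-constant kernels
events = [(0.0, 0), (0.3, 1)]
lam = intensity(0.6, events, mu, kernel_vals, dt=0.5, window=1.0)
```

Discretization makes the kernel a finite parameter vector (so it can be learned non-parametrically), and truncation bounds the per-event work, which is what makes the online updates tractable.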


Reviews: On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

Neural Information Processing Systems

SUMMARY: The authors investigate the task of training a Generative Adversarial Network (GAN) based on an optimal transport (OT) loss. They focus on regularized OT losses, and show that approximate gradients of these losses can be obtained by approximately solving the regularized OT problem (Thm 4.1). As a consequence, a non-convex stochastic gradient method for minimizing this loss has a provable convergence rate to stationarity (Thm 4.2). The analysis also applies to Sinkhorn losses. The authors then numerically explore the behavior of a practical algorithm in which the dual variables are parametrized by neural networks (the theory does not immediately apply, because estimating the loss gradient becomes a non-convex problem).
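The inner step of "approximately solving the regularized OT problem" is commonly done with Sinkhorn iterations when the regularizer is entropic; the sketch below is a generic such solver for two discrete histograms, not the paper's specific algorithm (which parametrizes the dual variables with networks).

```python
import numpy as np

def sinkhorn(C, a, b, eps, iters=200):
    # Entropy-regularized OT between histograms a and b with cost matrix C:
    # alternately scale the Gibbs kernel so the plan's marginals match a and b.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # approximate transport plan

C = np.array([[0.0, 1.0],
              [1.0, 0.0]])              # toy cost matrix
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
P = sinkhorn(C, a, b, eps=0.1)
```

The marginals of `P` match `a` and `b` up to the approximation error, and the gradient of the regularized loss with respect to the inputs can then be read off from the (approximate) dual variables, which is the mechanism behind Thm 4.1.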