 Bayesian Inference



Functional Variational Inference based on Stochastic Process Generators

Neural Information Processing Systems

Bayesian inference in the space of functions has been an important topic for Bayesian modeling in the past. In this paper, we propose a new solution to this problem called Functional Variational Inference (FVI). In FVI, we minimize a divergence in function space between the variational distribution and the posterior process.


Appendices A Further Related Works

Neural Information Processing Systems

ListNet, for instance, considers the predicted scores as parameters of the Plackett-Luce distribution [39, 40] and learns these scores via maximum likelihood estimation. Used in a PiRank surrogate loss of Section 3.1, the relaxation presented in Section 3.2 recovers the This finishes the proof by induction. Taking j = d, we obtain from Eq. 22 and the nature of permutation matrices that lim C14, we use "Set 1" which is the larger of the two provided For both datasets, we use the standard train/validation/test splits. The experiments were run on a server with 4 8-core Intel Xeon E5-2620v4 CPUs, 128 GB of RAM and 4 NVIDIA Tesla K80 GPUs. TensorFlow Ranking is licensed under the Apache License 2.0. MSLR-WEB30K is licensed under the Microsoft Research License Agreement (MSR-LA).





Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics

Neural Information Processing Systems

We introduce a new theoretical framework for analyzing deep learning optimization in connection with its generalization error. Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence. This potentially makes it difficult to deal directly with finite-width networks; in the neural tangent kernel regime in particular, we cannot reveal favorable properties of neural networks beyond kernel methods. To realize a more natural analysis, we take a completely different approach in which we formulate parameter training as transportation map estimation and show its global convergence via the theory of infinite-dimensional Langevin dynamics. This enables us to analyze narrow and wide networks in a unified manner. Moreover, we give generalization gap and excess risk bounds for the solution obtained by the dynamics. The excess risk bound achieves the so-called fast learning rate. In particular, we show exponential convergence for a classification problem and a minimax optimal rate for a regression problem.