Nyquist, Pierre
A weak convergence approach to large deviations for stochastic approximations
Hult, Henrik, Lindhe, Adam, Nyquist, Pierre, Wu, Guo-Jhen
Stochastic approximation (SA) algorithms, first introduced by Robbins and Monro in the 1950s [24], have become one of the most important classes of stochastic numerical methods. Originally aimed at finding the root of a continuous function given noisy observations, SA is now a fundamental tool in a range of areas such as statistics, optimization, electrical engineering, and machine learning, to mention but a few. Within the latter, the importance of SA algorithms is illustrated by the fact that a specific subclass of methods--stochastic gradient descent (SGD) methods--is central to the training of deep learning models, and in reinforcement learning the standard methods (Q-learning and temporal-difference learning) are variants of SA. The general class of SA algorithms with state-dependent noise (see below for the definition) therefore constitutes a rich and important family of stochastic recursive algorithms. In addition to the examples already mentioned (SGD, reinforcement learning), this class also includes persistent contrastive divergence, adaptive Markov chain Monte Carlo (MCMC) and extended ensemble algorithms such as the Wang-Landau algorithm. The theory of SA stems from the pioneering work of Robbins and Monro [24] and Kiefer and Wolfowitz [20], and remains an active research area within probability theory. This is in part due to the many and diverse applications of SA algorithms where, due to the complex nature of the systems under consideration, different variants of the original Robbins-Monro algorithm are needed. In turn, developing the theoretical foundation for SA algorithms, such as convergence results, central limit theorems, concentration results and results on deviations, is of fundamental importance; monographs covering many of the standard results of
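As a concrete illustration of the basic recursion (a minimal sketch, not taken from the paper; the step-size schedule and the test function are illustrative choices), the classical Robbins-Monro scheme for root finding from noisy observations can be written in a few lines of Python:

    import numpy as np

    def robbins_monro(noisy_f, x0, n_steps=10_000, a=1.0):
        # Seeks x* with f(x*) = 0 from noisy evaluations Y_n of f(x_n),
        # using step sizes eps_n = a / (n + 1), which satisfy the classical
        # conditions: sum of eps_n diverges, sum of eps_n**2 is finite.
        x = x0
        for n in range(n_steps):
            eps = a / (n + 1)   # decreasing step size
            y = noisy_f(x)      # noisy observation of f at the current iterate
            x = x - eps * y     # Robbins-Monro update
        return x

    # Toy example: root of f(x) = x - 2 observed with Gaussian noise.
    rng = np.random.default_rng(0)
    print(robbins_monro(lambda x: (x - 2.0) + rng.normal(scale=0.5), x0=0.0))

State-dependent noise, the setting of the paper, would correspond to the distribution of the observation noise depending on the current iterate x.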
REMEDI: Corrective Transformations for Improved Neural Entropy Estimation
Nilsson, Viktor, Samaddar, Anirban, Madireddy, Sandeep, Nyquist, Pierre
Information theoretic quantities play a central role in machine learning. The recent surge in the complexity of data and models has increased the demand for accurate estimation of these quantities. However, as the dimension grows, estimation presents significant challenges, with existing methods struggling already in relatively low dimensions. To address this issue, we introduce $\texttt{REMEDI}$ for efficient and accurate estimation of differential entropy, a fundamental information theoretic quantity. The approach combines the minimization of the cross-entropy for simple, adaptive base models with the estimation of their deviation, in terms of relative entropy, from the data density. Our approach demonstrates improvement across a broad spectrum of estimation tasks, encompassing entropy estimation on both synthetic and natural data. Further, we extend important theoretical consistency results to the more general setting required by our approach. We illustrate how the framework can be naturally extended to information theoretic supervised learning models, with a specific focus on the Information Bottleneck approach, and demonstrate that the method delivers better accuracy than existing Information Bottleneck methods. In addition, we explore a natural connection between $\texttt{REMEDI}$ and generative modeling via rejection sampling and Langevin dynamics.
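The decomposition underlying this approach can be read off from an elementary identity (our paraphrase of the abstract; here $p$ denotes the data density, $q$ the adaptive base model, and $h$ the differential entropy):

$$ h(p) = -\int p \log p = \underbrace{-\int p \log q}_{\text{cross-entropy } H(p,q)} - \underbrace{\int p \log \frac{p}{q}}_{D_{\mathrm{KL}}(p\,\|\,q)}, $$

so minimizing the cross-entropy over base models $q$ and estimating the remaining relative-entropy correction together yield an estimate of $h(p)$.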
A note on large deviations for interacting particle dynamics for finding mixed equilibria in zero-sum games
Nilsson, Viktor, Nyquist, Pierre
Finding equilibrium points in continuous minimax games has become a key problem within machine learning, in part due to its connection to the training of generative adversarial networks. Because of existence and robustness issues, recent developments have shifted focus from pure to mixed equilibria. In this note we consider a method proposed by Domingo-Enrich et al. for finding mixed equilibria in two-layer zero-sum games. The method is based on entropic regularisation, and the two competing strategies are represented by two sets of interacting particles. We show that the sequence of empirical measures of the particle system satisfies a large deviation principle as the number of particles grows to infinity, and show how this implies convergence of the empirical measure and of the associated Nikaid\^o-Isoda error, complementing existing law-of-large-numbers results.
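To fix ideas, here is a minimal sketch (an assumed form, not the authors' implementation; the bilinear-plus-quadratic payoff, step size and temperature are illustrative) of entropically regularised two-population particle dynamics of the kind analysed in the note:

    import numpy as np

    def particle_minimax(n=100, steps=2000, dt=1e-2, tau=1e-2, seed=0):
        # Two sets of interacting particles represent the mixed strategies
        # of the min and max players for the toy payoff
        # K(x, y) = x*y + x**2/2 - y**2/2 (convex in x, concave in y);
        # the entropic regularisation enters as Langevin noise of strength tau.
        rng = np.random.default_rng(seed)
        xs = rng.normal(size=n)   # particles of the minimizing player
        ys = rng.normal(size=n)   # particles of the maximizing player
        for _ in range(steps):
            gx = xs + ys.mean()   # dK/dx averaged over the opponent's particles
            gy = xs.mean() - ys   # dK/dy averaged over the opponent's particles
            xs = xs - dt * gx + np.sqrt(2 * tau * dt) * rng.normal(size=n)
            ys = ys + dt * gy + np.sqrt(2 * tau * dt) * rng.normal(size=n)
        return xs, ys

    xs, ys = particle_minimax()
    print(xs.mean(), ys.mean())   # both drift toward 0, the unique saddle point

The large deviation principle in the note concerns the empirical measures of such particle systems as the number of particles n grows to infinity.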