Belief Flows of Robust Online Learning
Ortega, Pedro A., Crammer, Koby, Lee, Daniel D.
This paper introduces a new probabilistic model for online learning which dynamically incorporates information from stochastic gradients of an arbitrary loss function. Similar to probabilistic filtering, the model maintains a Gaussian belief over the optimal weight parameters. Unlike traditional Bayesian updates, the model incorporates a small number of gradient evaluations at locations chosen using Thompson sampling, making it computationally tractable. The belief is then transformed via a linear flow field which optimally updates the belief distribution using rules derived from information-theoretic principles. Several versions of the algorithm are obtained using different constraints on the flow field and compared with conventional online learning algorithms. Results are given for several classification tasks including logistic regression and multilayer neural networks.
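A minimal sketch of this style of update for logistic regression, assuming NumPy; the Thompson-sampled gradient step and covariance shrinkage below are illustrative placeholders, not the information-theoretically optimal flow derived in the paper:

```python
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss at weights w, for labels y in {-1, +1}."""
    margins = y * (X @ w)
    return -(X.T @ (y * (1.0 / (1.0 + np.exp(margins))))) / len(y)

def belief_flow_step(mu, Sigma, X, y, lr=0.1, shrink=0.99):
    # Thompson sampling: draw a plausible weight vector from the current Gaussian belief.
    w = np.random.multivariate_normal(mu, Sigma)
    g = logistic_grad(w, X, y)
    # Illustrative linear flow: translate the mean along the gradient and
    # contract the covariance; the paper derives the optimal flow instead.
    mu_new = mu - lr * g
    Sigma_new = shrink * Sigma
    return mu_new, Sigma_new
```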
An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Ortega, Pedro A. (University of Pennsylvania) | Lee, Daniel D. (University of Pennsylvania)
Recently, there has been a growing interest in modeling planning with information constraints. Accordingly, an agent maximizes a regularized expected utility known as the free energy, where the regularizer is given by the information divergence from a prior to a posterior policy. While this approach can be justified in various ways, including from statistical mechanics and information theory, it is still unclear how it relates to decision-making against adversarial environments. This connection has previously been suggested in work relating the free energy to risk-sensitive control and to extensive-form games. Here, we show that a single-agent free energy optimization is equivalent to a game between the agent and an imaginary adversary. The adversary can, by paying an exponential penalty, generate costs that diminish the decision maker's payoffs. It turns out that the optimal strategy of the adversary consists in choosing costs so as to render the decision maker indifferent among its choices, which is a defining property of a Nash equilibrium, thus tightening the connection between free energy optimization and game theory.
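For reference, the free-energy objective discussed here can be written as follows (standard notation, assumed here, for prior p, posterior policy q, utility U, and inverse temperature β; the closed-form maximum is the usual Gibbs variational identity):

```latex
% Free-energy objective and its maximum value (Gibbs variational identity).
\[
  F[q] \;=\; \mathbb{E}_{q}\!\left[U(a)\right]
  \;-\; \frac{1}{\beta}\, D_{\mathrm{KL}}\!\left(q \,\|\, p\right),
  \qquad
  \max_{q} F[q] \;=\; \frac{1}{\beta}\,
  \log \mathbb{E}_{p}\!\left[e^{\beta U(a)}\right].
\]
```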
Optimal Neural Population Codes for High-dimensional Stimulus Variables
Wang, Zhuo, Stocker, Alan A., Lee, Daniel D.
How does a neural population process sensory information? Optimal coding theories assume that neural tuning curves are adapted to the prior distribution of the stimulus variable. Most previous work has discussed optimal solutions only for one-dimensional stimulus variables. Here, we expand some of these ideas and present new solutions that define optimal tuning curves for high-dimensional stimulus variables. We consider solutions for a minimal case where the number of neurons in the population equals the number of stimulus dimensions (a diffeomorphic code). In the case of two-dimensional stimulus variables, we analytically derive optimal solutions under different optimality criteria, such as minimal L2 reconstruction error or maximal mutual information. For the higher-dimensional case, we provide a learning rule that improves the population code.
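As a rough illustration of the L2 criterion, the following Monte Carlo check (an assumed NumPy helper, not the paper's analytical solutions) estimates the reconstruction error of a diffeomorphic code in which the number of neurons equals the stimulus dimension:

```python
import numpy as np

def mse_of_code(tune, decode, prior_sample, noise_std=0.1, n=10000, seed=None):
    """Monte Carlo estimate of the L2 reconstruction error of a population code.

    tune:   maps an (n x d) array of stimuli to (n x d) mean responses (one neuron per dim)
    decode: maps noisy responses back to stimulus estimates
    """
    rng = np.random.default_rng(seed)
    s = prior_sample(n)                                       # stimuli drawn from the prior
    r = tune(s) + noise_std * rng.standard_normal(s.shape)    # noisy population responses
    return np.mean(np.sum((decode(r) - s) ** 2, axis=1))

# Example: a 2D Gaussian stimulus prior with an identity (diffeomorphic) code.
prior = lambda n: np.random.default_rng(0).standard_normal((n, 2))
print(mse_of_code(lambda s: s, lambda r: r, prior))
```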
Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
Noh, Yung-kyun, Park, Frank, Lee, Daniel D.
We show that conventional k-nearest neighbor classification can be viewed as a special case of the diffusion decision model in the asymptotic limit. By applying the optimal strategy associated with the diffusion decision model, we develop an adaptive rule for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.
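An illustrative sketch of one SPRT-style stopping rule for adaptively choosing k, assuming binary labels and a simple Bernoulli model of neighbor labels; the paper's five criteria are more refined than this placeholder:

```python
import numpy as np

def sprt_knn_predict(sorted_labels, p1=0.7, p0=0.3, alpha=0.05, beta=0.05, k_max=50):
    """Adaptive k-NN via a sequential probability ratio test (illustrative).

    sorted_labels: neighbor labels in {0, 1}, ordered by increasing distance.
    H1: neighbors are class-1 with prob p1;  H0: with prob p0.
    Returns (predicted class, number of neighbors used).
    """
    upper = np.log((1 - beta) / alpha)   # evidence threshold to accept H1 (predict class 1)
    lower = np.log(beta / (1 - alpha))   # evidence threshold to accept H0 (predict class 0)
    llr = 0.0
    for k, y in enumerate(sorted_labels[:k_max], start=1):
        llr += np.log(p1 / p0) if y == 1 else np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return 1, k
        if llr <= lower:
            return 0, k
    # Fall back to a majority vote if no threshold was crossed.
    votes = sorted_labels[:k_max]
    return int(np.mean(votes) >= 0.5), len(votes)
```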
Learning Discriminative Metrics via Generative Models and Kernel Learning
Shi, Yuan, Noh, Yung-Kyun, Sha, Fei, Lee, Daniel D.
Metrics specifying distances between data points can be learned in a discriminative manner or from generative models. In this paper, we show how to unify generative and discriminative learning of metrics via a kernel learning framework. Specifically, we learn local metrics optimized from parametric generative models. These are then used as base kernels to construct a global kernel that minimizes a discriminative training criterion. We consider both linear and nonlinear combinations of local metric kernels. Our empirical results show that these combinations significantly improve performance on classification tasks. The proposed learning algorithm is also very efficient, achieving an order-of-magnitude speedup in training time compared to previous discriminative baseline methods.
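A minimal sketch of combining base kernels derived from local metrics, using kernel-target alignment as a stand-in for the discriminative training criterion (the alignment-based weighting below is an assumption, not the paper's method):

```python
import numpy as np

def alignment(K, y):
    """Kernel-target alignment between Gram matrix K and labels y in {-1, +1}."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def combine_kernels(base_kernels, y):
    """Weight each base kernel by its (clipped) alignment with the labels and
    return the normalized linear combination together with the weights."""
    w = np.array([max(alignment(K, y), 0.0) for K in base_kernels])
    w = w / w.sum() if w.sum() > 0 else np.ones(len(base_kernels)) / len(base_kernels)
    return sum(wi * K for wi, K in zip(w, base_kernels)), w
```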
Learning Dimensional Descent for Optimal Motion Planning in High-dimensional Spaces
Vernaza, Paul (University of Pennsylvania) | Lee, Daniel D. (University of Pennsylvania)
We present a novel learning-based method for generating optimal motion plans for high-dimensional motion planning problems. In order to cope with the curse of dimensionality, our method proceeds in a fashion similar to block coordinate descent in finite-dimensional optimization: at each iteration, the motion is optimized over a lower-dimensional subspace while leaving the path fixed along the other dimensions. Naive implementations of such an idea can produce vastly suboptimal results. In this work, we show how a profitable set of directions in which to perform this dimensional descent procedure can be learned efficiently. We provide sufficient conditions for global optimality of dimensional descent in this learned basis, based upon the low-dimensional structure of the planning cost function. We also show how this dimensional descent procedure can easily be applied, with monotonic convergence, to problems that do not exhibit such structure. We illustrate the application of our method to high-dimensional shape planning and arm trajectory planning problems.
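A schematic of the underlying block coordinate (dimensional) descent step over a waypoint path, sketched in NumPy with a finite-difference gradient; descent here is along the raw coordinate axes, whereas the paper learns a more profitable basis:

```python
import numpy as np

def dimensional_descent(path, cost, n_sweeps=20, lr=0.1, eps=1e-4):
    """Block-coordinate descent on a waypoint path (T x d array).

    At each step, only one coordinate (one column) of the interior waypoints is
    updated while all other coordinates stay fixed, using a finite-difference
    gradient of the supplied scalar cost(path)."""
    path = path.copy()
    T, d = path.shape
    for _ in range(n_sweeps):
        for dim in range(d):                 # descent direction (one axis at a time)
            for t in range(1, T - 1):        # endpoints stay fixed
                path[t, dim] -= lr * _fd_grad(cost, path, t, dim, eps)
    return path

def _fd_grad(cost, path, t, dim, eps):
    path[t, dim] += eps
    hi = cost(path)
    path[t, dim] -= 2 * eps
    lo = cost(path)
    path[t, dim] += eps
    return (hi - lo) / (2 * eps)

# Example cost: squared path length (straightens the path between fixed endpoints).
length = lambda p: np.sum((p[1:] - p[:-1]) ** 2)
```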
Generative Local Metric Learning for Nearest Neighbor Classification
Noh, Yung-kyun, Zhang, Byoung-tak, Lee, Daniel D.
We consider the problem of learning a local metric to enhance the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find an appropriate local metric that maximally reduces this bias based upon knowledge from generative models. As a byproduct, the asymptotic theoretical analysis in this work relates metric learning with dimensionality reduction, a connection that was not apparent from previous discriminative approaches. Empirical experiments show that this learned local metric enhances nearest neighbor classification performance on various datasets using simple class-conditional generative models.
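One simple way to inject generative information into a local metric, sketched below with Gaussian class-conditional scores; this construction is purely illustrative and is not the bias-minimizing metric derived in the paper:

```python
import numpy as np

def gaussian_score(x, mu, Sigma_inv):
    """Gradient of log N(x; mu, Sigma) with respect to x."""
    return -Sigma_inv @ (x - mu)

def local_metric(x, class_params, ridge=1.0):
    """Build a local metric at query x from two class-conditional Gaussians.

    class_params: [(mu1, Sigma1_inv), (mu2, Sigma2_inv)]. The direction along
    which the class densities change most differently is emphasized; `ridge`
    keeps the metric positive definite. (Illustrative construction only.)
    """
    (mu1, S1), (mu2, S2) = class_params
    v = gaussian_score(x, mu1, S1) - gaussian_score(x, mu2, S2)
    M = ridge * np.eye(len(x)) + np.outer(v, v)
    return M  # squared distances: (x - x')^T M (x - x')
```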
Learning via Gaussian Herding
Crammer, Koby, Lee, Daniel D.
We introduce a new family of online learning algorithms based upon constraining the velocity flow over a distribution of weight vectors. In particular, we show how to effectively herd a Gaussian weight vector distribution by trading off velocity constraints with a loss function. By uniformly bounding this loss function, we demonstrate how to solve the resulting optimization analytically. We compare the resulting algorithms on a variety of real world datasets, and demonstrate how these algorithms achieve state-of-the-art robust performance, especially with high label noise in the training data.
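A schematic online update of a Gaussian weight distribution in this spirit, assuming NumPy; the step below follows the familiar confidence-weighted pattern and is not the exact herding rule derived in the paper:

```python
import numpy as np

def herd_step(mu, Sigma, x, y, r=1.0):
    """One schematic online update of a Gaussian weight distribution (mu, Sigma)
    on example (x, y), y in {-1, +1}: move the mean along Sigma @ x when the
    margin is violated, and shrink uncertainty along x (illustrative only)."""
    margin = y * (mu @ x)
    v = x @ Sigma @ x                       # predictive variance along x
    if margin < 1.0:                        # hinge-style margin violation
        alpha = (1.0 - margin) / (v + r)    # step size grows with the violation
        mu = mu + alpha * y * (Sigma @ x)
        Sigma = Sigma - np.outer(Sigma @ x, Sigma @ x) / (v + r)
    return mu, Sigma
```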