Country
Optimization on a Budget: A Reinforcement Learning Approach
Ruvolo, Paul L., Fasel, Ian, Movellan, Javier R.
Many popular optimization algorithms, like the Levenberg-Marquardt algorithm (LMA), use heuristic-based controllers'' that modulate the behavior of the optimizer during the optimization process. For example, in the LMA a damping parameter is dynamically modified based on a set rules that were developed using various heuristic arguments. Reinforcement learning (RL) is a machine learning approach to learn optimal controllers by examples and thus is an obvious candidate to improve the heuristic-based controllers implicit in the most popular and heavily used optimization algorithms. Improving the performance of off-the-shelf optimizers is particularly important for time-constrained optimization problems. For example the LMA algorithm has become popular for many real-time computer vision problems, including object tracking from video, where only a small amount of time can be allocated to the optimizer on each incoming video frame. Here we show that a popular modern reinforcement learning technique using a very simply state space can dramatically improve the performance of general purpose optimizers, like the LMA. Most surprisingly the controllers learned for a particular domain appear to work very well also on very different optimization domains. For example we used RL methods to train a new controller for the damping parameter of the LMA. This controller was trained on a collection of classic, relatively small, non-linear regression problems. The modified LMA performed better than the standard LMA on these problems. Most surprisingly, it also dramatically outperformed the standard LMA on a difficult large scale computer vision problem for which it had not been trained before. Thus the controller appeared to have extracted control rules that were not just domain specific but generalized across a wide range of optimization domains."
The Mondrian Process
We describe a novel class of distributions, called Mondrian processes, which can be interpreted as probability distributions over kd-tree data structures. Mondrian processesare multidimensional generalizations of Poisson processes and this connection allows us to construct multidimensional generalizations of the stickbreaking processdescribed by Sethuraman (1994), recovering the Dirichlet process in one dimension. After introducing the Aldous-Hoover representation for jointly and separately exchangeable arrays, we show how the process can be used as a nonparametric prior distribution in Bayesian models of relational data.
Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms
Roberts, John W., Tedrake, Russ
Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the estimate of the gradient. In this paper, we formulate the variance reduction problem by describing a signal-to-noise ratio (SNR) for policy gradient algorithms, and evaluate this SNR carefully for the popular Weight Perturbation (WP) algorithm. We confirm that SNR is a good predictor of long-term learning performance, and that in our episodic formulation, the cost-to-go function is indeed the optimal baseline. We then propose two modifications to traditional model-free policy gradient algorithms in order to optimize the SNR. First, we examine WP using anisotropic sampling distributions, which introduces a bias into the update but increases the SNR; this bias can be interpretted as following the natural gradient of the cost function. Second, we show that non-Gaussian distributions can also increase the SNR, and argue that the optimal isotropic distribution is a âshellâ distribution with a constant magnitude and uniform distribution in direction. We demonstrate that both modifications produce substantial improvements in learning performance in challenging policy gradient experiments.
Temporal Dynamics of Cognitive Control
Reynolds, Jeremy, Mozer, Michael C.
Cognitive control refers to the flexible deployment of memory and attention in response to task demands and current goals. Control is often studied experimentally by presenting sequences of stimuli, some demanding a response, and others modulating the stimulus-response mapping. In these tasks, participants must maintain information about the current stimulus-response mapping in working memory. Prominent theories of cognitive control use recurrent neural nets to implement working memory, and optimize memory utilization via reinforcement learning. We present a novel perspective on cognitive control in which working memory representations are intrinsically probabilistic, and control operations that maintain and update working memory are dynamically determined via probabilistic inference. We show that our model provides a parsimonious account of behavioral and neuroimaging data, and suggest that it offers an elegant conceptualization of control in which behavior can be cast as optimal, subject to limitations on learning and the rate of information processing. Moreover, our model provides insight into how task instructions can be directly translated into appropriate behavior and then efficiently refined with subsequent task experience.
Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images
Vu, Vincent Q., Yu, Bin, Naselaris, Thomas, Kay, Kendrick, Gallant, Jack, Ravikumar, Pradeep K.
We propose a novel hierarchical, nonlinear model that predicts brain activity in area V1 evoked by natural images. In the study reported here brain activity was measured by means of functional magnetic resonance imaging (fMRI), a noninvasive technique that provides an indirect measure of neural activity pooled over a small volume (~ 2mm cube) of brain tissue. Our model, which we call the SpAM V1 model, is based on the reasonable assumption that fMRI measurements reflect the (possibly nonlinearly) pooled, rectified output of a large population of simple and complex cells in V1. It has a hierarchical filtering stage that consists of three layers: model simple cells, model complex cells, and a third layer in which the complex cells are linearly pooled (called âpooled-complexâ cells). The pooling stage then obtains the measured fMRI signals as a sparse additive model (SpAM) in which a sparse nonparametric (nonlinear) combination of model complex cell and model pooled-complex cell outputs are summed. Our results show that the SpAM V1 model predicts fMRI responses evoked by natural images better than a benchmark model that only provides linear pooling of model complex cells. Furthermore, the spatial receptive fields, frequency tuning and orientation tuning curves of the SpAM V1 model estimated for each voxel appears to be consistent with the known properties of V1, and with previous analyses of this data set. A visualization procedure applied to the SpAM V1 model shows that most of the nonlinear pooling consists of simple compressive or saturating nonlinearities.
Near-minimax recursive density estimation on the binary hypercube
Raginsky, Maxim, Lazebnik, Svetlana, Willett, Rebecca, Silva, Jorge
This paper describes a recursive estimation procedure for multivariate binary densities using orthogonal expansions. For $d$ covariates, there are $2^d$ basis coefficients to estimate, which renders conventional approaches computationally prohibitive when $d$ is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity.
Estimation of Information Theoretic Measures for Continuous Random Variables
We analyze the estimation of information theoretic measures of continuous random variables such as: differential entropy, mutual information or Kullback-Leibler divergence. The objective of this paper is two-fold. First, we prove that the information theoretic measure estimates using the k-nearest-neighbor density estimation with fixed k converge almost surely, even though the k-nearest-neighbor density estimation with fixed k does not converge to its true measure. Second, we show that the information theoretic measure estimates do not converge for k growing linearly with the number of samples. Nevertheless, these nonconvergent estimates can be used for solving the two-sample problem and assessing if two random variables are independent. We show that the two-sample and independence tests based on these nonconvergent estimates compare favorably with the maximum mean discrepancy test and the Hilbert Schmidt independence criterion, respectively.
Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
Pellet, Jean-philippe, Elisseeff, André
Causal structure-discovery techniques usually assume that all causes of more than one variable are observed. This is the so-called causal sufficiency assumption. In practice, it is untestable, and often violated. In this paper, we present an efficient causal structure-learning algorithm, suited for causally insufficient data. Similar to algorithms such as IC* and FCI, the proposed approach drops the causal sufficiency assumption and learns a structure that indicates (potential) latent causes for pairs of observed variables. Assuming a constant local density of the data-generating graph, our algorithm makes a quadratic number of conditional-independence tests w.r.t. the number of variables. We show with experiments that our algorithm is comparable to the state-of-the-art FCI algorithm in accuracy, while being several orders of magnitude faster on large problems. We conclude that MBCS* makes a new range of causally insufficient problems computationally tractable.
Improving on Expectation Propagation
Opper, Manfred, Paquet, Ulrich, Winther, Ole
A series of corrections is developed for the fixed points of Expectation Propagation (EP),which is one of the most popular methods for approximate probabilistic inference. These corrections can lead to improvements of the inference approximation orserve as a sanity check, indicating when EP yields unrealiable results.