Lee, Daniel D.
Nearest neighbor density functional estimation based on inverse Laplace transform
Ganguly, Shouvik, Ryu, Jongha, Kim, Young-Han, Noh, Yung-Kyun, Lee, Daniel D.
A general approach to $L_2$-consistent estimation of various density functionals using $k$-nearest neighbor distances is proposed, along with the analysis of convergence rates in mean squared error. The construction of the estimator is based on inverse Laplace transforms related to the target density functional, which arises naturally from the convergence of a normalized volume of $k$-nearest neighbor ball to a Gamma distribution in the sample limit. Some instantiations of the proposed estimator rediscover existing $k$-nearest neighbor based estimators of Shannon and Renyi entropies and Kullback--Leibler and Renyi divergences, and discover new consistent estimators for many other functionals, such as Jensen--Shannon divergence and generalized entropies and divergences. A unified finite-sample analysis of the proposed estimator is presented that builds on a recent result by Gao, Oh, and Viswanath (2017) on the finite sample behavior of the Kozachenko--Leoneko estimator of entropy.
Scalable Centralized Deep Multi-Agent Reinforcement Learning via Policy Gradients
Khan, Arbaaz, Zhang, Clark, Lee, Daniel D., Kumar, Vijay, Ribeiro, Alejandro
In this paper, we explore using deep reinforcement learning for problems with multiple agents. Most existing methods for deep multi-agent reinforcement learning consider only a small number of agents. When the number of agents increases, the dimensionality of the input and control spaces increase as well, and these methods do not scale well. To address this, we propose casting the multi-agent reinforcement learning problem as a distributed optimization problem. Our algorithm assumes that for multi-agent settings, policies of individual agents in a given population live close to each other in parameter space and can be approximated by a single policy. With this simple assumption, we show our algorithm to be extremely effective for reinforcement learning in multi-agent settings. We demonstrate its effectiveness against existing comparable approaches on co-operative and competitive tasks.
Bayesian Q-learning with Assumed Density Filtering
Jeong, Heejin (University of Pennsylvania) | Lee, Daniel D. (University of Pennsylvania)
While off-policy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation brings non-linearity and inconsistent distributions over value function. In this paper, we introduce a new Bayesian approach to off-policy TD methods using Assumed Density Filtering, called ADFQ, which updates beliefs on action-values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs not only are used in exploration but they provide a natural regularization in the belief updates. We also present a connection between ADFQ and Q-learning. Our empirical results show the proposed ADFQ algorithms outperform comparing algorithms in several task domains. Moreover, our algorithms improve general drawbacks in BRL such as efficiency, usage of uncertainty, and nonlinearity.
Memory Augmented Control Networks
Khan, Arbaaz, Zhang, Clark, Atanasov, Nikolay, Karydis, Konstantinos, Kumar, Vijay, Lee, Daniel D.
Planning problems in partially observable environments cannot be solved directly with convolutional networks and require some form of memory. But, even memory networks with sophisticated addressing schemes are unable to learn intelligent reasoning satisfactorily due to the complexity of simultaneously learning to access memory and plan. To mitigate these challenges we introduce the Memory Augmented Control Network (MACN). The proposed network architecture consists of three main parts. The first part uses convolutions to extract features and the second part uses a neural network-based planning module to pre-plan in the environment. The third part uses a network controller that learns to store those specific instances of past information that are necessary for planning. The performance of the network is evaluated in discrete grid world environments for path planning in the presence of simple and complex obstacles. We show that our network learns to plan and can generalize to new environments.
Maximizing Activity in Ising Networks via the TAP Approximation
Lynn, Christopher W. (University of Pennsylvania) | Lee, Daniel D. (University of Pennsylvania)
A wide array of complex biological, social, and physical systems have recently been shown to be quantitatively described by Ising models, which lie at the intersection of statistical physics and machine learning. Here, we study the fundamental question of how to optimize the state of a networked Ising system given a budget of external influence. In the continuous setting where one can tune the influence applied to each node, we propose a series of approximate gradient ascent algorithms based on the Plefka expansion, which generalizes the naive mean field and TAP approximations. In the discrete setting where one chooses a small set of influential nodes, the problem is equivalent to the famous influence maximization problem in social networks with an additional stochastic noise term. In this case, we provide sufficient conditions for when the objective is submodular, allowing a greedy algorithm to achieve an approximation ratio of 1-1/e. Additionally, we compare the Ising-based algorithms with traditional influence maximization algorithms, demonstrating the practical importance of accurately modeling stochastic fluctuations in the system.
Generative Local Metric Learning for Kernel Regression
Noh, Yung-Kyun, Sugiyama, Masashi, Kim, Kee-Eung, Park, Frank, Lee, Daniel D.
This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression. Compared with standard approaches, such as bandwidth selection, we show how metric learning can significantly reduce the mean square error (MSE) in kernel regression, particularly for high-dimensional data. We propose a method for efficiently learning a good metric function based upon analyzing the performance of the NW estimator for Gaussian-distributed data. A key feature of our approach is that the NW estimator with a learned metric uses information from both the global and local structure of the training data. Theoretical and empirical results confirm that the learned metric can considerably reduce the bias and MSE for kernel regression even when the data are not confined to Gaussian.
Learning Data Manifolds with a Cutting Plane Method
Chung, SueYeon, Cohen, Uri, Sompolinsky, Haim, Lee, Daniel D.
We consider the problem of classifying data manifolds where each manifold represents invariances that are parameterized by continuous degrees of freedom. Conventional data augmentation methods rely upon sampling large numbers of training examples from these manifolds; instead, we propose an iterative algorithm called M_{CP} based upon a cutting-plane approach that efficiently solves a quadratic semi-infinite programming problem to find the maximum margin solution. We provide a proof of convergence as well as a polynomial bound on the number of iterations required for a desired tolerance in the objective function. The efficiency and performance of M_{CP} are demonstrated in high-dimensional simulations and on image manifolds generated from the ImageNet dataset. Our results indicate that M_{CP} is able to rapidly learn good classifiers and shows superior generalization performance compared with conventional maximum margin methods using data augmentation methods.
Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution
Lynn, Christopher, Lee, Daniel D.
Influence maximization in social networks has typically been studied in the context of contagion models and irreversible processes. In this paper, we consider an alternate model that treats individual opinions as spins in an Ising system at dynamic equilibrium. We formalize the \textit{Ising influence maximization} problem, which has a natural physical interpretation as maximizing the magnetization given a budget of external magnetic field. Under the mean-field (MF) approximation, we present a gradient ascent algorithm that uses the susceptibility to efficiently calculate local maxima of the magnetization, and we develop a number of sufficient conditions for when the MF magnetization is concave and our algorithm converges to a global optimum. We apply our algorithm on random and real-world networks, demonstrating, remarkably, that the MF optimal external fields (i.e., the external fields which maximize the MF magnetization) exhibit a phase transition from focusing on high-degree individuals at high temperatures to focusing on low-degree individuals at low temperatures. We also establish a number of novel results about the structure of steady-states in the ferromagnetic MF Ising model on general graphs, which are of independent interest.
Efficient Neural Codes under Metabolic Constraints
Wang, Zhuo, Wei, Xue-Xin, Stocker, Alan A., Lee, Daniel D.
Neural codes are inevitably shaped by various kinds of biological constraints, \emph{e.g.} noise and metabolic cost. Here we formulate a coding framework which explicitly deals with noise and the metabolic costs associated with the neural representation of information, and analytically derive the optimal neural code for monotonic response functions and arbitrary stimulus distributions. For a single neuron, the theory predicts a family of optimal response functions depending on the metabolic budget and noise characteristics. Interestingly, the well-known histogram equalization solution can be viewed as a special case when metabolic resources are unlimited. For a pair of neurons, our theory suggests that under more severe metabolic constraints, ON-OFF coding is an increasingly more efficient coding scheme compared to ON-ON or OFF-OFF. The advantage could be as large as one-fold, substantially larger than the previous estimation. Some of these predictions could be generalized to the case of large neural populations. In particular, these analytical results may provide a theoretical basis for the predominant segregation into ON- and OFF-cells in early visual processing areas. Overall, we provide a unified framework for optimal neural codes with monotonic tuning curves in the brain, and makes predictions that can be directly tested with physiology experiments.