Collaborating Authors

 Lee, Daniel D.


Learning Continuous Cost-to-Go Functions for Non-holonomic Systems

arXiv.org Artificial Intelligence

This paper presents a supervised learning method to generate continuous cost-to-go functions of non-holonomic systems directly from the workspace description. Supervision from informative examples reduces training time and improves network performance. The manifold representing the optimal trajectories of a non-holonomic system has high-curvature regions that cannot be captured efficiently with uniform sampling. To address this challenge, we present an adaptive sampling method that uses sampling-based planners along with local, closed-form solutions to generate training samples. The cost-to-go function over a specific workspace is represented as a neural network whose weights are generated by a second, higher-order network. The networks are trained in an end-to-end fashion. In our previous work, this architecture was shown to successfully learn the cost-to-go functions of holonomic systems using uniform sampling. In this work, we show that uniform sampling fails for non-holonomic systems. However, with the proposed adaptive sampling methodology, our network can generate near-optimal trajectories for non-holonomic systems while avoiding obstacles. Experiments show that our method is two orders of magnitude faster than traditional approaches in cluttered environments.
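
As an illustration of the final planning step, the sketch below follows the negative gradient of a learned cost-to-go function to generate a trajectory. The callable `c2g` is a hypothetical stand-in for the trained network, and the finite-difference gradient is used only for illustration; in practice the network's gradient would be available analytically.

```python
import numpy as np

def descend_cost_to_go(c2g, start, goal, step=0.05, tol=0.1, max_iters=2000):
    """Generate a trajectory by following the negative gradient of a
    learned cost-to-go function. `c2g(q)` is assumed to map a configuration
    to a scalar cost-to-go; gradients are approximated here by central
    finite differences purely for illustration."""
    def grad(q, eps=1e-3):
        g = np.zeros_like(q)
        for i in range(q.size):
            dq = np.zeros_like(q)
            dq[i] = eps
            g[i] = (c2g(q + dq) - c2g(q - dq)) / (2.0 * eps)
        return g

    q = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    path = [q]
    for _ in range(max_iters):
        if np.linalg.norm(q - goal) < tol:
            break                                      # reached the goal region
        g = grad(q)
        q = q - step * g / (np.linalg.norm(g) + 1e-8)  # fixed-length descent step
        path.append(q)
    return np.stack(path)
```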


Cost-to-Go Function Generating Networks for High Dimensional Motion Planning

arXiv.org Artificial Intelligence

This paper presents c2g-HOF networks, which learn to generate cost-to-go functions for manipulator motion planning. The c2g-HOF architecture consists of a cost-to-go function over the configuration space represented as a neural network (c2g-network) as well as a Higher Order Function (HOF) network which outputs the weights of the c2g-network for a given input workspace. Both networks are trained end-to-end in a supervised fashion using costs computed from traditional motion planners. Once trained, c2g-HOF can generate a smooth and continuous cost-to-go function directly from workspace sensor inputs (represented as a point cloud in 3D or an image in 2D). At inference time, the weights of the c2g-network are computed very efficiently, and near-optimal trajectories are generated by simply following the gradient of the cost-to-go function. We compare c2g-HOF with traditional planning algorithms for various robots and planning scenarios. The experimental results indicate that planning with c2g-HOF is significantly faster than other motion planning algorithms, yielding orders-of-magnitude improvement when collision checking is included. Furthermore, despite being trained from sparsely sampled trajectories in configuration space, c2g-HOF generalizes to generate smoother, and often lower-cost, trajectories. We demonstrate cost-to-go based planning on a 7-DoF manipulator arm, where motion planning in a complex workspace requires only 0.13 seconds for the entire trajectory.
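
A minimal sketch of the higher-order-function idea, under the assumption of a flat workspace encoding and illustrative layer sizes (the paper's architecture consumes point clouds or images; none of the names or sizes below are from the paper): one network emits the flattened weights of a small c2g-network, which is then evaluated at a query configuration.

```python
import torch
import torch.nn as nn

class C2GHypernetwork(nn.Module):
    """Sketch of a higher-order function (HOF) network: it maps a workspace
    encoding to the flattened weights of a small cost-to-go MLP, which is
    then evaluated at a query configuration. All sizes are illustrative."""

    def __init__(self, workspace_dim=128, c2g_sizes=(7, 64, 64, 1)):
        super().__init__()
        self.sizes = c2g_sizes
        # total parameters of the generated MLP: weights plus biases per layer
        n_params = sum((i + 1) * o for i, o in zip(c2g_sizes[:-1], c2g_sizes[1:]))
        self.hof = nn.Sequential(
            nn.Linear(workspace_dim, 256), nn.ReLU(),
            nn.Linear(256, n_params),
        )

    def forward(self, workspace, q):
        theta = self.hof(workspace)  # flat weight vector of the c2g-network
        x, off = q, 0
        layers = list(zip(self.sizes[:-1], self.sizes[1:]))
        for k, (i, o) in enumerate(layers):
            W = theta[off:off + i * o].view(o, i); off += i * o
            b = theta[off:off + o];                off += o
            x = W @ x + b
            if k < len(layers) - 1:
                x = torch.relu(x)
        return x  # scalar cost-to-go at configuration q
```

Because the generated weights are a differentiable function of the HOF parameters, both networks can be trained end-to-end, and the gradient of the output with respect to the configuration is available for planning.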


Low-Bandwidth Communication Emerges Naturally in Multi-Agent Learning Systems

arXiv.org Artificial Intelligence

In this work, we study emergent communication through the lens of cooperative multi-agent behavior in nature. Using insights from animal communication, we propose a spectrum from low-bandwidth (e.g. pheromone trails) to high-bandwidth (e.g. compositional language) communication that is based on the cognitive, perceptual, and behavioral capabilities of social agents. Through a series of experiments with pursuit-evasion games, we identify multi-agent reinforcement learning algorithms as a computational model for the low-bandwidth end of the communication spectrum.


Learning to Track Dynamic Targets in Partially Known Environments

arXiv.org Machine Learning

We solve active target tracking, one of the essential tasks in autonomous systems, using a deep reinforcement learning (RL) approach. In this problem, an autonomous agent is tasked with acquiring information about targets of interest using its onboard sensors. The classical challenges in this problem are dependence on system models and the difficulty of computing information-theoretic cost functions over a long planning horizon. RL addresses both challenges: the length of its effective planning horizon does not affect the computational complexity, and it removes the strong dependence on system models. In particular, we introduce the Active Tracking Target Network (ATTN), a unified RL policy that is capable of solving the major sub-tasks of active target tracking -- in-sight tracking, navigation, and exploration. The policy shows robust behavior for tracking agile and anomalous targets with a partially known target model. Additionally, the same policy is able to navigate in obstacle environments to reach distant targets as well as explore the environment when targets are positioned in unexpected locations.


Algorithms for Non-negative Matrix Factorization

Neural Information Processing Systems

Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the Expectation-Maximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
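
For reference, a minimal sketch of the least squares variant of the multiplicative updates; the small `eps` is a numerical guard added here for stability and is not part of the convergence analysis.

```python
import numpy as np

def nmf_multiplicative(V, rank, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates minimizing the least squares error
    ||V - WH||^2. V must be entrywise non-negative, with shape (m, n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H; error is non-increasing
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W
    return W, H
```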


Generative Local Metric Learning for Nearest Neighbor Classification

Neural Information Processing Systems

We consider the problem of learning a local metric to enhance the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find an appropriate local metric that maximally reduces the bias based upon knowledge from generative models. As a byproduct, the asymptotic theoretical analysis in this work relates metric learning to dimensionality reduction, a connection not evident in previous discriminative approaches. Empirical experiments show that this learned local metric enhances the discriminative nearest neighbor performance on various datasets using simple class-conditional generative models.
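
The classification rule itself can be sketched as follows, with `metric_fn` a hypothetical stand-in for the local metric derived from the generative models: it returns a positive-definite matrix A(x) at the query point, and distances are measured in the corresponding quadratic form rather than the Euclidean norm.

```python
import numpy as np

def local_metric_nn(x, X_train, y_train, metric_fn, k=5):
    """k-nearest-neighbor classification under a query-dependent metric.
    `metric_fn(x)` returns a positive-definite matrix A(x); the distance to
    a training point x_i is (x - x_i)^T A(x) (x - x_i)."""
    A = metric_fn(x)
    diffs = X_train - x                              # shape (n, d)
    d2 = np.einsum('ij,jk,ik->i', diffs, A, diffs)   # quadratic form per row
    nearest = np.argsort(d2)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote
```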


Learning Q-network for Active Information Acquisition

arXiv.org Machine Learning

In this paper, we propose a novel reinforcement learning approach for solving the Active Information Acquisition problem, in which an agent must choose a sequence of actions to acquire information about a process of interest using its on-board sensors. The classic challenges in the information acquisition problem are the dependence of a planning algorithm on known models and the difficulty of computing information-theoretic cost functions over arbitrary distributions. In contrast, the proposed reinforcement learning framework requires no knowledge of models, and these difficulties are absorbed during an extended training stage. It results in policies that are efficient to execute online and applicable for real-time control of robotic systems. Furthermore, state-of-the-art planning methods are typically restricted to short horizons, which may become problematic with local minima. Reinforcement learning naturally handles the planning-horizon issue in information problems, as it maximizes a discounted sum of rewards over a long finite or infinite time horizon. We discuss the potential benefits of the proposed framework and compare the performance of the novel algorithm to an existing information acquisition method for multi-target tracking scenarios.
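
The horizon point can be seen in the standard tabular Q-learning update, which the framework builds on (a generic sketch, not the paper's network-based learner): the discount factor folds an arbitrarily long horizon into a one-step bootstrap target, so the per-update cost does not grow with the horizon.

```python
def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update. Q[s] holds the action values of
    state s. The discounted long-horizon objective is handled by a single
    one-step bootstrap target."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
```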


Learning Optimal Resource Allocations in Wireless Systems

arXiv.org Machine Learning

This paper considers the design of optimal resource allocation policies in wireless communication systems, generically modeled as a functional optimization problem with stochastic constraints. These optimization problems have the structure of a learning problem in which the statistical loss appears as a constraint, motivating the development of learning methodologies to attempt their solution. To handle stochastic constraints, training is undertaken in the dual domain. It is shown that this can be done with small loss of optimality when using near-universal learning parameterizations. In particular, since deep neural networks (DNNs) are near-universal, their use is advocated and explored. DNNs are trained here with a model-free primal-dual method that simultaneously learns a DNN parameterization of the resource allocation policy and optimizes the primal and dual variables. Numerical simulations demonstrate the strong performance of the proposed approach on a number of common wireless resource allocation problems.
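
Generically, such a primal-dual scheme alternates ascent on the policy parameters with projected descent on the multipliers of a Lagrangian L(theta, lam) = f0(theta) + lam^T g(theta). In the sketch below, `grad_lagrangian` and `constraint_slack` are hypothetical estimator callables (e.g. sample averages over channel realizations); this illustrates the scheme rather than the paper's exact algorithm.

```python
import numpy as np

def primal_dual_step(theta, lam, grad_lagrangian, constraint_slack,
                     lr_theta=1e-3, lr_lam=1e-2):
    """One model-free primal-dual step: gradient ascent on the policy
    parameters theta, projected gradient descent on the non-negative dual
    variables lam. Both estimator callables are assumptions for
    illustration."""
    theta = theta + lr_theta * grad_lagrangian(theta, lam)         # primal ascent
    lam = np.maximum(lam - lr_lam * constraint_slack(theta), 0.0)  # dual descent, lam >= 0
    return theta, lam
```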


Classification and Geometry of General Perceptual Manifolds

arXiv.org Machine Learning

Perceptual manifolds arise when a neural population responds to an ensemble of sensory signals associated with different physical features (e.g., orientation, pose, scale, location, and intensity) of the same perceptual object. Object recognition and discrimination require classifying the manifolds in a manner that is insensitive to variability within a manifold. How neuronal systems give rise to invariant object classification and recognition is a fundamental problem in brain theory as well as in machine learning. Here we study the ability of a readout network to classify objects from their perceptual manifold representations. We develop a statistical mechanical theory for the linear classification of manifolds with arbitrary geometry, revealing a remarkable relation to the mathematics of conic decomposition. Novel geometrical measures of manifold radius and manifold dimension are introduced that explain the classification capacity for manifolds of various geometries. The general theory is demonstrated on a number of representative manifolds, including L2 ellipsoids prototypical of strictly convex manifolds, L1 balls representing polytopes consisting of finite sample points, and orientation manifolds which arise from neurons tuned to respond to a continuous angle variable, such as object orientation. The effects of label sparsity on the classification capacity of manifolds are elucidated, revealing a scaling relation between label sparsity and manifold radius. Theoretical predictions are corroborated by numerical simulations using recently developed algorithms to compute maximum margin solutions for manifold dichotomies. Our theory and its extensions provide a powerful and rich framework for applying the statistical mechanics of linear classification to data arising from neuronal responses to object stimuli, as well as to artificial deep networks trained for object recognition tasks.
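
For orientation, the classical Gardner capacity for P random points in N dimensions at margin \kappa, which the manifold theory recovers in the zero-size limit; in the full theory, manifold radius and dimension enter through an effective margin. The formula below is the known point result, not the paper's manifold formula.

```latex
% Gardner capacity \alpha_0 = P/N for random points at margin \kappa,
% with Gaussian measure Dt = e^{-t^2/2}\, dt / \sqrt{2\pi}:
\alpha_0(\kappa)^{-1} = \int_{-\kappa}^{\infty} Dt \,(t + \kappa)^2
```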


Assumed Density Filtering Q-learning

arXiv.org Artificial Intelligence

While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been utilized as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods using Assumed Density Filtering (ADFQ), which updates beliefs on state-action values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs provide a natural regularization for learning, and we show how ADFQ reduces in a limiting case to the traditional Q-learning algorithm. Our empirical results demonstrate that the proposed ADFQ algorithms outperform comparable algorithms on several task domains. Moreover, our algorithms are computationally more efficient than other existing approaches to Bayesian reinforcement learning.
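
A heavily simplified sketch of a Gaussian belief update over Q-values; the actual ADFQ update projects the non-Gaussian posterior induced by the max over next-state actions back onto a Gaussian via assumed density filtering, which is not reproduced here. All names are illustrative.

```python
import numpy as np

def gaussian_td_update(mean, var, s, a, r, s_next, gamma=0.99, obs_var=1.0):
    """Kalman-style TD update on a Gaussian belief over Q(s, a): treat the
    bootstrap target as a noisy observation. `mean` and `var` are arrays of
    shape (n_states, n_actions). A simplification, not the ADFQ update."""
    target = r + gamma * np.max(mean[s_next])  # greedy bootstrap target
    k = var[s, a] / (var[s, a] + obs_var)      # gain: uncertainty-driven step size
    mean[s, a] += k * (target - mean[s, a])    # posterior mean
    var[s, a] *= (1.0 - k)                     # posterior variance shrinks
```

The gain here acts as an uncertainty-driven learning rate, which is the intuition behind the natural regularization the abstract describes.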