Undirected Networks
Bayesian Robust Optimization for Imitation Learning
Brown, Daniel S., Niekum, Scott, Petrik, Marek
One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms. Code is available at https://github.com/dsbrown1331/broil.
MAGIC: Learning Macro-Actions for Online POMDP Planning using Generator-Critic
Lee, Yiyuan, Cai, Panpan, Hsu, David
When robots operate in the real-world, they need to handle uncertainties in sensing, acting, and the environment. Many tasks also require reasoning about long-term consequences of robot decisions. The partially observable Markov decision process (POMDP) offers a principled approach for planning under uncertainty. However, its computational complexity grows exponentially with the planning horizon. We propose to use temporally-extended macro-actions to cut down the effective planning horizon and thus the exponential factor of the complexity. We propose Macro-Action Generator-Critic (MAGIC), an algorithm that learns a macro-action generator from data, and uses the learned macro-actions to perform long-horizon planning. MAGIC learns the generator using experience provided by an online planner, and in-turn conditions the planner using the generated macro-actions. We evaluate MAGIC on several long-term planning tasks, showing that it significantly outperforms planning using primitive actions, hand-crafted macro-actions, as well as naive reinforcement learning in both simulation and on a real robot.
Improving Sales Forecasting Accuracy: A Tensor Factorization Approach with Demand Awareness
Bi, Xuan, Adomavicius, Gediminas, Li, William, Qu, Annie
Due to accessible big data collections from consumers, products, and stores, advanced sales forecasting capabilities have drawn great attention from many companies especially in the retail business because of its importance in decision making. Improvement of the forecasting accuracy, even by a small percentage, may have a substantial impact on companies' production and financial planning, marketing strategies, inventory controls, supply chain management, and eventually stock prices. Specifically, our research goal is to forecast the sales of each product in each store in the near future. Motivated by tensor factorization methodologies for personalized context-aware recommender systems, we propose a novel approach called the Advanced Temporal Latent-factor Approach to Sales forecasting (ATLAS), which achieves accurate and individualized prediction for sales by building a single tensor-factorization model across multiple stores and products. Our contribution is a combination of: tensor framework (to leverage information across stores and products), a new regularization function (to incorporate demand dynamics), and extrapolation of tensor into future time periods using state-of-the-art statistical (seasonal auto-regressive integrated moving-average models) and machine-learning (recurrent neural networks) models. The advantages of ATLAS are demonstrated on eight product category datasets collected by the Information Resource, Inc., where a total of 165 million weekly sales transactions from more than 1,500 grocery stores over 15,560 products are analyzed.
Learning Graph Structure With A Finite-State Automaton Layer
Johnson, Daniel D., Larochelle, Hugo, Tarlow, Daniel
Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream task (e.g., results of relevant program analyses). In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure. Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation. The result is a differentiable Graph Finite-State Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs. Additionally, we combine the GFSA layer with a larger graph-based model trained end-to-end on the variable misuse program understanding task, and find that using the GFSA layer leads to better performance than using hand-engineered semantic edges or other baseline methods for adding learned edge types.
A Few Shot Adaptation of Visual Navigation Skills to New Observations using Meta-Learning
Luo, Qian, Sorokin, Maks, Ha, Sehoon
Target-driven visual navigation is a challenging problem that requires a robot to find the goal using only visual inputs. Many researchers have demonstrated promising results using deep reinforcement learning (deep RL) on various robotic platforms, but typical end-to-end learning is known for its poor extrapolation capability to new scenarios. Therefore, learning a navigation policy for a new robot with a new sensor configuration or a new target still remains a challenging problem. In this paper, we introduce a learning algorithm that enables rapid adaptation to new sensor configurations or target objects with a few shots. We design a policy architecture with latent features between perception and inference networks and quickly adapt the perception network via meta-learning while freezing the inference network. Our experiments show that our algorithm adapts the learned navigation policy with only three shots for unseen situations with different sensor configurations or different target colors. We also analyze the proposed algorithm by investigating various hyperparameters.
Contrastive Topographic Models: Energy-based density models applied to the understanding of sensory coding and cortical topography
We address the problem of building theoretical models that help elucidate the function of the visual brain at computational/algorithmic and structural/mechanistic levels. We seek to understand how the receptive fields and topographic maps found in visual cortical areas relate to underlying computational desiderata. We view the development of sensory systems from the popular perspective of probability density estimation; this is motivated by the notion that an effective internal representational scheme is likely to reflect the statistical structure of the environment in which an organism lives. We apply biologically based constraints on elements of the model. The thesis begins by surveying the relevant literature from the fields of neurobiology, theoretical neuroscience, and machine learning. After this review we present our main theoretical and algorithmic developments: we propose a class of probabilistic models, which we refer to as "energy-based models", and show equivalences between this framework and various other types of probabilistic model such as Markov random fields and factor graphs; we also develop and discuss approximate algorithms for performing maximum likelihood learning and inference in our energy based models. The rest of the thesis is then concerned with exploring specific instantiations of such models. By performing constrained optimisation of model parameters to maximise the likelihood of appropriate, naturalistic datasets we are able to qualitatively reproduce many of the receptive field and map properties found in vivo, whilst simultaneously learning about statistical regularities in the data.
Filtering for Aggregate Hidden Markov Models with Continuous Observations
Zhang, Qinsheng, Singh, Rahul, Chen, Yongxin
We consider a class of filtering problems for large populations where each individual is modeled by the same hidden Markov model (HMM). In this paper, we focus on aggregate inference problems in HMMs with discrete state space and continuous observation space. The continuous observations are aggregated in a way such that the individuals are indistinguishable from measurements. We propose an aggregate inference algorithm called continuous observation collective forward-backward algorithm. It extends the recently proposed collective forward-backward algorithm for aggregate inference in HMMs with discrete observations to the case of continuous observations. The efficacy of this algorithm is illustrated through several numerical experiments.
Belief-Grounded Networks for Accelerated Robot Learning under Partial Observability
Nguyen, Hai, Daley, Brett, Song, Xinchao, Amato, Christopher, Platt, Robert
Many important robotics problems are partially observable in the sense that a single visual or force-feedback measurement is insufficient to reconstruct the state. Standard approaches involve learning a policy over beliefs or observation-action histories. However, both of these have drawbacks; it is expensive to track the belief online, and it is hard to learn policies directly over histories. We propose a method for policy learning under partial observability called the Belief-Grounded Network (BGN) in which an auxiliary belief-reconstruction loss incentivizes a neural network to concisely summarize its input history. Since the resulting policy is a function of the history rather than the belief, it can be executed easily at runtime. We compare BGN against several baselines on classic benchmark tasks as well as three novel robotic touch-sensing tasks. BGN outperforms all other tested methods and its learned policies work well when transferred onto a physical robot.
Social Choice with Changing Preferences: Representation Theorems and Long-Run Policies
We study group decision making with changing preferences as a Markov Decision Process. We are motivated by the increasing prevalence of automated decision-making systems when making choices for groups of people over time. Our main contribution is to show how classic representation theorems from social choice theory can be adapted to characterize optimal policies in this dynamic setting. We provide an axiomatic characterization of MDP reward functions that agree with the Utilitarianism social welfare functionals of social choice theory. We also provide discussion of cases when the implementation of social choice-theoretic axioms may fail to lead to long-run optimal outcomes.
Generative Inverse Deep Reinforcement Learning for Online Recommendation
Chen, Xiaocong, Yao, Lina, Sun, Aixin, Wang, Xianzhi, Xu, Xiwei, Zhu, Liming
Deep reinforcement learning enables an agent to capture user's interest through interactions with the environment dynamically. It has attracted great interest in the recommendation research. Deep reinforcement learning uses a reward function to learn user's interest and to control the learning process. However, most reward functions are manually designed; they are either unrealistic or imprecise to reflect the high variety, dimensionality, and non-linearity properties of the recommendation problem. That makes it difficult for the agent to learn an optimal policy to generate the most satisfactory recommendations. To address the above issue, we propose a novel generative inverse reinforcement learning approach, namely InvRec, which extracts the reward function from user's behaviors automatically, for online recommendation. We conduct experiments on an online platform, VirtualTB, and compare with several state-of-the-art methods to demonstrate the feasibility and effectiveness of our proposed approach.