Reinforcement Learning
Microsoft's AI Platform Gets A Big Boost With Bonsai Acquisition
Microsoft has announced that it is acquiring Bonsai โ an AI startup focused on reinforcement learning โ to expand its AI offerings. This move helps Microsoft in expanding its portfolio to autonomous systems and industrial control systems. Bonsai, an artificial intelligence startup based in Berkeley, California, aims to democratize AI by making the technology accessible to business decision makers. It is abstracting the complexity involved in implementing reinforcement learning. Mark Hammond, the co-founder, and CEO of Bonsai is not new to Microsoft.
Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning
Pan, Xinlei, Ohn-Bar, Eshed, Rhinehart, Nicholas, Xu, Yan, Shen, Yilin, Kitani, Kris M.
Humans are able to understand and perform complex tasks by strategically structuring the tasks into incremental steps or subgoals. For a robot attempting to learn to perform a sequential task with critical subgoal states, such states can provide a natural opportunity for interaction with a human expert. This paper analyzes the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework. The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states. These subgoal states define a set of subtasks for the learning agent to complete in order to achieve the final goal. The learning agent queries for partial demonstrations corresponding to each subtask as needed when the agent struggles with the subtask. The proposed Human Interactive IRL (HI-IRL) framework is evaluated on several discrete path-planning tasks. We demonstrate that subgoal-based interactive structuring of the learning task results in significantly more efficient learning, requiring only a fraction of the demonstration data needed for learning the underlying reward function with the baseline IRL model.
Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop
Biehl, Martin, Guckelsberger, Christian, Salge, Christoph, Smith, Simรณn C., Polani, Daniel
Active inference is an ambitious theory that treats perception, inference and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. Active inference at its core is independent from extrinsic rewards, resulting in a high level of robustness across e.g.\ different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of intrinsic motivations. In general and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study if the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.
Hindsight policy gradients
Rauber, Paulo, Ummadisingu, Avinash, Mutz, Filipe, Schmidhuber, Juergen
A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
A New Approach for Resource Scheduling with Deep Reinforcement Learning
Ye, Yufei, Ren, Xiaoqin, Wang, Jin, Xu, Lingxiao, Guo, Wenxia, Huang, Wenqiang, Tian, Wenhong
With the rapid development of deep learning, deep reinforcement learning (DRL) began to appear in the field of resource scheduling in recent years. Based on the previous research on DRL in the literature, we introduce online resource scheduling algorithm DeepRM2 and the offline resource scheduling algorithm DeepRM_Off. Compared with the state-of-the-art DRL algorithm DeepRM and heuristic algorithms, our proposed algorithms have faster convergence speed and better scheduling efficiency with regarding to average slowdown time, job completion time and rewards.
Bonsai joins Microsoft to cultivate our common vision: BRAINs for Autonomous Systems
Keen and I founded Bonsai in 2014 with the vision of putting AI in the hands of every developer. Over the past four years our team has worked tirelessly to make this vision a reality by combining the power of machine teaching and deep reinforcement learning into an end-to-end platform that is accessible not only to data scientists but software engineers and subject matter experts. The strongest initial commercial traction for this platform has been in the industrial verticals where customers are improving the operations of dynamic control systems across applications including robotics, HVAC, engines, wind turbines and machine tuning. The 30x performance improvement Siemens recently realized auto-calibrating CNC machines powered by a Bonsai BRAIN is just scratching the surface of the significant business impact deep reinforcement learning can bring to these real world systems. Going forward, we see a massive opportunity to empower enterprises & developers globally with the tools and technology needed to build and operate the BRAINs that power these intelligent autonomous systems.
An Approximate Bayesian Reinforcement Learning Approach Using Robust Control Policy and Tree Search
Hishinuma, Toru (Kyoto University) | Senda, Kei (Kyoto University)
For autonomous robots, we propose an approximate model-based Bayesian reinforcement learning (MB-BRL) approach that reduces real-world samples within feasible computational efforts. Firstly, to find an approximate solution of an original undiscounted infinite horizon MB-BRL problem with a cost-free termination, we consider a finite horizon (FH) MB-BRL problem in which terminal costs are given by robust control policies. The resulting performance is better than or equal to the performance obtained with a robust method, while the resulting policy may choose an explorative behavior to get useful information about parametric model uncertainty for reducing real-world samples. Secondly, to obtain a feasible solution of the FH MB-BRL problem using simulation samples, we propose a combination of robust RL, Monte Carlo tree search (MCTS), and Bayesian inference. We show an idea of reusing previous MCTS samples for Bayesian inference at a leaf node. The proposed approach allows an agent to choose from multiple robust policies at a leaf node. Numerical experiments of a two-dimensional peg-in-hole task demonstrate the effectiveness of the proposed approach.
Skilled Experience Catalogue: A Skill-Balancing Mechanism for Non-Player Characters using Reinforcement Learning
Glavin, Frank G., Madden, Michael G.
In this paper, we introduce a skill-balancing mechanism for adversarial non-player characters (NPCs), called Skilled Experience Catalogue (SEC). The objective of this mechanism is to approximately match the skill level of an NPC to an opponent in real-time. We test the technique in the context of a First-Person Shooter (FPS) game. Specifically, the technique adjusts a reinforcement learning NPC's proficiency with a weapon based on its current performance against an opponent. Firstly, a catalogue of experience, in the form of stored learning policies, is built up by playing a series of training games. Once the NPC has been sufficiently trained, the catalogue acts as a timeline of experience with incremental knowledge milestones in the form of stored learning policies. If the NPC is performing poorly, it can jump to a later stage in the learning timeline to be equipped with more informed decision-making. Likewise, if it is performing significantly better than the opponent, it will jump to an earlier stage. The NPC continues to learn in real-time using reinforcement learning but its policy is adjusted, as required, by loading the most suitable milestones for the current circumstances.
Reinforcement Learning using Augmented Neural Networks
Neural networks allow Q-learning reinforcement learning agents such as deep Q-networks (DQN) to approximate complex mappings from state spaces to value functions. However, this also brings drawbacks when compared to other function approximators such as tile coding or their generalisations, radial basis functions (RBF) because they introduce instability due to the side effect of globalised updates present in neural networks. This instability does not even vanish in neural networks that do not have any hidden layers. In this paper, we show that simple modifications to the structure of the neural network can improve stability of DQN learning when a multi-layer perceptron is used for function approximation.
Learning Neural Parsers with Deterministic Differentiable Imitation Learning
Shankar, Tanmay, Rhinehart, Nicholas, Muelling, Katharina, Kitani, Kris M.
We address the problem of spatial segmentation of a 2D object in the context of a robotic system for painting, where an optimal segmentation depends on both the appearance of the object and the size of each segment. Since each segment must take into account appearance features at several scales, we take a hierarchical grammar-based parsing approach to decompose the object into 2D segments for painting. Since there are many ways to segment an object the solution space is extremely large and it is very challenging to utilize an exploration based optimization approach like reinforcement learning. Instead, we pose the segmentation problem as an imitation learning problem by using a segmentation algorithm in the place of an expert, that has access to a small dataset with known foreground-background segmentations. During the imitation learning process, we learn to imitate the oracle (segmentation algorithm) using only the image of the object, without the use of the known foreground-background segmentations. We introduce a novel deterministic policy gradient update, DRAG, in the form of a deterministic actor-critic variant of AggreVaTeD, to train our neural network based object parser. We will also show that our approach can be seen as extending DDPG to the Imitation Learning scenario. Training our neural parser to imitate the oracle via DRAG allow our neural parser to outperform several existing imitation learning approaches.