 Reinforcement Learning


Safety-guaranteed Reinforcement Learning based on Multi-class Support Vector Machine

arXiv.org Artificial Intelligence

Several works have addressed the problem of incorporating constraints into the reinforcement learning (RL) framework; however, the majority of them can only guarantee the satisfaction of soft constraints. In this work, we address the problem of satisfying hard state constraints in a model-free RL setting with deterministic system dynamics. The proposed algorithm is developed for discrete state and action spaces and utilizes a multi-class support vector machine (SVM) to represent the policy. The state constraints are incorporated into the SVM optimization framework to derive an analytical solution for determining the policy parameters. The final policy converges to a solution that is guaranteed to satisfy the constraints. Additionally, the proposed formulation adheres to the Q-learning framework and thus also guarantees convergence to the optimal solution. The algorithm is demonstrated on multiple example problems.


Decorrelated Double Q-learning

arXiv.org Artificial Intelligence

Q-learning with value function approximation may perform poorly because of overestimation bias and imprecise estimates. Specifically, overestimation bias arises from applying the maximum operator to noisy estimates, and it is exaggerated by using the estimate of a subsequent state. Inspired by recent advances in deep reinforcement learning and Double Q-learning, we introduce decorrelated double Q-learning (D2Q). Specifically, we introduce a decorrelated regularization term to reduce the correlation between value function approximators, which can lead to less biased estimation and low variance. Experimental results on a suite of MuJoCo continuous control tasks demonstrate that our decorrelated double Q-learning can effectively improve performance.
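The overestimation effect described in this abstract can be illustrated with a small simulation. The sketch below is not the D2Q method and does not implement its regularization term; it only assumes a toy setting (all true Q-values are zero, Gaussian noise, hypothetical function and parameter names) to show how taking the maximum over noisy estimates is biased upward, and how selecting with one estimate and evaluating with an independent one, as in Double Q-learning, removes that bias.

```python
import random
import statistics

def max_bias_demo(num_actions=10, noise_std=1.0, trials=5000, seed=0):
    """Compare single- and double-estimator values when every true Q-value is 0."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        # Two independent noisy estimates of the same (all-zero) true Q-values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Single estimator: select and evaluate the action with the same table.
        single.append(max(q_a))
        # Double estimator: select with q_a, evaluate with the independent q_b.
        best = max(range(num_actions), key=q_a.__getitem__)
        double.append(q_b[best])
    return statistics.mean(single), statistics.mean(double)

single_mean, double_mean = max_bias_demo()
print(f"single estimator mean: {single_mean:.3f}")  # clearly above the true value 0
print(f"double estimator mean: {double_mean:.3f}")  # close to the true value 0
```

In this toy setting the unbiasedness of the second estimate relies on the two tables being independent, which is one plausible way to read the abstract's motivation for reducing the correlation between approximators.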


StarCraft II Build Order Optimization using Deep Reinforcement Learning and Monte-Carlo Tree Search

arXiv.org Artificial Intelligence

The real-time strategy game of StarCraft II has been posed as a challenge for reinforcement learning by Google's DeepMind. This study examines the use of an agent based on the Monte-Carlo Tree Search algorithm for optimizing the build order in StarCraft II, and discusses how its performance can be improved even further by combining it with a deep reinforcement learning neural network. The experimental results obtained using Monte-Carlo Tree Search achieve a score similar to that of a novice human player while using only very limited time and computational resources, which paves the way to achieving scores comparable to those of a human expert by combining it with deep reinforcement learning.


DECSTR: Learning Goal-Directed Abstract Behaviors using Pre-Verbal Spatial Predicates in Intrinsically Motivated Agents

arXiv.org Artificial Intelligence

Intrinsically motivated agents freely explore their environment and set their own goals. Such goals are traditionally represented as specific states, but recent works introduced the use of language to facilitate abstraction. Language can, for example, represent goals as sets of general properties that surrounding objects should verify. However, language-conditioned agents are trained simultaneously to understand language and to act, which seems to contrast with how children learn: infants demonstrate goal-oriented behaviors and abstract spatial concepts very early in their development, before language mastery. Guided by these findings from developmental psychology, we introduce a high-level state representation based on natural semantic predicates that describe spatial relations between objects and that are known to be present early in infants. In a robotic manipulation environment, our DECSTR system explores this representation space by manipulating objects, and efficiently learns to achieve any reachable configuration within it. It does so by leveraging an object-centered modular architecture, a symmetry inductive bias, and a new form of automatic curriculum learning for goal selection and policy learning. As with children, language acquisition takes place in a second phase, independently from goal-oriented sensorimotor learning. This is done via a new goal generation module, conditioned on instructions describing expected transformations in object relations. We present ablation studies for each component and highlight several advantages of targeting abstract goals over specific ones. We further show that using this intermediate representation enables efficient language grounding by evaluating agents on sequences of language instructions and their logical combinations.


Systematic Generalisation through Task Temporal Logic and Deep Reinforcement Learning

arXiv.org Artificial Intelligence

This paper presents a neuro-symbolic agent that combines deep reinforcement learning (DRL) with temporal logic (TL), and achieves systematic out-of-distribution generalisation in tasks that involve following a formally specified instruction. Specifically, the agent learns general notions of negation and disjunction, and successfully applies them to previously unseen objects without further training. To this end, we also introduce Task Temporal Logic (TTL), a learning-oriented formal language, whose atoms are designed to help the training of a DRL agent targeting systematic generalisation. To validate this combination of logic-based and neural-network techniques, we provide experimental evidence for the kind of neural-network architecture that most enhances the generalisation performance of the agent. Our findings suggest that the right architecture can significantly improve the ability of the agent to generalise in systematic ways, even with abstract operators, such as negation, which previous research has struggled with.


Language-Conditioned Goal Generation: a New Approach to Language Grounding for RL

arXiv.org Artificial Intelligence

In the real world, linguistic agents are also embodied agents: they perceive and act in the physical world. The notion of Language Grounding questions the interactions between language and embodiment: how do learning agents connect or ground linguistic representations to the physical world? This question has recently been approached by the Reinforcement Learning community under the framework of instruction-following agents. In these agents, behavioral policies or reward functions are conditioned on the embedding of an instruction expressed in natural language. This paper proposes another approach: using language to condition goal generators. Given any goal-conditioned policy, one could train a language-conditioned goal generator to generate language-agnostic goals for the agent. This method makes it possible to decouple sensorimotor learning from language acquisition and enables agents to demonstrate a diversity of behaviors for any given instruction. We propose a particular instantiation of this approach and demonstrate its benefits.


Deep Reinforcement Learning for Neural Control

arXiv.org Artificial Intelligence

We present a novel methodology for control of neural circuits based on deep reinforcement learning. Our approach achieves the desired behavior by generating external continuous stimulation of existing neural circuits (neuromodulation control) or modulations of neural circuit architecture (connectome control). Both forms of control are challenging due to the nonlinear and recurrent complexity of neural activity. To infer candidate control policies, our approach maps neural circuits and their connectome into a grid-world-like setting and infers the actions needed to achieve the desired behavior. The actions are inferred by adapting deep Q-learning methods known for their robust performance in navigating grid-worlds. We apply our approach to a model of C. elegans that simulates the full somatic nervous system with muscles and body. Our framework successfully infers neuropeptidic currents and synaptic architectures for control of chemotaxis. Our findings are consistent with in vivo measurements and provide additional insights into neural control of chemotaxis. We further demonstrate the generality and scalability of our methods by inferring chemotactic neural circuits from scratch.


Learning Causal Models Online

arXiv.org Artificial Intelligence

Predictive models -- learned from observational data not covering the complete data distribution -- can rely on spurious correlations in the data for making predictions. These correlations make the models brittle and hinder generalization. One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them. However, learning these structures is a hard problem in itself. Moreover, it is not clear how to incorporate the machinery of causality with online continual learning. In this work, we take an indirect approach to discovering causal models. Instead of searching for the true causal model directly, we propose an online algorithm that continually detects and removes spurious features. Our algorithm works on the idea that the correlation of a spurious feature with a target is not constant over time. As a result, the weight associated with that feature is constantly changing. We show that by continually removing such features, our method converges to solutions that have strong generalization. Moreover, our method combined with random search can also discover non-spurious features from raw sensory data. Finally, our work highlights that the information present in the temporal structure of the problem -- destroyed by shuffling the data -- is essential for detecting spurious features online.
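The core idea in this abstract, that the weight of a spurious feature keeps changing during online learning, can be sketched on a toy stream. The code below is an illustration under stated assumptions, not the authors' algorithm: the two-feature data stream, the sign-flipping schedule, the LMS learner, and all names and parameter values are hypothetical choices made for the example. It tracks the variance of each weight over time; a detector could then flag the feature whose weight varies most.

```python
import random

def weight_variances(steps=4000, block=500, lr=0.05, seed=0):
    """Run online LMS on a toy stream; return the variance of each weight over time."""
    rng = random.Random(seed)
    w = [0.0, 0.0]     # weights for [stable feature, spurious feature]
    mean = [0.0, 0.0]  # running means of the weights (Welford's method)
    m2 = [0.0, 0.0]    # running sums of squared deviations (Welford's method)
    for t in range(1, steps + 1):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        # Assumed toy stream: the spurious feature's relation to the target
        # flips sign every `block` steps; the stable feature's does not.
        c = 1.0 if ((t - 1) // block) % 2 == 0 else -1.0
        y = 1.0 * x[0] + c * x[1]
        err = y - (w[0] * x[0] + w[1] * x[1])
        for i in range(2):
            w[i] += lr * err * x[i]  # LMS (stochastic gradient) update
            delta = w[i] - mean[i]
            mean[i] += delta / t
            m2[i] += delta * (w[i] - mean[i])
    return [m / steps for m in m2]

var_stable, var_spurious = weight_variances()
print(f"stable feature weight variance:   {var_stable:.4f}")
print(f"spurious feature weight variance: {var_spurious:.4f}")
```

Because the signal lives in how the stream changes over time, shuffling the samples would mix the two regimes and erase the difference between the two weights' trajectories, which matches the abstract's point about temporal structure.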


Human and Multi-Agent collaboration in a human-MARL teaming framework

arXiv.org Artificial Intelligence

Collaborative multi-agent reinforcement learning (MARL), as a specific category of reinforcement learning, provides effective results with agents learning from their observations, received rewards, and internal interactions between agents. However, centralized learning methods with a joint global policy in a highly dynamic environment present unique challenges in dealing with large amounts of information. This study proposes two innovative solutions to address the complexities of a collaboration between a human and multiple reinforcement learning (RL)-based agents (referred to hereafter as Human-MARL teaming) where the goals pursued cannot be achieved by a human alone or agents alone. The first innovation is the introduction of a new open-source MARL framework, called COGMENT, to unite humans and agents in real-time complex dynamic systems and efficiently leverage their interactions as a source of learning. The second innovation is our proposal of a new hybrid MARL method, named Dueling Double Deep Q learning MADDPG (D3-MADDPG), to allow agents to train decentralized policies in parallel within a joint centralized policy. This method can solve the overestimation problem in Q-learning methods of value-based MARL. We demonstrate these innovations by using a designed real-time environment with unmanned aerial vehicles driven by RL agents, collaborating with a human to fight fires. The team of RL agent drones autonomously looks for fire seats and the human pilot douses the fires. The results of this study show that the proposed collaborative paradigm and the open-source framework lead to significant reductions in both human effort and exploration costs. Also, the results of the proposed hybrid MARL method show that it effectively improves the learning process to achieve more reliable Q-values for each action, by decoupling the estimation between state value and advantage value.
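The decoupling of state value and advantage value mentioned at the end of this abstract can be illustrated with the commonly used dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The sketch below assumes that standard form; the abstract does not say which aggregation D3-MADDPG uses, and the function name and the numbers are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value and per-action advantages into Q-values."""
    # Subtracting the mean advantage makes the decomposition identifiable:
    # adding a constant to every advantage leaves the Q-values unchanged.
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]

q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
q_shifted = dueling_q_values(2.0, [6.0, 5.0, 4.0])  # same advantages plus 5
print(q)          # [3.0, 2.0, 1.0]
print(q_shifted)  # [3.0, 2.0, 1.0]
```

With this aggregation the mean of the Q-values equals the state value, so the value stream captures how good the state is overall and the advantage stream only ranks the actions.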


Similarity-based transfer learning of decision policies

arXiv.org Artificial Intelligence

The problem of learning a decision policy from past experience is considered. Using the Fully Probabilistic Design (FPD) formalism, we propose a new general approach for finding a stochastic policy from past data.