 Reinforcement Learning


A deep learning approach to coordinate defensive escort teams

#artificialintelligence

Advancements in robotics and artificial intelligence (AI) are enabling the development of artificial agents designed to assist humans in a variety of everyday settings. One of the many possible uses for these systems could be to escort humans or valuable goods that are being transferred from one location to another, defending them from threats or attacks. Fascinated by this idea, a team of researchers at the University of New Mexico has recently introduced a new end-to-end solution for coordinating robotic escort teams that are protecting high-value payloads or goods. The technique they proposed, presented in a paper pre-published on arXiv, is based on deep reinforcement learning (RL), which entails training algorithms to make effective predictions by analyzing data. "I first came up with the idea behind this study when thinking about lugging my suitcase through a crowded airport," Lydia Tapia, the lead researcher on the study, told TechXplore.


RBED: Reward Based Epsilon Decay

arXiv.org Artificial Intelligence

$\varepsilon$-greedy is a policy used to balance exploration and exploitation in many reinforcement learning settings. In cases where the agent uses some on-policy algorithm to learn optimal behaviour, it makes sense for the agent to explore more initially and eventually exploit more as it approaches the target behaviour. This shift from heavy exploration to heavy exploitation can be represented as a decay in the $\varepsilon$ value, where $\varepsilon$ determines how much an agent is allowed to explore. This paper proposes a new approach to $\varepsilon$ decay in which the decay is based on feedback from the environment. It presents one such approach based on rewards and compares it against standard exponential decay. In the environments tested, the new approach produces more consistent results that on average perform better.
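The contrast between the two schedules can be sketched in a few lines of Python. This is an illustration only: the abstract does not give the paper's update rule, so the `RewardBasedDecay` class, its reward threshold, and all parameter values below are assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def exponential_decay(epsilon, rate=0.99, floor=0.01):
    """Standard schedule: shrink epsilon on every call, whatever the agent achieved."""
    return max(floor, epsilon * rate)


class RewardBasedDecay:
    """Hypothetical reward-gated schedule (not taken from the paper).

    Epsilon shrinks only when the latest reward reaches the current
    threshold; the threshold is then raised so that the next decay
    step requires better performance.
    """

    def __init__(self, epsilon=1.0, threshold=1.0, increment=1.0,
                 rate=0.9, floor=0.01):
        self.epsilon = epsilon
        self.threshold = threshold
        self.increment = increment
        self.rate = rate
        self.floor = floor

    def update(self, episode_reward):
        if episode_reward >= self.threshold:
            self.epsilon = max(self.floor, self.epsilon * self.rate)
            self.threshold += self.increment
        return self.epsilon
```

With this design an agent that stops improving keeps its current exploration rate, whereas the exponential schedule would keep reducing exploration regardless of progress.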


DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning

arXiv.org Machine Learning

We introduce a framework for dynamic adversarial discovery of information (DADI), motivated by a scenario where information (a feature set) is used by third parties with unknown objectives. We train a reinforcement learning agent to sequentially acquire a subset of the information while balancing accuracy and fairness of predictors downstream. Based on the set of already acquired features, the agent decides dynamically to either collect more information from the set of available features or to stop and predict using the information that is currently available. Building on previous work exploring adversarial representation learning, we attain group fairness (demographic parity) by rewarding the agent with the adversary's loss, computed over the final feature set. Importantly, however, the framework provides a more general starting point for fair or private dynamic information discovery. Finally, we demonstrate empirically, using two real-world datasets, that we can trade off fairness and predictive performance.


Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

arXiv.org Artificial Intelligence

Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
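The additive property described above can be illustrated with a toy Python sketch. In the paper the embeddings are learned; here the vectors are hand-chosen one-hot tuples and the subtask names are made up, so this shows only the arithmetic and not the authors' model.

```python
# Hypothetical hand-chosen embeddings, one per subtask. Learned CPVs
# would come from a trained encoder instead.
SUBTASK_VECTORS = {
    "chop_tree": (1.0, 0.0, 0.0),
    "build_house": (0.0, 1.0, 0.0),
    "make_bread": (0.0, 0.0, 1.0),
}
DIM = 3


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def subtract(u, v):
    return tuple(a - b for a, b in zip(u, v))


def embed(subtasks):
    """Embed a trajectory as the sum of the vectors of its subtasks."""
    total = (0.0,) * DIM
    for name in subtasks:
        total = add(total, SUBTASK_VECTORS[name])
    return total


def decode(vector):
    """List the subtasks that are still present in a plan vector."""
    names = []
    for name, basis in SUBTASK_VECTORS.items():
        if sum(a * b for a, b in zip(vector, basis)) > 0.5:
            names.append(name)
    return names


# Adding the vectors of two tasks gives the vector of the composed task.
composed = add(embed(["chop_tree"]), embed(["make_bread"]))

# Subtracting the work already done leaves the part of the plan still to do.
remaining = subtract(
    embed(["chop_tree", "build_house", "make_bread"]),
    embed(["chop_tree"]),
)
```

Because trajectories are sums, subtraction falls out of the same representation: `decode(remaining)` lists the two subtasks that have not yet been completed.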


A Distributed Model-Free Algorithm for Multi-hop Ride-sharing using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

The growth of autonomous vehicles, ridesharing systems, and self-driving technology will bring a shift in the way ride-hailing platforms plan their services. However, these advances in technology, coupled with road congestion, environmental concerns, fuel usage, vehicle emissions, and the high cost of vehicle usage, have brought more attention to better utilization of vehicles and their capacities. In this paper, we propose a novel multi-hop ride-sharing (MHRS) algorithm that uses deep reinforcement learning to learn optimal vehicle dispatch and matching decisions by interacting with the external environment. By allowing customers to transfer between vehicles, i.e., ride with one vehicle for some time and then transfer to another one, MHRS helps attain 30% lower cost and 20% more efficient utilization of fleets compared with ride-sharing algorithms. The flexibility of the multi-hop feature gives a seamless experience to customers and ride-sharing companies, and thus improves ride-sharing services.


Reinforcement Learning: A Brief Guide

#artificialintelligence

Reinforcement learning has the potential to solve tough decision-making problems in many applications, including industrial automation, autonomous driving, video game playing, and robotics. Reinforcement learning is a type of machine learning in which a computer learns to perform a task through repeated interactions with a dynamic environment. This trial-and-error learning approach enables the computer to make a series of decisions without human intervention and without being explicitly programmed to perform the task. One famous example of reinforcement learning in action is AlphaGo, the first computer program to defeat a world champion at the game of Go. Reinforcement learning works with data from a dynamic environment--in other words, with data that changes based on external conditions, such as weather or traffic flow.
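The trial-and-error loop can be sketched with a minimal multi-armed bandit in Python: the agent picks an action, observes a reward, and updates its estimate of that action's value. The function name, the fixed per-arm rewards, and the parameter values are assumptions chosen to keep the example small; real environments are stochastic and have state.

```python
import random


def run_bandit(arm_rewards, steps=100, epsilon=0.1, optimistic_init=2.0, seed=0):
    """Learn the value of each arm purely from interaction.

    arm_rewards: fixed reward paid by each arm (a toy, deterministic environment).
    optimistic_init: starting estimate; it should exceed every real reward
        so that each arm gets tried at least once by the greedy rule.
    """
    rng = random.Random(seed)
    estimates = [optimistic_init] * len(arm_rewards)
    counts = [0] * len(arm_rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(len(arm_rewards))
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(len(arm_rewards)), key=lambda a: estimates[a])
        reward = arm_rewards[arm]  # feedback from the environment
        counts[arm] += 1
        # Incremental sample average of the rewards seen for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.2, 1.0], steps=10, epsilon=0.0)
```

Nothing in the loop encodes which arm is better; the preference for the second arm emerges from the rewards the agent observes.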


Deep Reinforcement Learning for Walking Robots Video

#artificialintelligence

Sebastian Castro demonstrates an example of controlling humanoid robot locomotion using deep reinforcement learning, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm. The robot is simulated using Simscape Multibody, while training the control policy is done using Reinforcement Learning Toolbox. In this video, Sebastian outlines the setup, training, and evaluation of reinforcement learning with Simulink models. First, he introduces how to choose states, actions, and a reward function for the reinforcement learning problem. Then he describes the neural network structure and training algorithm parameters.


Deep Reinforcement Learning for Distributed Uncoordinated Cognitive Radios Resource Allocation

arXiv.org Machine Learning

This paper presents a novel deep reinforcement learning-based resource allocation technique for the multi-agent environment presented by a cognitive radio network that coexists through underlay dynamic spectrum access (DSA) with a primary network. The resource allocation technique presented in this work is distributed, not requiring coordination with other agents. By ensuring convergence to equilibrium policies almost surely, the presented novel technique succeeds in addressing the challenge of a non-stationary multi-agent environment that results from the dynamic interaction between radios through the shared wireless environment. Simulation results show that in a finite learning time the presented technique is able to find policies that yield performance within 3% of an exhaustive search solution, finding the optimal policy in nearly 70% of cases, and that standard single-agent deep reinforcement learning may not achieve convergence when used in a non-coordinated, coupled multi-radio scenario.


Interactive Gibson: A Benchmark for Interactive Navigation in Cluttered Environments

arXiv.org Artificial Intelligence

We present Interactive Gibson, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task. For example, the robot can move objects if needed in order to clear a path leading to the goal location. Our benchmark comprises two novel elements: 1) a new experimental setup, the Interactive Gibson Environment, which simulates high fidelity visuals of indoor scenes, and high fidelity physical dynamics of the robot and common objects found in these scenes; 2) a set of Interactive Navigation metrics which allows one to study the interplay between navigation and physical interaction. We present and evaluate multiple learning-based baselines in Interactive Gibson, and provide insights into regimes of navigation with different tradeoffs between navigation path efficiency and disturbance of surrounding objects.
Classical robot navigation is concerned with reaching goals while avoiding collisions [1], [2]. This definition of navigation is motivated by a wide variety of robot applications in factories or outdoor settings. As robots are increasingly deployed in complex and cluttered environments, physical interactions while navigating become not only unavoidable, but necessary. For example, when operating a robot in a cluttered home, the robot might need to push objects aside or open doors in order to be able to reach its destination. This problem is referred to as Interactive Navigation and in this paper we propose a principled and systematic way to study it (see Figure 1). The "aversion to interaction" in robot mobile agents is easy to understand: real robots are expensive, and interacting with the environment presents safety risks. In Robotic Manipulation these challenges have been addressed by extensive use of physics simulation engines [3], [4], [5], which simulate object and robot dynamics with high precision and thus allow one to study manipulation in a safe manner. Further, these engines can be used to train models which are deployable in the real world.


Automatic Testing and Falsification with Dynamically Constrained Reinforcement Learning

arXiv.org Artificial Intelligence

Xin Qin, Nikos Aréchiga, Andrew Best, Jyotirmoy Deshmukh
Abstract: We consider the problem of using reinforcement learning to train adversarial agents for automatic testing and falsification of cyberphysical systems, such as autonomous vehicles, robots, and airplanes. In order to produce useful agents, however, it is useful to be able to control the degree of adversariality by specifying rules that an agent must follow. For example, when testing an autonomous vehicle, it is useful to find maximally antagonistic traffic participants that obey traffic rules. We model dynamic constraints as hierarchically ordered rules expressed in Signal Temporal Logic, and show how these can be incorporated into an agent training process. We prove that our agent-centric approach is able to find all dangerous behaviors that can be found by traditional falsification techniques while producing modular and reusable agents. We demonstrate our approach on two case studies from the automotive domain.
Introduction: When developing cyberphysical systems such as autonomous vehicles, drones, or aircraft, it is important to have a robust testing strategy that finds critical bugs before the system is put into production. Falsification techniques exist to find simulations in which the system under test fails to satisfy its target specification. These falsification traces can be generated from a bounded set of inputs.
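The idea of an adversary that is antagonistic yet rule-abiding can be sketched in Python using the usual quantitative (robustness) semantics of an "always" formula over a sampled signal. The scoring function, signal names, and numbers below are assumptions for illustration; the abstract does not describe how the paper encodes its rule hierarchy.

```python
def always_below(signal, bound):
    """Robustness of 'always (x < bound)': positive when every sample is below bound."""
    return min(bound - x for x in signal)


def always_above(signal, bound):
    """Robustness of 'always (x > bound)': positive when every sample is above bound."""
    return min(x - bound for x in signal)


def adversary_score(speeds, gaps, speed_limit, safe_gap):
    """Hypothetical two-level score for an adversarial traffic participant.

    First component: the traffic rule. It is 0.0 when the adversary stays
    under the speed limit and negative when it does not.
    Second component: how strongly the system under test is falsified,
    i.e. the negated robustness of 'the gap always stays above safe_gap'.
    Python compares tuples lexicographically, so the rule takes priority.
    """
    rule = always_below(speeds, speed_limit)
    safety = always_above(gaps, safe_gap)
    return (min(0.0, rule), -safety)


# Trace A obeys the speed limit and closes the gap to 1.5 (below the safe gap of 2.0).
score_a = adversary_score([10.0, 12.0, 11.0], [5.0, 3.0, 1.5], 13.0, 2.0)
# Trace B gets even closer, but only by breaking the speed limit.
score_b = adversary_score([10.0, 15.0, 11.0], [5.0, 3.0, 0.5], 13.0, 2.0)
```

Under this ordering trace A ranks above trace B: the more dangerous behavior is discarded because it was reached by violating a higher-priority rule.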