AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

A deep learning approach to coordinate defensive escort teams

#artificialintelligence Oct-30-2019, 02:17:14 GMT

Advancements in robotics and artificial intelligence (AI) are enabling the development of artificial agents designed to assist humans in a variety of everyday settings. One of the many possible uses for these systems could be to escort humans or valuable goods that are being transferred from one location to another, defending them from threats or attacks. Fascinated by this idea, a team of researchers at the University of New Mexico has recently introduced a new end-to-end solution for coordinating robotic escort teams that are protecting high-value payloads or goods. The technique they proposed, presented in a paper pre-published on arXiv, is based on deep reinforcement learning (RL), which entails training agents to choose actions that maximize a reward signal through trial and error. "I first came up with the idea behind this study when thinking about lugging my suitcase through a crowded airport," Lydia Tapia, the lead researcher on the study, told TechXplore.

agent, defensive escort team, escort team, (13 more...)

#artificialintelligence

Country: North America > United States > New Mexico (0.27)

Genre: Research Report (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)

RBED: Reward Based Epsilon Decay

Maroti, Aakash

arXiv.org Artificial Intelligence Oct-30-2019

$\varepsilon$-greedy is a policy used to balance exploration and exploitation in many reinforcement learning settings. In cases where the agent uses some on-policy algorithm to learn optimal behaviour, it makes sense for the agent to explore more initially and eventually exploit more as it approaches the target behaviour. This shift from heavy exploration to heavy exploitation can be represented as decay in the $\varepsilon$ value, where $\varepsilon$ determines how much an agent is allowed to explore. This paper proposes a new approach to this $\varepsilon$ decay where the decay is based on feedback from the environment. It also evaluates one such approach, based on rewards, against standard exponential decay. The new approach, in the environments tested, produces more consistent results that on average perform better.
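The contrast between a fixed exponential schedule and a reward-driven one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the update rule, so the threshold-crossing trigger, the linear step, and every name and default value below (`RewardBasedEpsilon`, `threshold_step`, and so on) are hypothetical and should not be read as the paper's exact method.

```python
import random


def exponential_decay(epsilon, rate=0.99, eps_min=0.01):
    """Baseline: shrink epsilon by a fixed factor every call, regardless of reward."""
    return max(eps_min, epsilon * rate)


class RewardBasedEpsilon:
    """Assumed rule: lower epsilon only when an episode's reward reaches the
    current threshold, then raise the threshold for the next decay step."""

    def __init__(self, epsilon=1.0, eps_min=0.01, step=0.05,
                 threshold=0.0, threshold_step=1.0):
        self.epsilon = epsilon
        self.eps_min = eps_min
        self.step = step
        self.threshold = threshold
        self.threshold_step = threshold_step

    def update(self, episode_reward):
        # Decay is gated on feedback from the environment (the episode reward).
        if self.epsilon > self.eps_min and episode_reward >= self.threshold:
            self.epsilon = max(self.eps_min, self.epsilon - self.step)
            self.threshold += self.threshold_step
        return self.epsilon


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With this sketch, an agent that keeps scoring below the threshold continues to explore at the same rate, whereas the exponential baseline would reduce exploration on every call no matter how the agent performs.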

agent, artificial intelligence, upstream oil & gas, (20 more...)

arXiv.org Artificial Intelligence

1910.13701

Genre: Research Report (0.42)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.36)

DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning

Bakker, Michiel A., Tu, Duy Patrick, Valdés, Humberto Riverón, Gummadi, Krishna P., Varshney, Kush R., Weller, Adrian, Pentland, Alex

arXiv.org Machine Learning Oct-30-2019

We introduce a framework for dynamic adversarial discovery of information (DADI), motivated by a scenario where information (a feature set) is used by third parties with unknown objectives. We train a reinforcement learning agent to sequentially acquire a subset of the information while balancing accuracy and fairness of predictors downstream. Based on the set of already acquired features, the agent decides dynamically to either collect more information from the set of available features or to stop and predict using the information that is currently available. Building on previous work exploring adversarial representation learning, we attain group fairness (demographic parity) by rewarding the agent with the adversary's loss, computed over the final feature set. Importantly, however, the framework provides a more general starting point for fair or private dynamic information discovery. Finally, we demonstrate empirically, using two real-world datasets, that we can trade off fairness and predictive performance.
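Demographic parity, the group-fairness criterion named in this abstract, asks that the rate of positive predictions be the same across groups. A minimal sketch of measuring how far a predictor is from that condition follows; the function name, the binary 0/1 group encoding, and the toy data are assumptions for illustration and are not taken from DADI.

```python
def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rate between group 0 and group 1.

    predictions: iterable of 0/1 predicted labels.
    groups: iterable of 0/1 protected-attribute values, aligned with predictions.
    """
    rates = []
    for g in (0, 1):
        members = [p for p, a in zip(predictions, groups) if a == g]
        if not members:
            raise ValueError(f"no samples for group {g}")
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


# Toy data: group 0 receives positives 3 times out of 4, group 1 once out of 4.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
gap = demographic_parity_gap(preds, groups)
```

A gap of zero means the predictor satisfies demographic parity on the sample; larger values indicate a larger disparity between the two groups.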

information, learning, representation, (15 more...)

arXiv.org Machine Learning

1910.13983

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Mexico (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

Devin, Coline, Geng, Daniel, Abbeel, Pieter, Darrell, Trevor, Levine, Sergey

arXiv.org Artificial Intelligence Oct-30-2019

Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
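The additive property described in this abstract can be illustrated with a toy example. The sketch below assumes hand-written three-dimensional vectors and made-up subtask names; in the paper the embeddings are learned, so this shows only the arithmetic, not the training procedure.

```python
# Hypothetical subtask embeddings, written by hand for illustration.
SUBTASK_VECTORS = {
    "chop_tree": [1.0, 0.0, 0.0],
    "build_bridge": [0.0, 1.0, 0.0],
    "eat_bread": [0.0, 0.0, 1.0],
}
DIM = 3


def add_vectors(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def plan_vector(subtasks):
    """Represent a trajectory as the sum of the vectors of its subtasks."""
    total = [0.0] * DIM
    for name in subtasks:
        total = add_vectors(total, SUBTASK_VECTORS[name])
    return total


# Compose two tasks by adding their plan vectors.
task_a = plan_vector(["chop_tree", "build_bridge"])
task_b = plan_vector(["eat_bread"])
composed = add_vectors(task_a, task_b)
```

Because the representation is a sum, the composed vector equals the plan vector of a single trajectory containing all three subtasks, and it does not depend on the order in which the subtasks are listed.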

reference trajectory, trajectory, vector, (16 more...)

arXiv.org Artificial Intelligence

1910.14033

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Skåne County > Malmö (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

A Distributed Model-Free Algorithm for Multi-hop Ride-sharing using Deep Reinforcement Learning

Singh, Ashutosh, Alabbasi, Abubakr, Aggarwal, Vaneet

arXiv.org Artificial Intelligence Oct-30-2019

The growth of autonomous vehicles, ridesharing systems, and self-driving technology will bring a shift in the way ride-hailing platforms plan their services. However, these advances in technology, coupled with road congestion, environmental concerns, fuel usage, vehicle emissions, and the high cost of vehicle usage, have brought more attention to making better use of vehicles and their capacities. In this paper, we propose a novel multi-hop ride-sharing (MHRS) algorithm that uses deep reinforcement learning to learn optimal vehicle dispatch and matching decisions by interacting with the external environment. By allowing customers to transfer between vehicles, i.e., ride with one vehicle for some time and then transfer to another one, MHRS helps attain 30% lower cost and 20% more efficient utilization of fleets compared with ride-sharing algorithms. The flexibility of the multi-hop feature gives a seamless experience to customers and ride-sharing companies, and thus improves ride-sharing services.

customer, passenger, vehicle, (17 more...)

arXiv.org Artificial Intelligence

1910.14002

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > Canada > British Columbia > Vancouver Island > Capital Regional District > Victoria (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning: A Brief Guide

#artificialintelligence Oct-29-2019, 17:00:30 GMT

Reinforcement learning has the potential to solve tough decision-making problems in many applications, including industrial automation, autonomous driving, video game playing, and robotics. Reinforcement learning is a type of machine learning in which a computer learns to perform a task through repeated interactions with a dynamic environment. This trial-and-error learning approach enables the computer to make a series of decisions without human intervention and without being explicitly programmed to perform the task. One famous example of reinforcement learning in action is AlphaGo, the first computer program to defeat a world champion at the game of Go. Reinforcement learning works with data from a dynamic environment--in other words, with data that changes based on external conditions, such as weather or traffic flow.

brief guide, dynamic environment, reinforcement learning, (4 more...)

#artificialintelligence

Industry:

Information Technology (0.78)
Leisure & Entertainment > Games > Go (0.61)
Transportation > Ground > Road (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Reinforcement Learning for Walking Robots Video

#artificialintelligence Oct-29-2019, 11:45:15 GMT

Sebastian Castro demonstrates an example of controlling humanoid robot locomotion using deep reinforcement learning, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm. The robot is simulated using Simscape Multibody, while training the control policy is done using Reinforcement Learning Toolbox. In this video, Sebastian outlines the setup, training, and evaluation of reinforcement learning with Simulink models. First, he introduces how to choose states, actions, and a reward function for the reinforcement learning problem. Then he describes the neural network structure and training algorithm parameters.

deep reinforcement learning, robot video

#artificialintelligence

Industry: Education > Focused Education > Special Education (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Reinforcement Learning for Distributed Uncoordinated Cognitive Radios Resource Allocation

Tondwalkar, Ankita, Kwasinski, Andres

arXiv.org Machine Learning Oct-29-2019

This paper presents a novel deep reinforcement learning-based resource allocation technique for the multi-agent environment presented by a cognitive radio network that coexists through underlay dynamic spectrum access (DSA) with a primary network. The resource allocation technique presented in this work is distributed, not requiring coordination with other agents. By ensuring convergence to equilibrium policies almost surely, the presented novel technique succeeds in addressing the challenge of a non-stationary multi-agent environment that results from the dynamic interaction between radios through the shared wireless environment. Simulation results show that in a finite learning time the presented technique is able to find policies that yield performance within 3% of an exhaustive search solution, finding the optimal policy in nearly 70% of cases, and that standard single-agent deep reinforcement learning may not achieve convergence when used in a non-coordinated, coupled multi-radio scenario.

algorithm, convergence, exploration phase, (15 more...)

arXiv.org Machine Learning

1911.03366

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Interactive Gibson: A Benchmark for Interactive Navigation in Cluttered Environments

Xia, Fei, Shen, William B., Li, Chengshu, Kasimbeg, Priya, Tchapmi, Micael, Toshev, Alexander, Martín-Martín, Roberto, Savarese, Silvio

arXiv.org Artificial Intelligence Oct-29-2019

We present Interactive Gibson, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task. For example, the robot can move objects if needed in order to clear a path leading to the goal location. Our benchmark comprises two novel elements: 1) a new experimental setup, the Interactive Gibson Environment, which simulates high fidelity visuals of indoor scenes, and high fidelity physical dynamics of the robot and common objects found in these scenes; 2) a set of Interactive Navigation metrics which allows one to study the interplay between navigation and physical interaction. We present and evaluate multiple learning-based baselines in Interactive Gibson, and provide insights into regimes of navigation with different tradeoffs between navigation path efficiency and disturbance of surrounding objects.

Classical robot navigation is concerned with reaching goals while avoiding collisions [1], [2]. This definition of navigation is motivated by a wide variety of robot applications in factories or outdoor settings. As robots are increasingly deployed in complex and cluttered environments, physical interactions while navigating become not only unavoidable, but necessary. For example, when operating a robot in a cluttered home, the robot might need to push objects aside or open doors in order to be able to reach its destination. This problem is referred to as Interactive Navigation and in this paper we propose a principled and systematic way to study it (see Figure 1). The "aversion to interaction" in mobile robot agents is easy to understand: real robots are expensive, and interacting with the environment presents safety risks. In Robotic Manipulation these challenges have been addressed by extensive use of physics simulation engines [3], [4], [5], which simulate object and robot dynamics with high precision and thus allow one to study manipulation in a safe manner. Further, these engines can be used to train models which are deployable in the real world.

agent, interaction, interactive navigation, (9 more...)

arXiv.org Artificial Intelligence

1910.14442

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Automatic Testing and Falsification with Dynamically Constrained Reinforcement Learning

Qin, Xin, Aréchiga, Nikos, Best, Andrew, Deshmukh, Jyotirmoy

arXiv.org Artificial Intelligence Oct-29-2019

We consider the problem of using reinforcement learning to train adversarial agents for automatic testing and falsification of cyberphysical systems, such as autonomous vehicles, robots, and airplanes. In order to produce useful agents, however, it is useful to be able to control the degree of adversariality by specifying rules that an agent must follow. For example, when testing an autonomous vehicle, it is useful to find maximally antagonistic traffic participants that obey traffic rules. We model dynamic constraints as hierarchically ordered rules expressed in Signal Temporal Logic, and show how these can be incorporated into an agent training process. We prove that our agent-centric approach is able to find all dangerous behaviors that can be found by traditional falsification techniques while producing modular and reusable agents. We demonstrate our approach on two case studies from the automotive domain.

When developing cyberphysical systems such as autonomous vehicles, drones, or aircraft, it is important to have a robust testing strategy that finds critical bugs before the system is put into production. Falsification techniques exist to find simulations in which the system under test fails to satisfy its target specification. These falsification traces can be generated from a bounded set of inputs.

adversarial agent, agent, vehicle, (13 more...)

arXiv.org Artificial Intelligence

1910.13645

Country:

North America > United States > California (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.40)

Industry: Transportation (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
