AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Emergence of Different Modes of Tool Use in a Reaching and Dragging Task

Nguyen, Khuong, Choe, Yoonsuck

arXiv.org Artificial IntelligenceDec-8-2020

Tool use is an important milestone in the evolution of intelligence. In this paper, we investigate different modes of tool use that emerge in a reaching and dragging task. In this task, a jointed arm with a gripper must grab a tool (T, I, or L-shaped) and drag an object down to the target location (the bottom of the arena). The simulated environment had real physics such as gravity and friction. We trained a deep-reinforcement learning based controller (with raw visual and proprioceptive input) with minimal reward shaping information to tackle this task. We observed the emergence of a wide range of unexpected behaviors, not directly encoded in the motor primitives or reward functions. Examples include hitting the object to the target location, correcting error of initial contact, throwing the tool toward the object, as well as normal expected behavior such as wide sweep. Also, we further analyzed these behaviors based on the type of tool and the initial position of the target object. Our results show a rich repertoire of behaviors, beyond the basic built-in mechanisms of the deep reinforcement learning method we used.

agent, i-shaped tool, l-shaped tool, (15 more...)

arXiv.org Artificial Intelligence

2012.047

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Resolving Implicit Coordination in Multi-Agent Deep Reinforcement Learning with Deep Q-Networks & Game Theory

Adams, Griffin, Padmanabhan, Sarguna Janani, Shekhar, Shivang

arXiv.org Artificial IntelligenceDec-8-2020

We address two major challenges of implicit coordination in multi-agent deep reinforcement learning: non-stationarity and exponential growth of state-action space, by combining Deep-Q Networks for policy learning with Nash equilibrium for action selection. Q-values proxy as payoffs in Nash settings, and mutual best responses define joint action selection. Coordination is implicit because multiple/no Nash equilibria are resolved deterministically. We demonstrate that knowledge of game type leads to an assumption of mirrored best responses and faster convergence than Nash-Q. Specifically, the Friend-or-Foe algorithm demonstrates signs of convergence to a Set Controller which jointly chooses actions for two agents. This encouraging given the highly unstable nature of decentralized coordination over joint actions. Inspired by the dueling network architecture, which decouples the Q-function into state and advantage streams, as well as residual networks, we learn both a single and joint agent representation, and merge them via element-wise addition. This simplifies coordination by recasting it is as learning a residual function. We also draw high level comparative insights on key MADRL and game theoretic variables: competitive vs. cooperative, asynchronous vs. parallel learning, greedy versus socially optimal Nash equilibria tie breaking, and strategies for the no Nash equilibrium case. We evaluate on 3 custom environments written in Python using OpenAI Gym: a Predator Prey environment, an alternating Warehouse environment, and a Synchronization environment. Each environment requires successively more coordination to achieve positive rewards.

agent, architecture, coordination, (15 more...)

arXiv.org Artificial Intelligence

2012.09136

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Fast reinforcement learning through the composition of behaviours

#artificialintelligenceDec-7-2020, 05:00:30 GMT

Recently, the combination of RL with deep learning has led to impressive results, such as agents that can learn how to play boardgames like Go and chess, the full spectrum of Atari games, as well as more modern, difficult video games like Dota and StarCraft II.

agent, fast reinforcement, reinforcement, (5 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)

Add feedback

Reset-Free Lifelong Learning with Skill-Space Planning

Lu, Kevin, Grover, Aditya, Abbeel, Pieter, Mordatch, Igor

arXiv.org Artificial IntelligenceDec-7-2020

The objective of lifelong reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.

agent, lisp, sergey levine, (14 more...)

arXiv.org Artificial Intelligence

2012.03548

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Education > Educational Setting > Continuing Education (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

GAEA: Graph Augmentation for Equitable Access via Reinforcement Learning

Ramachandran, Govardana Sachithanandam, Brugere, Ivan, Varshney, Lav R., Xiong, Caiming

arXiv.org Artificial IntelligenceDec-7-2020

Disparate access to resources by different subpopulations is a prevalent issue in societal and sociotechnical networks. For example, urban infrastructure networks may enable certain racial groups to more easily access resources such as high-quality schools, grocery stores, and polling places. Similarly, social networks within universities and organizations may enable certain groups to more easily access people with valuable information or influence. Here we introduce a new class of problems, Graph Augmentation for Equitable Access (GAEA), to enhance equity in networked systems by editing graph edges under budget constraints. We prove such problems are NP-hard, and cannot be approximated within a factor of $(1-\tfrac{1}{3e})$. We develop a principled, sample- and time- efficient Markov Reward Process (MRP)-based mechanism design framework for GAEA. Our algorithm outperforms baselines on a diverse set of synthetic graphs. We further demonstrate the method on real-world networks, by merging public census, school, and transportation datasets for the city of Chicago and applying our algorithm to find human-interpretable edits to the bus network that enhance equitable access to high-quality schools across racial groups. Further experiments on Facebook networks of universities yield sets of new social connections that would increase equitable access to certain attributed nodes across gender groups.

graph, node, particle, (15 more...)

arXiv.org Artificial Intelligence

2012.039

Country:

North America > United States > Illinois > Cook County > Chicago (0.27)
North America > United States > Michigan (0.04)

Genre: Research Report (0.64)

Industry:

Government (1.00)
Education (1.00)
Information Technology (0.90)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Add feedback

Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games

Sestini, Alessandro, Kuhnle, Alexander, Bagdanov, Andrew D.

arXiv.org Artificial IntelligenceDec-7-2020

Recent advances in Deep Reinforcement Learning (DRL) have largely focused on improving the performance of agents with the aim of replacing humans in known and well-defined environments. The use of these techniques as a game design tool for video game production, where the aim is instead to create Non-Player Character (NPC) behaviors, has received relatively little attention until recently. Turn-based strategy games like Roguelikes, for example, present unique challenges to DRL. In particular, the categorical nature of their complex game state, composed of many entities with different attributes, requires agents able to learn how to compare and prioritize these entities. Moreover, this complexity often leads to agents that overfit to states seen during training and that are unable to generalize in the face of design changes made during development. In this paper we propose two network architectures which, when combined with a \emph{procedural loot generation} system, are able to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions. The first is based on a dense embedding of the categorical input space that abstracts the discrete observation model and renders trained agents more able to generalize. The second proposed architecture is more general and is based on a Transformer network able to reason relationally about input and input attributes. Our experimental evaluation demonstrates that new agents have better adaptation capacity with respect to a baseline architecture, making this framework more robust to dynamic gameplay changes during development. Based on the results shown in this paper, we believe that these solutions represent a step forward towards making DRL more accessible to the gaming industry.

agent, architecture, policy network, (15 more...)

arXiv.org Artificial Intelligence

2012.03532

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Reinforcement Learning with Python Explained for Beginners

#artificialintelligenceDec-6-2020, 17:30:37 GMT

Reinforcement Learning (RL) possesses immense potential and is doubtless one of the most dynamic and stimulating fields of research in Artificial Intelligence. RL is considered as a game-changer in Data Science, particularly after observing the winnings of AI agents AlphaGo Zero and OpenAI Five against top human champions. However, RL is not restricted to games. The progress in Reinforcement Learning, especially during the last few years, has been sensational. RL is everywhere now, ranging from resource management to chemistry, from healthcare to finance, and from Recommender Systems to more advanced applications in stock prediction.

data science, python explained, reinforcement learning, (2 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.32)

Add feedback

Adaptive Stress Testing: Finding Likely Failure Events with Reinforcement Learning

Lee, Ritchie (Stinger Ghaffarian Technologies) | Mengshoel, Ole J. (Norwegian University of Science and Technology) | Saksena, Anshu (Johns Hopkins University Applied Physics Laboratory) | Gardner, Ryan W. (Johns Hopkins University Applied Physics Laboratory) | Genin, Daniel (Johns Hopkins University Applied Physics Laboratory) | Silbermann, Joshua | Owen, Michael (MIT Lincoln Laboratory) | Kochenderfer, Mykel J. (Stanford University)

Journal of Artificial Intelligence ResearchDec-6-2020

Finding the most likely path to a set of failure states is important to the analysis of safety-critical systems that operate over a sequence of time steps, such as aircraft collision avoidance systems and autonomous cars. In many applications such as autonomous driving, failures cannot be completely eliminated due to the complex stochastic environment in which the system operates. As a result, safety validation is not only concerned about whether a failure can occur, but also discovering which failures are most likely to occur. This article presents adaptive stress testing (AST), a framework for finding the most likely path to a failure event in simulation. We consider a general black box setting for partially observable and continuous-valued systems operating in an environment with stochastic disturbances. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system, making it suitable for black-box testing of large systems. We present different formulations depending on whether the state is fully observable or partially observable. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where we are concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision.

aircraft, collision avoidance system, simulator, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12190

AI Access Foundation

12190

Journal of Artificial Intelligence Research

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Lexington (0.04)
(4 more...)

Industry:

Transportation > Air (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Fever Basketball: A Complex, Flexible, and Asynchronized Sports Game Environment for Multi-agent Reinforcement Learning

Jia, Hangtian, Hu, Yujing, Chen, Yingfeng, Ren, Chunxu, Lv, Tangjie, Fan, Changjie, Zhang, Chongjie

arXiv.org Artificial IntelligenceDec-6-2020

The development of deep reinforcement learning (DRL) has benefited from the emergency of a variety type of game environments where new challenging problems are proposed and new algorithms can be tested safely and quickly, such as Board games, RTS, FPS, and MOBA games. However, many existing environments lack complexity and flexibility and assume the actions are synchronously executed in multi-agent settings, which become less valuable. We introduce the Fever Basketball game, a novel reinforcement learning environment where agents are trained to play basketball game. It is a complex and challenging environment that supports multiple characters, multiple positions, and both the single-agent and multi-agent player control modes. In addition, to better simulate real-world basketball games, the execution time of actions differs among players, which makes Fever Basketball a novel asynchronized environment. We evaluate commonly used multi-agent algorithms of both independent learners and joint-action learners in three game scenarios with varying difficulties, and heuristically propose two baseline methods to diminish the extra non-stationarity brought by asynchronism in Fever Basketball Benchmarks. Besides, we propose an integrated curricula training (ICT) framework to better handle Fever Basketball problems, which includes several game-rule based cascading curricula learners and a coordination curricula switcher focusing on enhancing coordination within the team. The results show that the game remains challenging and can be used as a benchmark environment for studies like long-time horizon, sparse rewards, credit assignment, and non-stationarity, etc. in multi-agent settings.

agent, basketball, fever basketball, (13 more...)

arXiv.org Artificial Intelligence

2012.03204

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Sports > Basketball (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Pinaki Laskar on LinkedIn: #DataScientist #ArtificialIntelligence #DataAnalytics

#artificialintelligenceDec-5-2020, 09:25:21 GMT

Can you teach a Machine intuition? It is what ML/DL/ANN after, as a supervised, unsupervised or reinforcement learning. For intuition is the ability to acquire knowledge without recourse to conscious reasoning, namely, - direct access to unconscious knowledge; - unconscious cognition; - inner sensing; - inner insight to unconscious pattern-recognition; and the ability to understand something instinctively, without any need for conscious reasoning.

artificialintelligence, dataanalytic, datascientist, (5 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Add feedback