AITopics

1909.12892

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Leisure & Entertainment > Games (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial Intelligence, Sep-27-2019

A Generalized Training Approach for Multiagent Learning

Muller, Paul, Omidshafiei, Shayegan, Rowland, Mark, Tuyls, Karl, Perolat, Julien, Liu, Siqi, Hennes, Daniel, Marris, Luke, Lanctot, Marc, Hughes, Edward, Wang, Zhe, Lever, Guy, Heess, Nicolas, Graepel, Thore, Munos, Remi

This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, α-Rank, which is unique (and thus faces no equilibrium selection issues, unlike Nash) and tractable to compute in general-sum, many-player settings. We establish convergence guarantees in several game classes, and identify links between Nash equilibria and α-Rank. We demonstrate the competitive performance of α-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where α-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
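The abstract names fictitious play as one special case of PSRO. As a rough illustration of that special case only (not of PSRO itself, nor of α-Rank), the sketch below runs simultaneous fictitious play on rock-paper-scissors: each player repeatedly best-responds to the opponent's empirical play so far. The function names, the payoff matrix, the iteration count, and the choice to seed both players with their first action are assumptions made for this example.

```python
import numpy as np

def fictitious_play(payoff, iterations=2000):
    """Simultaneous fictitious play in a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player receives
    -payoff[i, j]. Returns the empirical mixed strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Assumption for this sketch: both players start by playing action 0.
    row_counts[0] = 1.0
    col_counts[0] = 1.0
    for _ in range(iterations):
        # Row player maximizes payoff against the column player's history.
        row_best = np.argmax(payoff @ col_counts)
        # Column player minimizes the row player's payoff against its history.
        col_best = np.argmin(row_counts @ payoff)
        row_counts[row_best] += 1
        col_counts[col_best] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def exploitability(payoff, row_mix):
    """How much a best-responding column player gains against row_mix.

    This simple form assumes the value of the game is 0, which holds for
    symmetric zero-sum games such as rock-paper-scissors.
    """
    return float(-np.min(row_mix @ payoff))

# Rock-paper-scissors from the row player's point of view.
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

row_mix, col_mix = fictitious_play(rps)
print("row empirical strategy:", row_mix)
print("exploitability:", exploitability(rps, row_mix))
```

In two-player zero-sum games the empirical strategies of fictitious play are known to converge to a Nash equilibrium, which is the tractable regime the abstract refers to; the general-sum, many-player case is where the paper turns to α-Rank instead.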

oracle, psro, sink strongly-connected component, (13 more...)

1909.12823

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Leisure & Entertainment > Sports > Soccer (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Admiring the Great Mountain: A Celebration Special Issue in Honor of Stephen Grossberg's 80th Birthday

Wunsch, Donald C.

This editorial summarizes selected key contributions of Prof. Stephen Grossberg and describes the papers in this 80th birthday special issue in his honor. His productivity, creativity, and vision would each be enough to mark a scientist of the first caliber. In combination, they have resulted in contributions that have changed the entire discipline of neural networks. Grossberg has been tremendously influential in engineering, dynamical systems, and artificial intelligence as well. Indeed, he has been one of the most important mentors and role models in my career, and has done so with extraordinary generosity and encouragement. All authors in this special issue have taken great pleasure in hereby commemorating his extraordinary career and contributions.

grossberg, neural network, wunsch, (13 more...)

1910.13351

Country:

South America > Brazil (0.04)
North America > United States > Missouri (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Overview (0.66)
Research Report (0.64)
Collection > Journal > Special Issue (0.55)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Government (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Yoon, Jinsung, Arik, Sercan O., Pfister, Tomas

RL-LIM: Reinforcement Learning-based Locally Interpretable Modeling

arXiv.org Machine Learning, Sep-26-2019

Understanding black-box machine learning models is important for their widespread adoption. However, developing globally interpretable models that explain the behavior of the entire model is challenging. An alternative approach is to explain black-box models by explaining individual predictions using a locally interpretable model. In this paper, we propose a novel method for locally interpretable modeling - Reinforcement Learning-based Locally Interpretable Modeling (RL-LIM). RL-LIM employs reinforcement learning to select a small number of samples and distill the black-box model prediction into a low-capacity locally interpretable model. Training is guided with a reward that is obtained directly by measuring agreement of the predictions from the locally interpretable model with the black-box model. RL-LIM near-matches the overall prediction performance of black-box models while yielding human-like interpretability, and significantly outperforms state-of-the-art locally interpretable models in terms of overall prediction performance and fidelity.
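The core idea of distilling a black-box prediction into a low-capacity local model, and scoring it by agreement with the black box, can be sketched in a few lines. This is not the RL-LIM algorithm: RL-LIM learns which samples to select with reinforcement learning, whereas the sketch below substitutes a fixed Gaussian distance kernel as a stand-in for that selection. The `black_box` function, the bandwidth of 0.3, the query point, and all helper names are hypothetical.

```python
import numpy as np

def fit_local_linear(X, y_blackbox, weights):
    """Weighted least-squares fit of a linear model to black-box outputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    sw = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns weighted LS into ordinary LS.
    coef, *_ = np.linalg.lstsq(Xb * sw[:, None], y_blackbox * sw, rcond=None)
    return coef

def predict_linear(coef, x):
    """Evaluate the fitted linear model at a single point x."""
    return float(np.append(x, 1.0) @ coef)

def black_box(X):
    """Hypothetical stand-in for an opaque model (an assumption of this sketch)."""
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

rng = np.random.default_rng(0)
X_train = rng.uniform(-2.0, 2.0, size=(500, 2))
y_bb = black_box(X_train)               # the targets are black-box predictions
x_query = np.array([0.3, -0.5])

# Stand-in for learned sample selection: weight samples by closeness to the query.
dist2 = np.sum((X_train - x_query) ** 2, axis=1)
local_w = np.exp(-dist2 / (2 * 0.3 ** 2))
uniform_w = np.ones(len(X_train))       # baseline: one global linear model

local_coef = fit_local_linear(X_train, y_bb, local_w)
global_coef = fit_local_linear(X_train, y_bb, uniform_w)

# Fidelity at the query point: absolute disagreement with the black box.
bb_at_query = black_box(x_query[None, :])[0]
local_err = abs(predict_linear(local_coef, x_query) - bb_at_query)
global_err = abs(predict_linear(global_coef, x_query) - bb_at_query)
print("local fidelity error:", local_err)
print("global fidelity error:", global_err)
```

A quantity like the negative of `local_err` is the kind of agreement signal the abstract describes as the reward; in RL-LIM the weighting itself would be what gets trained, rather than fixed in advance as it is here.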

black-box model, interpretable model, prediction, (15 more...)

arXiv.org Machine Learning

1909.12367

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Huang, Hua, Barbu, Adrian

Playing Atari Ball Games with Hierarchical Reinforcement Learning

Human beings are particularly good at reasoning and inference from just a few examples. When facing new tasks, humans will leverage knowledge and skills learned before, and quickly integrate them with the new task. In addition to learning by experimentation, humans also learn socio-culturally through instructions and learning by example. In this way humans can learn much faster than most current artificial intelligence algorithms in many tasks. In this paper, we test the idea of speeding up machine learning through social learning. We argue that in solving real-world problems, especially when the task is designed by humans and/or for humans, there are typically instructions from user manuals and/or human experts which give guidelines on how to better accomplish the tasks. We argue that these instructions have tremendous value in designing a reinforcement learning system which can learn in human fashion, and we test the idea by playing the Atari games Tennis and Pong. We experimentally demonstrate that the instructions provide key information about the task, which can be used to decompose the learning task into sub-systems and construct options for temporally extended planning, and dramatically accelerate the learning process.

learning, reinforcement, reinforcement learning, (13 more...)

1909.12465

Country: North America > United States > Florida > Leon County > Tallahassee (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Leisure & Entertainment > Sports (0.91)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Tang, Zhiwen, Yang, Grace Hui

Dynamic Search -- Optimizing the Game of Information Seeking

This article presents the emerging topic of dynamic search (DS). To position dynamic search in a larger research landscape, the article discusses in detail its relationship to related research topics and disciplines. The article reviews approaches to modeling dynamics during information seeking, with an emphasis on Reinforcement Learning (RL)-enabled methods. Details are given for how different approaches are used to model interactions among the human user, the search system, and the environment. The paper ends with a review of evaluations of dynamic search systems.

information, learning, query, (15 more...)

1909.12425

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(4 more...)

Genre:

Research Report (1.00)
Workflow (0.67)

Industry:

Health & Medicine (1.00)
Media (0.92)
Leisure & Entertainment > Games (0.92)
(2 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(6 more...)

Penney, Drew D., Chen, Lizhong

A Survey of Machine Learning Applied to Computer Architecture Design

Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture. Recent work, however, has explored broader applicability for design, optimization, and simulation. Notably, machine-learning-based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This paper reviews machine learning applied system-wide to simulation and run-time optimization, and in many individual components, including memory systems, branch predictors, networks-on-chip, and GPUs. The paper further analyzes current practice to highlight useful design strategies and identify areas for future work, based on optimized implementation strategies, opportune extensions to existing work, and ambitious long-term possibilities. Taken together, these strategies and techniques present a promising future for increasingly automated architectural design.

application, international symposium, prediction, (15 more...)

1909.12373

Country:

Europe (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)

Industry:

Information Technology (1.00)
Energy (0.93)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture (1.00)
(6 more...)

James, Stephen, Ma, Zicong, Arrojo, David Rovick, Davison, Andrew J.

RLBench: The Robot Learning Benchmark & Learning Environment

We present a challenging new benchmark and learning environment for robot learning: RLBench. We provide an array of both proprioceptive observations and visual observations, which include RGB, depth, and segmentation masks from an over-the-shoulder stereo camera and an eye-in-hand monocular camera. Uniquely, each task comes with an infinite supply of demos through the use of motion planners operating on a series of waypoints given during task creation time, enabling an exciting flurry of demonstration-based learning. RLBench has been designed with scalability in mind; new tasks, along with their motion-planned demos, can be easily created and then verified by a series of tools, allowing users to submit their own tasks to the RLBench task repository. This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning. With the benchmark's breadth of tasks and demonstrations, we propose the first large-scale few-shot challenge in robotics. We hope that the scale and diversity of RLBench offer unparalleled research opportunities in the robot learning community and beyond.

arxiv preprint arxiv, benchmark, rlbench, (14 more...)

1909.12271

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry:

Education (0.71)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

Song, H. Francis, Abdolmaleki, Abbas, Springenberg, Jost Tobias, Clark, Aidan, Soyer, Hubert, Rae, Jack W., Noury, Seb, Ahuja, Arun, Liu, Siqi, Tirumala, Dhruva, Heess, Nicolas, Belov, Dan, Riedmiller, Martin, Botvinick, Matthew M.

Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported.

algorithm, arxiv preprint, constraint, (13 more...)

1909.12238

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report (0.52)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Asli, AE. Niaraki, Jannesari, A.

A Simulation of UAV Power Optimization via Reinforcement Learning

This paper demonstrates a reinforcement learning approach to optimizing power consumption in a UAV system in a simplified data collection task. Here, the architecture consists of two common reinforcement learning algorithms, Q-learning and Sarsa, which are implemented through a combination of the Robot Operating System (ROS) and Gazebo. The effect of wind as an influential factor was simulated. The implemented algorithm resulted in reasonable adjustment of UAV actions to the wind field in order to minimize its power consumption during task completion over the domain.
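For readers unfamiliar with the two algorithms named above, the sketch below shows the standard tabular update rules for Q-learning and Sarsa side by side. It is a generic illustration, not the paper's UAV implementation: the table layout (a list of per-state action-value lists), the learning rate `alpha`, the discount `gamma`, and the toy numbers are all assumptions of this example.

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]

# Toy table with two states and two actions (hypothetical values).
q_table_a = [[0.0, 0.0], [1.0, 2.0]]
q_table_b = [[0.0, 0.0], [1.0, 2.0]]

# Same transition (s=0, a=0, r=1.0, s_next=1) under both rules.
q_new = q_learning_update(q_table_a, 0, 0, 1.0, 1, alpha=0.5, gamma=0.5)
sarsa_new = sarsa_update(q_table_b, 0, 0, 1.0, 1, a_next=0, alpha=0.5, gamma=0.5)
print(q_new, sarsa_new)  # 1.0 0.75
```

The only difference is the bootstrap term: Q-learning uses the maximum value in the next state regardless of what the agent does next, while Sarsa uses the value of the next action the current policy actually chose, so exploratory actions affect Sarsa's estimates directly.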

algorithm, objective function, wind field, (11 more...)

1909.12217

Country: North America > United States > Iowa > Story County > Ames (0.04)

Genre: Research Report (0.50)

Industry:

Food & Agriculture > Agriculture (0.70)
Energy > Renewable > Wind (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.50)