AITopics

1805.0988

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Indiana (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Discovering Blind Spots in Reinforcement Learning

Ramakrishnan, Ramya, Kamar, Ece, Dey, Debadeepta, Shah, Julie, Horvitz, Eric

Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult to discover because the agent cannot predict them a priori. We propose using oracle feedback to learn a predictive model of these blind spots to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: The agent does not have the appropriate features to represent the true state of the world and thus cannot distinguish among numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. We learn models to predict blind spots in unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. The models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach on two domains and show that it achieves higher predictive performance than baseline methods, and that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how they influence the discovery of blind spots.
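As a rough illustration of the "noisy supervised learning problem with class imbalance" framing above, the sketch below aggregates noisy per-state oracle labels and computes class weights. It is not the authors' method: majority voting stands in for whatever label aggregation the paper uses, and the function names, the tie-breaking rule, and the inverse-frequency weighting are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def aggregate_labels(observations):
    """Majority-vote one label per state from noisy (state, label) pairs.

    Label 1 means "blind spot", label 0 means "safe". Ties go to 1,
    a hypothetical cautious choice rather than anything from the paper.
    """
    votes = defaultdict(Counter)
    for state, label in observations:
        votes[state][label] += 1
    return {state: int(c[1] >= c[0]) for state, c in votes.items()}


def balanced_class_weights(labels):
    """Inverse-frequency weights so the rare class is not ignored."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Hypothetical oracle feedback: state "s1" received conflicting labels.
feedback = [("s1", 0), ("s1", 0), ("s1", 1), ("s2", 1), ("s3", 0), ("s4", 0)]
labels = aggregate_labels(feedback)
weights = balanced_class_weights(list(labels.values()))
```

The aggregated labels and weights could then be passed to any supervised classifier that accepts per-class weights.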

agent, blind spot, oracle, (16 more...)

1805.08966

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Hoang, Trong Nghia, Hoang, Quang Minh, Low, Kian Hsiang, How, Jonathan

Collective Online Learning via Decentralized Gaussian Processes in Massive Multi-Agent Systems

arXiv.org Machine Learning May-23-2018

Distributed machine learning (ML) is a modern computation paradigm that divides its workload into independent tasks that can be executed simultaneously by multiple machines (i.e., agents) for better scalability. However, a typical distributed system is usually implemented with a central server that collects data statistics from multiple independent machines operating on different subsets of data to build a global analytic model. This centralized communication architecture, however, exposes a single choke point for operational failure and places severe bottlenecks on the server's communication and computation capacities, as it has to process a growing volume of communication from a crowd of learning agents. To mitigate these bottlenecks, this paper introduces a novel Collective Online Learning Gaussian Process framework for massive distributed systems that allows each agent to build its local model, which can be exchanged and combined efficiently with others via peer-to-peer communication to converge on a global model of higher quality. Finally, our empirical results consistently demonstrate the efficiency of our framework on both synthetic and real-world datasets.

agent, artificial intelligence, representation, (15 more...)

arXiv.org Machine Learning

1805.09266

Country: North America (0.46)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.68)
Education > Educational Setting > Online (0.60)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Reinforcement Learning for Heterogeneous Teams with PALO Bounds

Ceren, Roi, Doshi, Prashant, He, Keyang

We introduce reinforcement learning for heterogeneous teams in which rewards for an agent are additively factored into local costs, stimuli unique to each agent, and global rewards, those shared by all agents in the domain. Motivating domains include coordination of varied robotic platforms, which incur different costs for the same action, but share an overall goal. We present two templates for learning in this setting with factored rewards: a generalization of Perkins' Monte Carlo exploring starts for POMDPs to canonical MPOMDPs, with a single policy mapping joint observations of all agents to joint actions (MCES-MP); and another with each agent individually mapping joint observations to their own action (MCES-FMP). We use probably approximately local optimal (PALO) bounds to analyze sample complexity, instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique, and evaluate the approaches on three domains of heterogeneous agents, demonstrating that MCES-FMP yields improved policies in fewer samples compared to MCES-MP and a previous benchmark.
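The additive reward factoring described above can be sketched in a few lines. This is only an illustration under assumed conventions: the function name, the agent names, and the choice to subtract each local cost from the shared global reward are hypothetical and not taken from the paper.

```python
def agent_rewards(global_reward, local_costs):
    """Combine one shared global reward with each agent's own local cost.

    Assumes costs are given as non-negative numbers and enter the
    reward with a negative sign (an illustrative sign convention).
    """
    return {agent: global_reward - cost for agent, cost in local_costs.items()}


# Two hypothetical robotic platforms paying different costs for the same action.
rewards = agent_rewards(10.0, {"drone": 1.0, "rover": 3.0})
```

Both agents share the same global term, so they differ only through their local costs.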

agent, artificial intelligence, machine learning, (18 more...)

1805.09267

Country: North America > United States > Georgia > Clarke County > Athens (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Representation Balancing MDPs for Off-Policy Policy Evaluation

Liu, Yao, Gottesman, Omer, Raghu, Aniruddh, Komorowski, Matthieu, Faisal, Aldo, Doshi-Velez, Finale, Brunskill, Emma

We study the problem of off-policy policy evaluation (OPPE) in RL. In contrast to prior work, we consider how to estimate both the individual policy value and average policy value accurately. We draw inspiration from recent work in causal reasoning, and propose a new finite sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop a learning algorithm of an MDP model with a balanced representation, and show that our approach can yield substantially lower MSE in a common synthetic domain and on a challenging real-world sepsis management problem.

artificial intelligence, evaluation policy, machine learning, (17 more...)

1805.09044

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)

Jiang, Jiechuan, Lu, Zongqing

Learning Attentional Communication for Multi-Agent Cooperation

Communication could potentially be an effective way for multi-agent cooperation. However, information sharing among all agents or in predefined communication architectures that existing methods adopt can be problematic. When there is a large number of agents, agents can hardly differentiate valuable information that helps cooperative decision making from globally shared information. Therefore, communication barely helps, and could even impair the learning of multi-agent cooperation. Predefined communication architectures, on the other hand, restrict communication among agents and thus restrain potential cooperation. To tackle these difficulties, in this paper, we propose an attentional communication model that learns when communication is needed and how to integrate shared information for cooperative decision making. Our model leads to efficient and effective communication for large-scale multi-agent cooperation. Empirically, we show the strength of our model in various cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies than existing methods.

artificial intelligence, deep learning, machine learning, (17 more...)

1805.07733

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Omidshafiei, Shayegan, Kim, Dong-Ki, Pazis, Jason, How, Jonathan P.

Crossmodal Attentive Skill Learner

arXiv.org Artificial Intelligence May-22-2018

This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently-introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but accelerates transfer to new tasks. We demonstrate the attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017], we open-source a fast hybrid CPU-GPU implementation of CASL.

machine learning, natural language, reinforcement learning, (15 more...)

1711.10314

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.29)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Yudkowsky, Eliezer, Soares, Nate

Functional Decision Theory: A New Theory of Instrumental Rationality

arXiv.org Artificial Intelligence May-22-2018

This paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory and evidential decision theory. Functional decision theorists hold that the normative principle for action is to treat one's decision as the output of a fixed mathematical function that answers the question, "Which output of this very function would yield the best outcome?" Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb's problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit's hitchhiker problem. In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making.
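To make the Newcomb's problem comparison above concrete, the sketch below computes the average payoff of agents who one-box versus agents who two-box. The payoffs (1,000,000 in the opaque box, 1,000 in the transparent box) follow the usual textbook statement of the problem and are assumptions here, as are the function name and the `accuracy` parameter; none of these figures appear in the abstract.

```python
def newcomb_expected_payoffs(accuracy, big=1_000_000, small=1_000):
    """Average payoffs for one-boxers and two-boxers.

    `accuracy` is the probability that the predictor correctly
    foresees the agent's choice (an assumed model of the predictor).
    """
    # One-boxers find the opaque box full when the prediction was correct.
    one_box = accuracy * big
    # Two-boxers find it full only when the prediction was wrong,
    # and always collect the small transparent box.
    two_box = (1 - accuracy) * big + small
    return one_box, two_box


one_box, two_box = newcomb_expected_payoffs(0.99)
```

With these assumed payoffs, one-boxing has the higher average whenever the predictor's accuracy exceeds 0.5005.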

agent, newcomb, predictor, (15 more...)

1710.0506

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Syria > Damascus Governorate > Damascus (0.05)
Asia > Middle East > Syria > Aleppo Governorate > Aleppo (0.05)

Genre: Research Report (0.63)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.92)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

arXiv.org Artificial Intelligence May-22-2018

Learning to Teach in Cooperative Multiagent Reinforcement Learning

Omidshafiei, Shayegan, Kim, Dong-Ki, Liu, Miao, Tesauro, Gerald, Riemer, Matthew, Amato, Christopher, Campbell, Murray, How, Jonathan P.

We present a framework and algorithm for peer-to-peer teaching in cooperative multiagent reinforcement learning. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), trains advising policies by using students' learning progress as a teaching reward. Agents using LeCTR learn to assume the role of a teacher or student at the appropriate moments, exchanging action advice to accelerate the entire learning process. Our algorithm supports teaching heterogeneous teammates and advising under communication constraints, and it learns both what and when to advise. LeCTR is demonstrated to outperform prior teaching methods in both final performance and rate of learning on multiple benchmark domains. To our knowledge, this is the first approach for learning to teach in a multiagent setting.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1805.0783

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

arXiv.org Artificial Intelligence May-21-2018

Scalable Centralized Deep Multi-Agent Reinforcement Learning via Policy Gradients

Khan, Arbaaz, Zhang, Clark, Lee, Daniel D., Kumar, Vijay, Ribeiro, Alejandro

In this paper, we explore using deep reinforcement learning for problems with multiple agents. Most existing methods for deep multi-agent reinforcement learning consider only a small number of agents. When the number of agents increases, the dimensionality of the input and control spaces increases as well, and these methods do not scale well. To address this, we propose casting the multi-agent reinforcement learning problem as a distributed optimization problem. Our algorithm assumes that for multi-agent settings, policies of individual agents in a given population live close to each other in parameter space and can be approximated by a single policy. With this simple assumption, we show our algorithm to be extremely effective for reinforcement learning in multi-agent settings. We demonstrate its effectiveness against existing comparable approaches on co-operative and competitive tasks.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1805.08776

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.56)