AITopics | Agents


Agents


The NetHack Learning Environment

Küttler, Heinrich, Nardelli, Nantas, Miller, Alexander H., Raileanu, Roberta, Selvatici, Marco, Grefenstette, Edward, Rocktäschel, Tim

arXiv.org Artificial Intelligence, Jun-24-2020

Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. We compare NLE and its task suite to existing alternatives, and discuss why it is an ideal medium for testing the robustness and systematic generalization of RL agents. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. NLE is open source at https://github.com/facebookresearch/nle.
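The abstract names Random Network Distillation (RND) as the exploration bonus. The sketch below illustrates the general RND idea only; it is not the NLE baseline's implementation. A fixed, randomly initialised target network is paired with a predictor that is trained to imitate it. The intrinsic reward is the predictor's squared error, so it shrinks for observations the agent has seen often. Several details are assumptions made for brevity: both networks are single linear layers, the class name `RandomNetworkDistillation` is hypothetical, and the flat observation vector stands in for NLE's actual observations.

```python
import numpy as np


class RandomNetworkDistillation:
    """Minimal RND sketch (hypothetical): linear-tanh target, linear predictor."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network; it is never trained.
        self.W_target = rng.normal(size=(obs_dim, feat_dim)) / np.sqrt(obs_dim)
        # Predictor network, trained to imitate the target's outputs.
        self.W_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _target(self, obs):
        return np.tanh(obs @ self.W_target)

    def intrinsic_reward(self, obs):
        # Novelty bonus: mean squared prediction error on this observation.
        err = obs @ self.W_pred - self._target(obs)
        return float(np.mean(err ** 2))

    def update(self, obs):
        # One gradient-descent step on the mean squared prediction error.
        err = obs @ self.W_pred - self._target(obs)
        grad = np.outer(obs, err) * (2.0 / err.size)
        self.W_pred -= self.lr * grad
```

In a full agent, the bonus would be added to the game's extrinsic reward, and `update` would be called on the observations the agent collects.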

machine learning, natural language, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2006.1376

Country:

Europe > Sweden > Skåne County > Malmö (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Preferences Single-Peaked on a Circle

Peters, Dominik | Lackner, Martin (TU Wien)

Journal of Artificial Intelligence Research, Jun-24-2020

We introduce the domain of preferences that are single-peaked on a circle, which is a generalization of the well-studied single-peaked domain. This preference restriction is useful, e.g., for scheduling decisions, certain facility location problems, and for one-dimensional decisions in the presence of extremist preferences. We give a fast recognition algorithm for this domain, provide a characterisation by finitely many forbidden subprofiles, and show that many popular single- and multi-winner voting rules are polynomial-time computable on this domain. In particular, we prove that Proportional Approval Voting can be computed in polynomial time for profiles that are single-peaked on a circle. In contrast, Kemeny's rule remains hard to evaluate, and several impossibility results from social choice theory can be proved using only profiles in this domain.
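The sketch below illustrates the domain with a check for a *given* circular order of the alternatives. It assumes the interval characterisation: a ranking is single-peaked on a circle if, for every k, its top-k alternatives form a contiguous arc. This is not the paper's recognition algorithm, which has to find a suitable circle. The function names are hypothetical.

```python
def is_arc(subset, position, m):
    """True if the alternatives in `subset` occupy contiguous slots on a circle of size m."""
    k = len(subset)
    if k == 0 or k == m:
        return True
    occupied = {position[a] for a in subset}
    # A single arc has exactly one "clockwise end": an occupied slot
    # whose clockwise neighbour is unoccupied.
    ends = sum(1 for p in occupied if (p + 1) % m not in occupied)
    return ends == 1


def single_peaked_on_circle(ranking, circle):
    """Check one ranking (best first) against a fixed circular order of alternatives."""
    assert sorted(ranking) == sorted(circle), "ranking must order every alternative"
    position = {a: i for i, a in enumerate(circle)}
    top = set()
    for alternative in ranking:
        top.add(alternative)
        if not is_arc(top, position, len(circle)):
            return False
    return True


def profile_single_peaked_on_circle(profile, circle):
    """A profile is in the domain for this circle if every voter's ranking is."""
    return all(single_peaked_on_circle(r, circle) for r in profile)
```

For example, on the circle a, b, c, d the ranking a > d > b > c passes this check, because a and d are neighbours on the circle. The same ranking is not single-peaked on the line a < b < c < d, since {a, d} is not an interval of that line.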

algorithm, artificial intelligence, linear order, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11732

AI Access Foundation

11732


Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.45)

Industry: Government > Voting & Elections (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)


Towards Understanding Linear Value Decomposition in Cooperative Multi-Agent Q-Learning

Wang, Jianhao, Ren, Zhizhou, Han, Beining, Zhang, Chongjie

arXiv.org Artificial Intelligence, Jun-23-2020

Value decomposition is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings. However, the theoretical understanding of such methods is limited. In this paper, we introduce a variant of the fitted Q-iteration framework for analyzing multi-agent Q-learning with value decomposition. Based on this framework, we derive a closed-form solution to the Bellman error minimization with linear value decomposition. This solution reveals two insights: 1) linear value decomposition implicitly implements a classical multi-agent credit assignment scheme called counterfactual difference rewards; and 2) multi-agent Q-learning with linear value decomposition requires an on-policy data distribution to achieve numerical stability. In our empirical study, experiments on a broad set of complex tasks confirm these theoretical implications. They show that most state-of-the-art deep multi-agent Q-learning algorithms using linear value decomposition cannot efficiently utilize off-policy samples, and that this can even lead to unbounded divergence.
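As background for the abstract, "linear value decomposition" means the joint action value is modelled as a sum of per-agent utilities, Q_tot(a) = Σ_i Q_i(a_i), in the style of VDN. The sketch below shows this for a single state and checks a consequence: maximising the sum decomposes into independent per-agent argmaxes. It does not reproduce the paper's fitted Q-iteration analysis. The function names are hypothetical.

```python
import itertools

import numpy as np


def q_total(per_agent_q, joint_action):
    """Linear value decomposition: Q_tot(a) = sum_i Q_i(a_i)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    # Because Q_tot is additive, each agent can pick its own argmax independently.
    return tuple(int(np.argmax(q)) for q in per_agent_q)


def brute_force_greedy(per_agent_q):
    # Reference: enumerate every joint action (exponential in the number of agents).
    actions = [range(len(q)) for q in per_agent_q]
    return max(itertools.product(*actions), key=lambda a: q_total(per_agent_q, a))
```

This decentralised greedy selection is what makes the additive form attractive for scaling. The abstract's findings concern what this restricted function class implies when it is trained with off-policy data.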

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2006.00587

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Envy-freeness up to one item: Shall we add or remove resources?

Aleksandrov, Martin

arXiv.org Artificial Intelligence, Jun-23-2020

We consider a fair division model in which agents have general valuations for bundles of indivisible items. We propose two new axiomatic properties for allocations in this model: EF1+- and EFX+-. We compare these with the existing EF1 and EFX. Whereas EF1 and EF1+- allocations often exist, our results show that EFX+- and PO allocations exist in every case in which EFX and PO allocations do not. Additionally, we prove several new impossibility and incompatibility results.
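For reference, the sketch below checks the two standard properties the abstract compares against, EF1 and EFX. Agent i is EF1 towards j if removing *some* single item from j's bundle eliminates i's envy. Agent i is EFX towards j if removing *any* single item does so. Valuations are arbitrary set functions, matching the "general valuations" setting, and bundles are frozensets. The paper's new EF1+- and EFX+- properties are not implemented here. The `additive` helper and the function names are hypothetical. This version of EFX quantifies over every item in the other bundle; some definitions restrict it to items with positive marginal value.

```python
from typing import Callable, FrozenSet, Hashable, Mapping, Sequence

Bundle = FrozenSet[Hashable]
Valuation = Callable[[Bundle], float]


def additive(values: Mapping[Hashable, float]) -> Valuation:
    """Hypothetical helper: additive valuation v(S) = sum of item values."""
    return lambda bundle: sum(values[g] for g in bundle)


def _envy_free_up_to_one(valuations: Sequence[Valuation],
                         allocation: Sequence[Bundle], quantifier) -> bool:
    n = len(allocation)
    for i in range(n):
        own = valuations[i](allocation[i])
        for j in range(n):
            other = allocation[j]
            if i == j or own >= valuations[i](other):
                continue  # no envy from i towards j
            # Envy must vanish after removing one item g from j's bundle:
            # for some g (EF1, quantifier=any) or for every g (EFX, quantifier=all).
            if not quantifier(own >= valuations[i](other - {g}) for g in other):
                return False
    return True


def is_ef1(valuations, allocation) -> bool:
    return _envy_free_up_to_one(valuations, allocation, any)


def is_efx(valuations, allocation) -> bool:
    return _envy_free_up_to_one(valuations, allocation, all)
```

With both agents valuing items x, y, z at 3, 1, 1, the allocation ({x, y}, {z}) is EF1 but not EFX. Removing x from the first bundle removes the second agent's envy, but removing only y does not.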

allocation, artificial intelligence, valuation, (16 more...)

arXiv.org Artificial Intelligence

2006.11312

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)


Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Tian, Yi, Qian, Jian, Sra, Suvrit

arXiv.org Machine Learning, Jun-23-2020

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first achieves minimax optimal regret guarantees for a rich class of factored structures, while the second enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hidden in the intricacy of the structures.
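To make "conditionally independent transition components" concrete, the sketch below assumes the standard FMDP form. The state is a tuple of factors, and each next-state factor i depends only on the current values of a known scope Z_i of factors and on the action, so P(s' | s, a) = Π_i P_i(s'_i | s[Z_i], a). The random tables and function names are hypothetical, and the paper's algorithms and exploration bonus are not shown.

```python
import itertools

import numpy as np


def random_factored_model(scopes, n_values, n_actions, seed=0):
    """Hypothetical per-factor tables P_i(s'_i | s[Z_i], a) with random Dirichlet rows."""
    rng = np.random.default_rng(seed)
    tables = []
    for scope in scopes:
        table = {}
        # One distribution over factor i's next value per (parent values, action).
        for parents in itertools.product(range(n_values), repeat=len(scope)):
            for a in range(n_actions):
                table[(parents, a)] = rng.dirichlet(np.ones(n_values))
        tables.append(table)
    return tables


def transition_prob(s, a, s_next, scopes, tables):
    """P(s' | s, a) = prod_i P_i(s'_i | s[Z_i], a)."""
    prob = 1.0
    for i, (scope, table) in enumerate(zip(scopes, tables)):
        parents = tuple(s[j] for j in scope)  # current values of factor i's scope
        prob *= table[(parents, a)][s_next[i]]
    return float(prob)
```

The number of table entries grows with the sizes of the scopes rather than with the full joint state space. This is the structure that lets regret bounds for FMDPs depend on the factorization.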

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2006.13405

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)


Fanoos: Multi-Resolution, Multi-Strength, Interactive Explanations for Learned Systems

Bayani, David, Mitsch, Stefan

arXiv.org Artificial Intelligence, Jun-22-2020

Machine learning is becoming increasingly important for tuning or even synthesizing the behavior of safety-critical components in highly non-trivial environments, where the inability to understand learned components in general, and neural nets in particular, poses serious obstacles to their adoption. Explainability and interpretability methods for learned systems have gained considerable academic attention. However, current approaches focus on only one aspect of explanation, at a fixed level of abstraction, with limited formal guarantees, if any. This prevents their explanations from being digestible by the relevant stakeholders (e.g., end users, certification authorities, engineers), who have diverse backgrounds and situation-specific needs. We introduce Fanoos, a flexible framework for combining formal verification techniques, heuristic search, and user interaction to explore explanations at the desired level of granularity and fidelity. We demonstrate the ability of Fanoos to produce and adjust the abstractness of explanations in response to user requests on a learned controller for an inverted double pendulum and on a learned CPU usage model.

machine learning, natural language, predicate, (19 more...)

arXiv.org Artificial Intelligence

2006.12453

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law (0.67)
Information Technology > Security & Privacy (0.67)
Government > Military (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
(2 more...)


Evolutionary Processes in Quantum Decision Theory

Yukalov, V. I.

arXiv.org Artificial Intelligence, Jun-22-2020

In recent years, there has been considerable interest in the possibility of formulating decision theory in the language of quantum mechanics. Numerous references on this topic can be found in the books [1-4] and review articles [5-8]. This interest stems from the inability of classical decision theory [9] to account for the behaviour of real decision makers, which calls for the development of other approaches. Resorting to the techniques of quantum theory offers hope of a better representation of behavioral decision making. There are several ways of using quantum mechanics to interpret conscious effects.

artificial intelligence, decision support system, probability, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/e22060681

2006.16763

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts (0.04)
(8 more...)

Genre:

Overview (0.86)
Research Report (0.82)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)


Emergent cooperation through mutual information maximization

Cuervo, Santiago, Alzate, Marco

arXiv.org Artificial Intelligence, Jun-21-2020

With artificial intelligence systems becoming ubiquitous in our society, their designers will soon have to start considering their social dimension, as many of these systems will have to interact with one another to work efficiently. With this in mind, we propose a decentralized deep reinforcement learning algorithm for the design of cooperative multi-agent systems. The algorithm is based on the hypothesis that highly correlated actions are a feature of cooperative systems. We therefore propose adding an auxiliary objective to the learning problem that maximizes the mutual information between the agents' actions. We apply our system to a social dilemma, a problem whose optimal solution requires agents to cooperate to maximize a macroscopic performance function despite the divergent individual objectives of each agent. By comparing the performance of the proposed system to a system without the auxiliary objective, we conclude that maximizing mutual information among agents promotes the emergence of cooperation in social dilemmas.
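The auxiliary objective is the mutual information between agents' actions. The sketch below computes that quantity for two agents with discrete actions, either from a joint probability table or from sampled action pairs. It does not show how the paper estimates or differentiates the objective inside deep RL training. The function names are hypothetical.

```python
import numpy as np


def mutual_information(joint):
    """I(A1; A2) in nats from a (possibly unnormalised) joint table p(a1, a2)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    p1 = joint.sum(axis=1, keepdims=True)   # marginal of agent 1, shape (n1, 1)
    p2 = joint.sum(axis=0, keepdims=True)   # marginal of agent 2, shape (1, n2)
    independent = p1 @ p2                   # p(a1) * p(a2) for every pair
    mask = joint > 0                        # 0 * log 0 is taken as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / independent[mask])))


def empirical_joint(actions_1, actions_2, n1, n2):
    """Joint action frequencies from paired samples of the two agents' actions."""
    joint = np.zeros((n1, n2))
    np.add.at(joint, (np.asarray(actions_1), np.asarray(actions_2)), 1.0)
    return joint / joint.sum()
```

Independent agents give zero mutual information. Agents that always pick matching actions uniformly from two options give log 2 nats, which is the highly correlated behaviour the hypothesis associates with cooperation.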

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2006.11769

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning

Wang, Lingxiao, Yang, Zhuoran, Wang, Zhaoran

arXiv.org Machine Learning, Jun-21-2020

Multi-agent reinforcement learning (MARL) has achieved significant empirical success. However, MARL suffers from the curse of many agents. In this paper, we exploit the symmetry of agents in MARL. In the most generic form, we study a mean-field MARL problem, which is defined on mean-field states: distributions supported on a continuous space. Based on the mean embedding of these distributions, we propose the MF-FQI algorithm, which solves the mean-field MARL problem, and we establish a non-asymptotic analysis of MF-FQI. We highlight that MF-FQI enjoys a "blessing of many agents" property, in the sense that a larger number of observed agents improves its performance.
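As an illustration of a mean embedding, the sketch below assumes the usual kernel construction with a Gaussian kernel. The empirical distribution of N agents' states is represented by μ(x) = (1/N) Σ_i k(s_i, x). This representation does not depend on the order of the agents, and two populations can be compared through the squared RKHS distance between their embeddings. The kernel choice, bandwidth, and function names are assumptions, and MF-FQI itself is not shown.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mean_embedding(states, query, bandwidth=1.0):
    """Empirical mean embedding mu(q) = (1/N) sum_i k(s_i, q), evaluated at query points."""
    return gaussian_kernel(query, states, bandwidth).mean(axis=1)


def embedding_distance_sq(states_a, states_b, bandwidth=1.0):
    """Squared RKHS distance ||mu_a - mu_b||^2 between two empirical embeddings (biased MMD^2)."""
    k_aa = gaussian_kernel(states_a, states_a, bandwidth).mean()
    k_bb = gaussian_kernel(states_b, states_b, bandwidth).mean()
    k_ab = gaussian_kernel(states_a, states_b, bandwidth).mean()
    return float(k_aa + k_bb - 2.0 * k_ab)
```

The averaging step is the source of the permutation invariance. It also means the embedding is estimated more accurately as more agents are observed, which is consistent with the intuition behind the "blessing of many agents".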

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2006.11917

Country:

North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)


A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning

Liu, Sijia, Chen, Pin-Yu, Kailkhura, Bhavya, Zhang, Gaoyuan, Hero, Alfred, Varshney, Pramod K.

arXiv.org Machine Learning, Jun-21-2020

Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many signal processing and machine learning applications. Like gradient-based methods, it is used to solve optimization problems, but it does not require the gradient and uses only function evaluations. Specifically, ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update. In this paper, we provide a comprehensive review of ZO optimization, with an emphasis on showing the underlying intuition, optimization principles, and recent advances in convergence analysis. Moreover, we demonstrate promising applications of ZO optimization, such as evaluating robustness and generating explanations from black-box deep learning models, and efficient online sensor management.
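The three steps can be sketched with one common choice of estimator: a forward-difference gradient estimate along random Gaussian directions. Plain descent along the negative estimate serves as the direction step. This is only one of the estimators and update rules covered by such reviews. The smoothing radius `mu`, the number of directions, the step size, and the function names are illustrative assumptions.

```python
import numpy as np


def zo_gradient_estimate(f, x, mu=1e-3, num_dirs=20, rng=None):
    """Random-direction gradient estimate that uses only function values of f."""
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.shape)            # random Gaussian direction
        grad += (f(x + mu * u) - fx) / mu * u   # directional finite difference
    return grad / num_dirs


def zo_gradient_descent(f, x0, lr=0.05, steps=200, seed=0, **estimator_kwargs):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = zo_gradient_estimate(f, x, rng=rng, **estimator_kwargs)  # 1) gradient estimation
        direction = -g                                               # 2) descent direction
        x = x + lr * direction                                       # 3) solution update
    return x
```

Each step costs `num_dirs + 1` function queries and no gradient calls, which is why ZO methods suit black-box settings such as attacking or explaining a deployed model. The trade-off is that the estimate becomes noisier as the dimension of x grows.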

artificial intelligence, machine learning, optimization, (19 more...)

arXiv.org Machine Learning

2006.06224

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Overview (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
(2 more...)
