AITopics

1309.7407

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.68)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)

Mazumdar, Eric, Ratliff, Lillian J., Jordan, Michael I., Sastry, S. Shankar

Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings

arXiv.org Machine Learning Jul-8-2019

We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in $N$-player general-sum linear quadratic games. In such games the state and action spaces are continuous and the unique global Nash equilibrium can be found by solving coupled Riccati equations. Further, gradient-play in LQ games is equivalent to multi-agent policy gradient. We first prove that the only critical point of the gradient dynamics in these games is the unique global Nash equilibrium. We then give sufficient conditions under which policy gradient will avoid the Nash equilibrium, and generate a large number of general-sum linear quadratic games that satisfy these conditions. The existence of such games indicates that one of the most popular approaches to solving reinforcement learning problems in the classic reinforcement learning setting has no guarantee of convergence in multi-agent settings. Further, the ease with which we can generate these counterexamples suggests that such situations are not mere edge cases and are in fact quite common.
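As a rough illustration of the kind of non-convergence the abstract describes, the sketch below runs simultaneous gradient play on a two-player static quadratic game. This toy game is an assumption made for illustration and is far simpler than the dynamic LQ games analyzed in the paper; the cost coefficients, step size, and function name are all hypothetical. The origin is the game's unique Nash equilibrium, yet the iterates move away from it.

```python
import math

# Hypothetical toy game (not taken from the paper). Each player minimizes its own cost:
#   f1(x, y) = 0.5 * a * x**2 + b * x * y   (player 1 controls x)
#   f2(x, y) = 0.5 * d * y**2 + c * x * y   (player 2 controls y)
# With a, d > 0 each cost is strictly convex in the player's own variable,
# and when a * d != b * c the origin is the unique Nash equilibrium.

def gradient_play(x, y, a=1.0, b=3.0, c=1.0, d=1.0, lr=0.05, steps=200):
    """Simultaneous gradient descent by both players on their own costs."""
    for _ in range(steps):
        grad_x = a * x + b * y  # derivative of f1 with respect to x
        grad_y = c * x + d * y  # derivative of f2 with respect to y
        x, y = x - lr * grad_x, y - lr * grad_y
    return x, y

start = (0.01, -0.01)
end = gradient_play(*start)
print("start distance from Nash:", math.hypot(*start))
print("end distance from Nash:  ", math.hypot(*end))
```

With the default coefficients the game Jacobian [[a, b], [c, d]] has a negative determinant, so one of its eigenvalues is negative and gradient play is repelled from the equilibrium from almost every starting point. Choosing a weaker coupling, for example b = c = 0.5, makes both eigenvalues positive and the same iteration converges.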

lq game, nash equilibrium, policy gradient, (14 more...)

1907.03712

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Washington > King County > Seattle (0.14)
Asia > Middle East > Jordan (0.05)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Canaan, Rodrigo, Togelius, Julian, Nealen, Andy, Menzel, Stefan

Diverse Agents for Ad-Hoc Cooperation in Hanabi

arXiv.org Artificial Intelligence Jul-8-2019

In complex scenarios where a model of other actors is necessary to predict and interpret their actions, it is often desirable that the model works well with a wide variety of previously unknown actors. Hanabi is a card game that brings the problem of modeling other players to the forefront, but there is no agreement on how to best generate a pool of agents to use as partners in ad-hoc cooperation evaluation. This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate populations for this purpose and shows an initial implementation of an agent generator based on this idea. We also discuss what metrics can be used to compare such generators, and how the proposed generator could be leveraged to help build adaptive agents for the game.

agent, artificial intelligence, machine learning, (17 more...)

1907.0384

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Jul-7-2019

On Conforming and Conflicting Values

Chhogyal, Kinzang, Nayak, Abhaya, Ghose, Aditya, Orgun, Mehmet, Dam, Hoa

Values are things that are important to us. Actions activate values - they either go against our values or they promote our values. Values themselves can either be conforming or conflicting depending on the action that is taken. In this short paper, we argue that values may be classified as one of two types - conflicting and inherently conflicting values. They are distinguished by the fact that the latter in some sense can be thought of as being independent of actions. This allows us to do two things: i) check whether a set of values is consistent and ii) check whether it is in conflict with other sets of values.

artificial intelligence, conflict, value state, (13 more...)

1907.01682

Country: Oceania > Australia > New South Wales (0.14)

Genre: Research Report (0.40)

Industry: Government (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.96)

Bar-On, Yogev, Mansour, Yishay

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

arXiv.org Machine Learning Jul-7-2019

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for each agent $v$ an individual expected regret of \[ \widetilde{O}\left(\sqrt{\left(1+\frac{K}{\left|\mathcal{N}\left(v\right)\right|}\right)T}\right), \] where $T$ is the number of time steps, $K$ is the number of actions and $\mathcal{N}\left(v\right)$ is the set of neighbors of agent $v$ in the communication graph. We present algorithms both for the case that the communication graph is known to all the agents, and for the case that the graph is unknown. When the graph is unknown, each agent knows only the set of its neighbors and an upper bound on the total number of agents. The individual regret between the models differs only by a logarithmic factor. Our work resolves an open problem from [Cesa-Bianchi et al., 2019b].
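To give a feel for the bound, the following sketch evaluates its leading term while ignoring the logarithmic factors hidden by the $\widetilde{O}$ notation. The function name and the sample values of $K$, $T$, and the neighborhood size are illustrative assumptions, not figures from the paper.

```python
import math

def regret_bound_leading_term(num_actions, neighborhood_size, horizon):
    """Leading term sqrt((1 + K / |N(v)|) * T), with log factors dropped."""
    if neighborhood_size <= 0:
        raise ValueError("neighborhood_size must be positive")
    return math.sqrt((1 + num_actions / neighborhood_size) * horizon)

# Hypothetical values: K = 10 actions, T = 1000 time steps.
for size in (1, 5, 10):
    print(size, round(regret_bound_leading_term(10, size, 1000), 2))
```

The term shrinks as the neighborhood grows and approaches $\sqrt{T}$ once the neighborhood is large relative to $K$.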

artificial intelligence, big data, data mining, (20 more...)

1907.03346

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

arXiv.org Machine Learning Jul-5-2019

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

Lin, Yixuan, Zhang, Kaiqing, Yang, Zhuoran, Wang, Zhaoran, Başar, Tamer, Sandhu, Romeil, Liu, Ji

Recently, there has been increasing interest in developing distributed machine learning algorithms. Notable examples include distributed linear regression [1], multi-arm bandit [2], reinforcement learning (RL) [3], and deep learning [4]. Such algorithms have promising applications in large-scale networks, such as social platforms, online economic networks, cyber-physical systems, and the Internet of Things, primarily because in such a complex network it is impossible to collect all the information at the same point, and each component of the network may not be willing to share its private information due to privacy issues. Multi-agent reinforcement learning (MARL) problems have recently received increasing attention. In general, MARL problems are investigated in settings that are either collaborative, competitive, or a mixture of the two. For collaborative MARL, the most rudimentary framework is the canonical multi-agent Markov decision process [5, 6], where the agents share a common reward function that is determined by the joint actions of all agents. Another notable framework for collaborative MARL is the team Markov game model, also with a shared reward function among agents [7, 8]. These two frameworks were then extended to the setting where agents are allowed to have heterogeneous reward functions [3, 9-12], collaborating with the goal of maximizing the long-term return corresponding to the team-averaged reward.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1907.03053

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Zhu, Jichen, Ontañón, Santiago

Experience Management in Multi-player Games

arXiv.org Artificial Intelligence Jul-4-2019

Experience Management (EM) studies AI systems that automatically adapt interactive experiences, such as games, to suit specific players and to fulfill design goals. Although it has been explored for several decades, existing work in experience management has mostly focused on single-player experiences. This paper is a first attempt at identifying the main challenges in extending EM to multi-player/multi-user games or experiences. We also make connections to related areas where solutions for similar problems have been proposed (especially group recommender systems) and discuss the potential impact and applications of multi-player EM.

aggregation, modeling, player modeling, (13 more...)

1907.02349

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Games (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.89)

Elie, Romuald, Pérolat, Julien, Laurière, Mathieu, Geist, Matthieu, Pietquin, Olivier

Approximate Fictitious Play for Mean Field Games

arXiv.org Machine Learning Jul-4-2019

The theory of Mean Field Games (MFG) allows characterizing the Nash equilibria of an infinite number of identical players, and provides a convenient and relevant mathematical framework for the study of games with a large number of agents in interaction. Until very recently, the literature only considered Nash equilibria between fully informed players. In this paper, we focus on the realistic setting where agents with no prior information on the game learn their best response policy through repeated experience. We study the convergence to a (possibly approximate) Nash equilibrium of a fictitious play iterative learning scheme where the best response is approximately computed, typically by a reinforcement learning (RL) algorithm. Notably, we show for the first time convergence of model-free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space setting, where the best response of the iterative fictitious play scheme is computed with a deep RL algorithm.
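For readers unfamiliar with fictitious play, the sketch below shows the classical two-player version on matching pennies, where each player best-responds exactly to the opponent's empirical action frequencies. This is a deliberately simplified stand-in, not the mean field scheme with RL-approximated best responses studied in the paper; the function name, tie-breaking rule, and round count are assumptions.

```python
def fictitious_play_matching_pennies(rounds=20000):
    """Return each player's empirical frequency of action 0 (heads)."""
    counts = [[0, 0], [0, 0]]  # counts[p][a]: times player p played action a
    for _ in range(rounds):
        # Player 0 wants to match: copy the opponent's more frequent action.
        a0 = 0 if counts[1][0] >= counts[1][1] else 1
        # Player 1 wants to mismatch: avoid the opponent's more frequent action.
        a1 = 1 if counts[0][0] >= counts[0][1] else 0
        counts[0][a0] += 1
        counts[1][a1] += 1
    return counts[0][0] / rounds, counts[1][0] / rounds

print(fictitious_play_matching_pennies())
```

The actions themselves keep cycling, but the empirical frequencies drift toward the game's unique mixed Nash equilibrium, in which each player chooses heads half of the time.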

equilibrium, machine learning, reinforcement learning, (19 more...)

1907.02633

Genre: Research Report (0.40)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hu, Hsu-Chieh, Smith, Stephen F.

Using Bi-Directional Information Exchange to Improve Decentralized Schedule-Driven Traffic Control

arXiv.org Artificial Intelligence Jul-3-2019

Recent work in decentralized, schedule-driven traffic control has demonstrated the ability to improve the efficiency of traffic flow in complex urban road networks. In this approach, a scheduling agent is associated with each intersection. Each agent senses the traffic approaching its intersection and constructs, in real time, a schedule that minimizes the cumulative wait time of vehicles approaching the intersection over the current look-ahead horizon. In order to achieve network-level coordination in a scalable manner, scheduling agents communicate only with their direct neighbors. Each time an agent generates a new intersection schedule, it communicates its expected outflows to its downstream neighbors as a prediction of future demand, and these outflows are appended to the downstream agent's locally perceived demand. In this paper, we extend this basic coordination algorithm to additionally incorporate the complementary flow of information reflective of an intersection's current congestion level to its upstream neighbors. We present an asynchronous decentralized algorithm for updating intersection schedules and congestion level estimates based on these bi-directional information flows. By relating this algorithm to the self-optimized decision making of the basic operation, we are able to approach network-wide optimality and reduce inefficiency due to strictly self-interested intersection control decisions.

artificial intelligence, information, intersection, (18 more...)

1907.01978

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Parisi, German I., Kanan, Christopher

Rethinking Continual Learning for Autonomous Agents and Robots

arXiv.org Artificial Intelligence Jul-2-2019

Continual learning refers to the ability of a biological or artificial system to seamlessly learn from continuous streams of information while preventing catastrophic forgetting, i.e., a condition in which new incoming information strongly interferes with previously learned representations. Since it is unrealistic to provide artificial agents with all the necessary prior knowledge to effectively operate in real-world conditions, they must exhibit a rich set of learning capabilities enabling them to interact in complex environments with the aim to process and make sense of continuous streams of (often uncertain) information. While the vast majority of continual learning models are designed to alleviate catastrophic forgetting on simplified classification tasks, here we focus on continual learning for autonomous agents and robots required to operate in much more challenging experimental settings. In particular, we discuss well-established biological learning factors such as developmental and curriculum learning, transfer learning, and intrinsic motivation and their computational counterparts for modeling the progressive acquisition of increasingly complex knowledge and skills in a continual fashion.

artificial intelligence, learning, machine learning, (12 more...)

1907.01929

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Florida > Miami-Dade County > Miami Beach (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.61)