Agents
Policy Shaping with Human Teachers
Cederborg, Thomas (Georgia Institute of Technology) | Grover, Ishaan (Georgia Institute of Technology) | Isbell, Charles L (Georgia Institute of Technology) | Thomaz, Andrea L (Georgia Institute of Technology)
In this work we evaluate the performance of a policy shaping algorithm using 26 human teachers. We examine if the algorithm is suitable for human-generated data on two different boards in a pac-man domain, comparing performance to an oracle that provides critique based on one known winning policy. Perhaps surprisingly, we show that the data generated by our 26 participants yields even better performance for the agent than data generated by the oracle. This might be because humans do not discourage exploring multiple winning policies. Additionally, we evaluate the impact of different verbal instructions, and different interpretations of silence, finding that the usefulness of data is affected both by what instructions is given to teachers, and how the data is interpreted.
Autonomous Cross-Domain Knowledge Transfer in Lifelong Policy Gradient Reinforcement Learning
Ammar, Haitham Bou (University of Pennsylvania) | Eaton, Eric (University of Pennsylvania) | Luna, Jose Marcio (University of Pennsylvania) | Ruvolo, Paul (Olin College of Engineering)
Online multi-task learning is an important capability for lifelong learning agents, enabling them to acquire models for diverse tasks over time and rapidly learn new tasks by building upon prior experience. However, recent progress toward lifelong reinforcement learning (RL) has been limited to learning from within a single task domain. For truly versatile lifelong learning, the agent must be able to autonomously transfer knowledge between different task domains. A few methods for cross-domain transfer have been developed, but these methods are computationally inefficient for scenarios where the agent must learn tasks consecutively. In this paper, we develop the first cross-domain lifelong RL framework. Our approach efficiently optimizes a shared repository of transferable knowledge and learns projection matrices that specialize that knowledge to different task domains. We provide rigorous theoretical guarantees on the stability of this approach, and empirically evaluate its performance on diverse dynamical systems. Our results show that the proposed method can learn effectively from interleaved task domains and rapidly acquire high performance in new domains.
A Complete Epistemic Planner without the Epistemic Closed World Assumption
Wan, Hai (Sun Yat-sen University) | Yang, Rui (Sun Yat-sen University) | Fang, Liangda (Sun Yat-sen University) | Liu, Yongmei (Sun Yat-sen University) | Xu, Huada (Sun Yat-sen University)
Planning with epistemic goals has received attention from both the dynamic logic and planning communities. In the single-agent case, under the epistemic closed-world assumption (ECWA), epistemic planning can be reduced to contingent planning. However, it is inappropriate to make the ECWA in some epistemic planning scenarios, for example, when the agent is not fully introspective, or when the agent wants to devise a generic plan that applies to a wide range of situations. In this paper, we propose a complete single-agent epistemic planner without the ECWA. We identify two normal forms of epistemic formulas: weak minimal epistemic DNF and weak minimal epistemic CNF, and present the progression and entailment algorithms based on these normal forms. We adapt the PrAO algorithm for contingent planning from the literature as the main planning algorithm and develop a complete epistemic planner called EPK. Our experimental results show that EPK can generate solutions effectively for most of the epistemic planning problems we have considered including those without the ECWA.
Trust-Sensitive Belief Revision
Hunter, Aaron (British Columbia Institute of Technology) | Booth, Richard (Mahasarakham University)
Belief revision is concerned with incorporating new information into a pre-existing set of beliefs. When the new information comes from another agent, we must first determine if that agent should be trusted. In this paper, we define trust as a pre-processing step before revision. We emphasize that trust in an agent is often restricted to a particular domain of expertise. We demonstrate that this form of trust can be captured by associating a state partition with each agent, then relativizing all reports to this partition before revising. We position the resulting family of trust-sensitive revision operators within the class of selective revision operators of Ferme and Hansson, and we examine its properties. In particular, we show how trust-sensitive revision is manipulable, in the sense that agents can sometimes have incentive to pass on misleading information. When multiple reporting agents are involved, we use a distance function over states to represent differing degrees of trust; this ensures that the most trusted reports will be believed.
Modelling the Persuadee in Asymmetric Argumentation Dialogues for Persuasion
Hunter, Anthony (University College London)
Computational models of argument could play a valuable role in persuasion technologies for behaviour change (e.g. persuading a user to eat a more healthy diet, or to drink less, or to take more exercise, or to study more conscientiously, etc). For this, the system (the persuader) could present arguments to convince the user (the persuadee). In this paper, we consider asymmetric dialogues where only the system presents arguments, and the system maintains a model of the user to determine the best choice of arguments to present (including counterarguments to key arguments believed to be held by the user). The focus of the paper is on the user model, including how we update it as the dialogue progresses, and how we use it to make optimal choices for dialogue moves.
Group Decision Making via Weighted Propositional Logic: Complexity and Islands of Tractability
Greco, Gianluigi (University of Calabria) | Lang, Jerome (Université Paris-Dauphine)
We study a general class of multiagent optimization problems, together with a compact representation language of utilities based on weighted propositional formulas. We seek solutions maximizing utilitarian social welfare as well as fair solutions maximizing the utility of the least happy agent. We show that many problems can be expressed in this setting, such as fair division of indivisible goods, some multiwinner elections, or multifacility location. We focus on the complexity of finding optimal solutions, and we identify the tractability boarder between polynomial and NP-hard settings, along several parameters: the syntax of formulas, the allowed weights, as well as the number of agents, propositional symbols, and formulas per agent.
Computing Social Behaviours Using Agent Models
Felli, Paolo (University of Melbourne) | Miller, Tim (University of Melbourne) | Muise, Christian (University of Melbourne) | Pearce, Adrian R. (University of Melbourne) | Sonenberg, Liz (University of Melbourne)
Agents can be thought of as following a social behaviour, depending on the context in which they are interacting. We devise a computationally grounded mechanism to represent and reason about others in social terms, reflecting the local perspective of an agent (first-person view), to support both stereotypical and empathetic reasoning. We use a hierarchy of agent models to discriminate which behaviours of others are plausible, and decide which behaviour for ourselves is socially acceptable, i.e. conforms to the social context. To this aim, we investigate the implications of considering agents capable of various degrees of theory of mind, and discuss a scenario showing how this affects behaviour.
On the Aggregation of Argumentation Frameworks
Delobelle, Jérôme (CRIL, CNRS – Université d'Artois, France) | Konieczny, Sébastien (CRIL, CNRS – Université d'Artois, France) | Vesic, Srdjan (CRIL, CNRS – Université d'Artois, France)
We study the problem of aggregation of Dung's abstract argumentation frameworks. Some operators for this aggregation have been proposed, as well as some rationality properties for this process. In this work we study the existing operators and new ones that we propose in light of the proposed properties, highlighting the fact that existing operators do not satisfy a lot of these properties. The conclusions are that on one hand none of the existing operators seem fully satisfactory, but on the other hand some of the properties proposed so far seem also too demanding.
Multilateral Negotiation in Boolean Games with Incomplete Information Using Generalized Possibilistic Logic
Clercq, Sofie De (Ghent University) | Schockaert, Steven (Cardiff University) | Nowé, Ann (Vrije Universiteit Brussel) | Cock, Martine De (University of Washington - Tacoma and Ghent University)
Boolean games are a game-theoretic framework in which propositional logic is used to describe agents’ goals. In this paper we investigate how agents in Boolean games can reach an efficient and fair outcome through a simple negotiation protocol. We are particularly interested in settings where agents only have incomplete knowledge about the preferences of others. After explaining how generalized possibilistic logic can be used to compactly encode such knowledge, we analyze how a lack of knowledge affects the agreement outcome. In particular, we show how knowledgeable agents can obtain a more desirable outcome than others.
Complexity Results in Epistemic Planning
Bolander, Thomas (Technical University of Denmark) | Jensen, Martin Holm (Technical University of Denmark) | Schwarzentruber, Francois (ENS Rennes)
Epistemic planning is a very expressive framework that extends automated planning by the incorporation of dynamic epistemic logic (DEL). We provide complexity results on the plan existence problem for multi-agent planning tasks, focusing on purely epistemic actions with propositional preconditions. We show that moving from epistemic preconditions to propositional preconditions makes it decidable, more precisely in EXPSPACE. The plan existence problem is PSPACE-complete when the underlying graphs are trees and NP-complete when they are chains (including singletons). We also show PSPACE-hardness of the plan verification problem, which strengthens previous results on the complexity of DEL model checking.