Goto

Collaborating Authors

 Agents


Optimizing Interventions via Offline Policy Evaluation: Studies in Citizen Science

AAAI Conferences

Volunteers who help with online crowdsourcing such as citizen science tasks typically make only a few contributions before exiting. We propose a computational approach for increasing users' engagement in such settings that is based on optimizing policies for displaying motivational messages to users. The approach, which we refer to as Trajectory Corrected Intervention (TCI), reasons about the tradeoff between the long-term influence of engagement messages on participants' contributions and the potential risk of disrupting their current work. We combine model-based reinforcement learning with off-line policy evaluation to generate intervention policies, without relying on a fixed representation of the domain. TCI works iteratively to learn the best representation from a set of random intervention trials and to generate candidate intervention policies. It is able to refine selected policies off-line by exploiting the fact that users can only be interrupted once per session.We implemented TCI in the wild with Galaxy Zoo, one of the largest citizen science platforms on the web. We found that TCI was able to outperform the state-of-the-art intervention policy for this domain, and significantly increased the contributions of thousands of users. This work demonstrates the benefit of combining traditional AI planning with off-line policy methods to generate intelligent intervention strategies.


How AI Wins Friends and Influences People in Repeated Games With Cheap Talk

AAAI Conferences

Research has shown that a person's financial success is more dependent on the ability to deal with people than on professional knowledge. Sage advice, such as "if you can't say something nice, don't say anything at all" and principles articulated in Carnegie's classic "How to Win Friends and Influence People," offer trusted rules-of-thumb for how people can successfully deal with each other. However, alternative philosophies for dealing with people have also emerged. The success of an AI system is likewise contingent on its ability to win friends and influence people. In this paper, we study how AI systems should be designed to win friends and influence people in repeated games with cheap talk (RGCTs). We create several algorithms for playing RGCTs by combining existing behavioral strategies (what the AI does) with signaling strategies (what the AI says) derived from several competing philosophies. Via user study, we evaluate these algorithms in four RGCTs. Our results suggest sufficient properties for AIs to win friends and influence people in RGCTs.


Emergence of Grounded Compositional Language in Multi-Agent Populations

AAAI Conferences

By capturing statistical patterns in large corpora, machine learning has enabled significant advances in natural language processing, including in machine translation, question answering, and sentiment analysis. However, for agents to intelligently interact with humans, simply capturing the statistical patterns is insufficient. In this paper we investigate if, and how, grounded compositional language can emerge as a means to achieve goals in multi-agent populations. Towards this end, we propose a multi-agent learning environment and learning methods that bring about emergence of a basic compositional language. This language is represented as streams of abstract discrete symbols uttered by agents over time, but nonetheless has a coherent structure that possesses a defined vocabulary and syntax. We also observe emergence of non-verbal communication such as pointing and guiding when language communication is unavailable.


Interactively Learning a Blend of Goal-Based and Procedural Tasks

AAAI Conferences

Agents that can learn new tasks through interactive instruction can utilize goal information to search for and learn flexible policies. This approach can be resilient to variations in initial conditions or issues that arise during execution. However, if a task is not easily formulated as achieving a goal or if the agent lacks sufficient domain knowledge for planning, other methods are required. We present a hybrid approach to interactive task learning that can learn both goal-oriented and procedural tasks, and mixtures of the two, from human natural language instruction. We describe this approach, go through two examples of learning tasks, and outline the space of tasks that the system can learn. We show that our approach can learn a variety of goal-oriented and procedural tasks from a single example and is robust to different amounts of domain knowledge.


A Regression Approach for Modeling Games With Many Symmetric Players

AAAI Conferences

We exploit player symmetry to formulate the representation of large normal-form games as a regression task. This formulation allows arbitrary regression methods to be employed in in estimating utility functions from a small subset of the game's outcomes. We demonstrate the applicability both neural networks and Gaussian process regression, but focus on the latter. Once utility functions are learned, computing Nash equilibria requires estimating expected payoffs of pure-strategy deviations from mixed-strategy profiles. Computing these expectations exactly requires an infeasible sum over the full payoff matrix, so we propose and test several approximation methods. Three of these are simple and generic, applicable to any regression method and games with any number of player roles. However, the best performance is achieved by a continuous integral that approximates the summation, which we formulate for the specific case of fully-symmetric games learned by Gaussian process regression with a radial basis function kernel. We demonstrate experimentally that the combination of learned utility functions and expected payoff estimation allows us to efficiently identify approximate equilibria of large games using sparse payoff data.


Rich Coalitional Resource Games

AAAI Conferences

We propose a simple model of interaction for resource-conscious agents. The resources involved are expressed in fragments of Linear Logic. We investigate a few problems relevant to cooperative games, such as deciding whether a group of agents can form a coalition and act together in a way that satisfies all of them. In terms of solution concepts, we study the computational aspects of the core of a game. The main contributions are a formal link with the existing literature, and complexity results for several classes of models.


Modelling Iterative Judgment Aggregation

AAAI Conferences

We introduce a formal model of iterative judgment aggregation, enabling the analysis of scenarios in which agents repeatedly update their individual positions on a set of issues, before a final decision is made by applying an aggregation rule to these individual positions. Focusing on two popular aggregation rules, the premise-based rule and the plurality rule, we study under what circumstances convergence to an equilibrium can be guaranteed. We also analyse the quality, in social terms, of the final decisions obtained. Our results not only shed light on the parameters that determine whether iteration converges and is socially beneficial, but they also clarify important differences between iterative judgment aggregation and the related framework of iterative voting.


Non-Exploitable Protocols for Repeated Cake Cutting

AAAI Conferences

We introduce the notion of exploitability in cut-and-choose protocols for repeated cake cutting. If a cut-and-choose protocol is repeated, the cutter can possibly gain information about the chooser from her previous actions, and exploit this information for her own gain, at the expense of the chooser. We define a generalization of cut-and-choose protocols - forced-cut protocols - in which some cuts are made exogenously while others are made by the cutter, and show that there exist non-exploitable forced-cut protocols that use a small number of cuts per day: When the cake has at least as many dimensions as days, we show a protocol that uses a single cut per day. When the cake is 1-dimensional, we show an adaptive non-exploitable protocol that uses 3 cuts per day, and a non-adaptive protocol that uses n cuts per day (where n is the number of days). In contrast, we show that no non-adaptive non-exploitable forced-cut protocol can use a constant number of cuts per day. Finally, we show that if the cake is at least 2-dimensional, there is a non-adaptive non-exploitable protocol that uses 3 cuts per day.


Coalition Manipulation of Gale-Shapley Algorithm

AAAI Conferences

It is well-known that the Gale-Shapley algorithm is not truthful for all agents. Previous studies in this category concentrate on manipulations using incomplete preference lists by a single woman and by the set of all women. Little is known about manipulations by a subset of women. In this paper, we consider manipulations by any subset of women with arbitrary preferences. We show that a strong Nash equilibrium of the induced manipulation game always exists among the manipulators and the equilibrium outcome is unique and Pareto-dominant. In addition, the set of matchings achievable by manipulations has a lattice structure. We also examine the super-strong Nash equilibrium in the end.


Traffic Optimization for a Mixture of Self-Interested and Compliant Agents

AAAI Conferences

This paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing, and system-optimum routing. In the self-interested routing policy each agent selects a path that optimizes its own utility, while in the system-optimum routing, agents are assigned paths with the goal of maximizing system performance. This paper considers a scenario where a centralized network manager wishes to optimize utilities over all agents, i.e., implement a system-optimum routing policy. In many real-life scenarios, however, the system manager is unable to influence the route assignment of all agents due to limited influence on route choice decisions. Motivated by such scenarios, a computationally tractable method is presented that computes the minimal amount of agents that the system manager needs to influence (compliant agents) in order to achieve system optimal performance. Moreover, this methodology can also determine whether a given set of compliant agents is sufficient to achieve system optimum and compute the optimal route assignment for the compliant agents to do so. Experimental results are presented showing that in several large-scale, realistic traffic networks optimal flow can be achieved with as low as 13% of the agent being compliant and up to 54%.