Goto

Collaborating Authors

 Crandall, Jacob W.


E-HBA: Using Action Policies for Expert Advice and Agent Typification

arXiv.org Artificial Intelligence

Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
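
As a rough illustration of the mixing idea described above, the sketch below blends an expert's past average payoff with a future payoff predicted from a posterior over hypothesised opponent types. The function names, the linear blend, and the fixed confidence weight are assumptions made for this sketch, not the paper's exact formulation.

    # Hypothetical sketch of E-HBA-style payoff mixing (names and the
    # mixing schedule are assumptions, not the authors' exact method).

    def mixed_payoff(past_avg, predicted_future, confidence):
        """Blend an expert's observed average payoff with a predicted
        future payoff; the blend shifts toward the prediction as the
        type-based beliefs become more confident."""
        return (1.0 - confidence) * past_avg + confidence * predicted_future

    def predicted_future_payoff(beliefs, payoff_under_type):
        """Expected future payoff of an expert under the current
        posterior over hypothesised types of the other agent."""
        return sum(b * payoff_under_type[t] for t, b in beliefs.items())

    # Example: two hypothesised opponent types with posterior beliefs.
    beliefs = {"tit_for_tat": 0.7, "always_defect": 0.3}
    payoff_if = {"tit_for_tat": 3.0, "always_defect": 1.0}  # expert payoff vs. each type

    future = predicted_future_payoff(beliefs, payoff_if)
    print(mixed_payoff(past_avg=2.0, predicted_future=future, confidence=0.5))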


An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types

arXiv.org Artificial Intelligence

Many multiagent applications require an agent to learn quickly how to interact with previously unknown other agents. To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents. The posterior belief is complemented by the prior belief, which specifies the subjective likelihood of policies before any actions are observed. In this paper, we present the first comprehensive empirical study on the practical impact of prior beliefs over policies in repeated interactions. We show that prior beliefs can have a significant impact on the long-term performance of such methods, and that the magnitude of the impact depends on the depth of the planning horizon. Moreover, our results demonstrate that automatic methods can be used to compute prior beliefs with consistent performance effects. This indicates that prior beliefs could be eliminated as a manual parameter and instead be computed automatically.
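
The posterior computation such methods rely on is a standard Bayesian update over the hypothesised policy set. The sketch below shows that update and how two different priors over the same policies yield different posteriors from identical observations; all policy names and numbers are illustrative.

    # Minimal sketch of a posterior over hypothesised policies (the
    # update rule is standard Bayes; variable names are illustrative).

    def posterior(prior, likelihoods):
        """prior: {policy: prior probability}
        likelihoods: {policy: probability the policy assigns to the
        observed action sequence}. Returns the normalised posterior."""
        unnorm = {p: prior[p] * likelihoods[p] for p in prior}
        z = sum(unnorm.values())
        return {p: v / z for p, v in unnorm.items()}

    # A uniform prior vs. a skewed prior over the same two policies can
    # lead to different action choices early in the interaction.
    liks = {"cooperator": 0.9, "defector": 0.1}
    print(posterior({"cooperator": 0.5, "defector": 0.5}, liks))
    print(posterior({"cooperator": 0.1, "defector": 0.9}, liks))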


How AI Wins Friends and Influences People in Repeated Games With Cheap Talk

AAAI Conferences

Research has shown that a person's financial success is more dependent on the ability to deal with people than on professional knowledge. Sage advice, such as "if you can't say something nice, don't say anything at all" and principles articulated in Carnegie's classic "How to Win Friends and Influence People," offer trusted rules-of-thumb for how people can successfully deal with each other. However, alternative philosophies for dealing with people have also emerged. The success of an AI system is likewise contingent on its ability to win friends and influence people. In this paper, we study how AI systems should be designed to win friends and influence people in repeated games with cheap talk (RGCTs). We create several algorithms for playing RGCTs by combining existing behavioral strategies (what the AI does) with signaling strategies (what the AI says) derived from several competing philosophies. Via a user study, we evaluate these algorithms in four RGCTs. Our results suggest sufficient properties for AIs to win friends and influence people in RGCTs.
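
As a rough sketch of the behaviour/signal decomposition described above, the toy code below pairs a behavioural strategy (what the AI does) with a signalling strategy (what the AI says). The specific strategies and messages are invented for illustration and are not the algorithms evaluated in the paper.

    # Illustrative pairing of a behavioural strategy with a signalling
    # strategy for a repeated game with cheap talk (strategy names and
    # messages are invented for this sketch).

    def tit_for_tat(history):
        """Behavioural strategy: cooperate first, then mirror the partner."""
        return "C" if not history else history[-1]["their_action"]

    def friendly_signal(planned_action):
        """Signalling strategy: announce intent, in the spirit of
        Carnegie-style positive communication."""
        if planned_action == "C":
            return "I plan to cooperate."
        return "I will respond in kind."

    history = [{"their_action": "C"}]
    action = tit_for_tat(history)
    print(f"action: {action}  signal: {friendly_signal(action)}")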


Non-myopic learning in repeated stochastic games

arXiv.org Artificial Intelligence

In repeated stochastic games (RSGs), an agent must quickly adapt to the behavior of previously unknown associates, who may themselves be learning. This machine-learning problem is particularly challenging due, in part, to the presence of multiple (even infinitely many) equilibria and inherently large strategy spaces. In this paper, we introduce a method to reduce the strategy space of two-player general-sum RSGs to a handful of expert strategies. This process, called Mega, effectually reduces an RSG to a bandit problem. We show that the resulting strategy space preserves several important properties of the original RSG, thus enabling a learner to produce robust strategies within a reasonably small number of interactions. To better establish the strengths and weaknesses of this approach, we empirically evaluate the resulting learning system against other algorithms in three different RSGs.
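
To make the reduction concrete: once the strategy space is a handful of experts, a standard bandit learner can select among them. The sketch below uses UCB1 as a stand-in selector over invented expert strategies and simulated payoffs; Mega's actual expert construction and learning rule are not reproduced here.

    import math, random

    # Sketch of treating a reduced set of expert strategies as bandit
    # arms (UCB1 is a generic stand-in selector).

    def ucb1_select(counts, means, t):
        """Pick the expert with the highest UCB1 index."""
        for i, n in enumerate(counts):
            if n == 0:
                return i  # play each expert once first
        return max(range(len(counts)),
                   key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

    experts = ["always_cooperate", "tit_for_tat", "grim_trigger"]
    true_means = {"always_cooperate": 1.0, "tit_for_tat": 2.0, "grim_trigger": 1.5}
    counts, means = [0] * 3, [0.0] * 3
    for t in range(1, 201):
        i = ucb1_select(counts, means, t)
        payoff = random.gauss(true_means[experts[i]], 0.5)  # simulated payoff
        counts[i] += 1
        means[i] += (payoff - means[i]) / counts[i]  # running average
    print(experts[max(range(3), key=lambda i: means[i])])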


Cooperating with Machines

arXiv.org Artificial Intelligence

Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major driving force behind technical progress has been competition with human cognition. Historical milestones have been frequently associated with computers matching or outperforming humans in difficult cognitive tasks (e.g. face recognition [2], personality classification [3], driving cars [4], or playing video games [5]), or defeating humans in strategic zero-sum encounters (e.g. Chess [6], Checkers [7], Jeopardy! [8], Poker [9], or Go [10]). In contrast, less attention has been given to developing autonomous machines that establish mutually cooperative relationships with people who may not share the machine's preferences. A main challenge has been that human cooperation does not require sheer computational power, but rather relies on intuition [11], cultural norms [12], emotions and signals [13, 14, 15, 16], and pre-evolved dispositions toward cooperation [17], common-sense mechanisms that are difficult to encode in machines for arbitrary contexts. Here, we combine a state-of-the-art machine-learning algorithm with novel mechanisms for generating and acting on signals to produce a new learning algorithm that cooperates with people and other machines at levels that rival human cooperation in a variety of two-player repeated stochastic games. This is the first general-purpose algorithm that is capable, given a description of a previously unseen game environment, of learning to cooperate with people within short timescales in scenarios previously unanticipated by algorithm designers. This is achieved without complex opponent modeling or higher-order theories of mind, thus showing that flexible, fast, and general human-machine cooperation is computationally achievable using a non-trivial, but ultimately simple, set of algorithmic mechanisms.


An Online Mechanism for Ridesharing in Autonomous Mobility-on-Demand Systems

arXiv.org Artificial Intelligence

With proper management, Autonomous Mobility-on-Demand (AMoD) systems have great potential to satisfy the transport demands of urban populations by providing safe, convenient, and affordable ridesharing services. Meanwhile, such systems can substantially decrease private car ownership and use, and thus significantly reduce traffic congestion, energy consumption, and carbon emissions. To achieve this objective, an AMoD system requires private information about the demand from passengers. However, due to self-interestedness, passengers are unlikely to cooperate with the service providers in this regard. Therefore, an online mechanism that incentivizes passengers to truthfully report their actual demand is desirable. To promote ridesharing, we introduce a posted-price, integrated online ridesharing mechanism (IORS) that satisfies desirable properties such as ex-post incentive compatibility, individual rationality, and budget balance. Numerical results indicate that IORS is competitive with two benchmarks, namely the optimal assignment and an offline, auction-based mechanism.
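
A minimal sketch of why posted prices encourage truthful behaviour, assuming only the generic posted-price setting (IORS's actual pricing and assignment rules are not reproduced): since the quoted price does not depend on what a passenger reports, accepting exactly when the ride is worth the price is a dominant strategy.

    # Toy posted-price interaction, a generic illustration of the
    # incentive-compatibility argument for posted prices.

    def respond(value, price):
        """A passenger's dominant strategy: accept iff the ride is
        worth at least the posted price. Misreporting value cannot
        help, since the price does not depend on the report."""
        return value >= price

    posted_price = 6.0
    for true_value in (4.0, 6.0, 9.0):
        accept = respond(true_value, posted_price)
        utility = (true_value - posted_price) if accept else 0.0
        print(true_value, accept, utility)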


Belief and Truth in Hypothesised Behaviours

arXiv.org Artificial Intelligence

There is a long history in game theory on the topic of Bayesian or "rational" learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long-term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
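
As a hedged illustration of "basic ways to incorporate evidence into posterior beliefs", the sketch below contrasts a multiplicative (standard Bayesian) update with an additive one over the same per-step likelihoods. The paper's precise definitions and normalisations are not reproduced; all type names and numbers are invented.

    # Two sketched posterior formulations over hypothesised types:
    # multiplying per-step likelihoods punishes any single poorly
    # predicted step, while summing them is more forgiving.

    def product_posterior(prior, step_likelihoods):
        score = {t: prior[t] for t in prior}
        for liks in step_likelihoods:  # one dict per observed step
            for t in score:
                score[t] *= liks[t]
        z = sum(score.values())
        return {t: s / z for t, s in score.items()}

    def sum_posterior(prior, step_likelihoods):
        score = {t: prior[t] * sum(liks[t] for liks in step_likelihoods)
                 for t in prior}
        z = sum(score.values())
        return {t: s / z for t, s in score.items()}

    steps = [{"A": 0.8, "B": 0.5}, {"A": 0.1, "B": 0.5}]
    prior = {"A": 0.5, "B": 0.5}
    print(product_posterior(prior, steps))  # punishes A's one bad step
    print(sum_posterior(prior, steps))      # keeps A nearly even with B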


Online Learning in Repeated Human-Robot Interactions

AAAI Conferences

Adaptation is a critical component of collaboration. Nevertheless, online learning is not yet used in most successful human-robot interactions, especially when the human's and robot's goals are not fully aligned. There are at least two barriers to the successful application of online learning in HRI. First, typical machine-learning algorithms do not learn at time scales that support effective interactions with people. Algorithms that learn at sufficiently fast time scales often produce myopic strategies that do not lead to good long-term collaborations. Second, random exploration, a core component of most online-learning algorithms, can be problematic for developing collaborative relationships with a human partner. We anticipate that a new genre of online-learning algorithms can overcome these two barriers when paired with (cheap-talk) communication. In this paper, we summarize our efforts in these two areas to produce a situation-independent learning system that quickly learns to collaborate with a human partner.


Learning in Repeated Games with Minimal Information: The Effects of Learning Bias

AAAI Conferences

Automated agents for electricity markets, social networks, and other distributed networks must repeatedly interact with other intelligent agents, often without observing associates' actions or payoffs (i.e., minimal information). Given this reality, our goal is to create algorithms that learn effectively in repeated games played with minimal information. As in other applications of machine learning, the success of a learning algorithm in repeated games depends on its learning bias. To better understand what learning biases are most successful, we analyze the learning biases of previously published multi-agent learning (MAL) algorithms. We then describe a new algorithm that adapts a successful learning bias from the literature to minimal information environments. Finally, we compare the performance of this algorithm with ten other algorithms in repeated games played with minimal information.
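
One well-known learning bias from the literature that fits the minimal-information setting is aspiration-based ("satisficing") learning, which needs only the agent's own payoff, never the associates' actions or payoffs. The sketch below illustrates that setting; whether this is the bias adapted in the paper is not claimed, and all parameters are invented.

    import random

    # Sketch of an aspiration-based ("satisficing") learning bias under
    # minimal information: the agent observes only its own payoff.

    def satisficing_step(action, payoff, aspiration, actions, rate=0.99):
        """Keep the current action if it met the aspiration level,
        otherwise switch at random; relax the aspiration toward the
        observed payoff either way."""
        next_action = action if payoff >= aspiration else random.choice(actions)
        next_aspiration = rate * aspiration + (1 - rate) * payoff
        return next_action, next_aspiration

    actions = ["C", "D"]
    action, aspiration = random.choice(actions), 3.0
    for _ in range(100):
        payoff = {"C": 3.0, "D": 1.0}[action]  # stand-in payoff signal
        action, aspiration = satisficing_step(action, payoff, aspiration, actions)
    print(action, round(aspiration, 2))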