Agents
Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery
Niekum, Scott (University of Massachusetts Amherst) | Barto, Andrew G. (University of Massachusetts Amherst)
Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to potential task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used in- dividually as skill goals. Furthermore, skills created in this manner are often only transferable to tasks that share iden- tical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal. We show that these problems can be overcome by clustering subgoal data defined in an agent-space and using the resulting clusters as templates for skill termination conditions. Clustering via a Dirichlet process mixture model is used to discover a minimal, suffi- cient collection of portable skills.
FAQ-Learning in Matrix Games: Demonstrating Convergence Near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes
Kaisers, Michael (Maastricht University) | Tuyls, Karl (Maastricht University)
This article studies Frequency Adjusted Q-learning (FAQ-learning), a variation of Q-learning that simulates simultaneous value function updates. The main contributions are empirical and theoretical support for the convergence of FAQ-learning to attractors near Nash equilibria in two-agent two-action matrix games.The games can be divided into three types: Matching pennies, Prisoners' Dilemma and Battle of Sexes. This article shows that the Matching pennies and Prisoners' Dilemma yield one attractor of the learning dynamics, while the Battle of Sexes exhibits a supercritical pitchfork bifurcation at a critical temperature, where one attractor splits into two attractors and one repellent fixed point. Experiments illustrate that the distance between fixed points of the FAQ-learning dynamics and Nash equilibria tends to zero as the exploration parameter of FAQ-learning approaches zero.
Efficiently Eliciting Preferences from a Group of Users
Hines, Greg (University of Waterloo) | Larson, Kate (University of Waterloo)
Learning about users' preferences allows agents to make intelligent decisions on behalf of users. When we are eliciting preferences from a group of users, we can use the preferences of the users we have already processed to increase the efficiency of the elicitation process for the remaining users. However, current methods either require strong prior knowledge about the users' preferences or can be overly cautious and inefficient. Our method, based on standard techniques from non-parametric statistics, allows the controller to choose a balance between prior knowledge and efficiency. This balance is investigated through experimental results.
Reciprocal Preference Model for Two Player Dilemma Games
Ahmed, Asrar (IIIT Hyderabad) | Karlapalem, Kamalakar (IIIT Hyderabad)
Results from behavioral economics show that individuals do not always maximize monetary payoffs. Within behavioral economics different models of social preference have been put forth to account for this deviation from standard assumptions of game theory and economics. Incorporating such models into agent decision making is increasingly relevant to design systems which interact with or on behalf of humans. Existing models, which correctly predict outcomes across a large set of games, are fairly complex. To this end, we present aspiration based social preference model and evaluate it by considering two player dilemma games. We show that the qualitative predictions of our model are consistent with results from behavioral economics.
Leading Multiple Ad Hoc Teammates in Joint Action Settings
Agmon, Noa (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
The growing use of autonomous agents in practice may require agents to cooperate as a team in situations where they have limited prior knowledge about one another, cannot communicate directly, or do not share the same world models. These situations raise the need to design ad hoc team members, i.e., agents that will be able to cooperate without coordination in order to reach an optimal team behavior. This paper considers problem of leading N-agent teams by a single agent toward their optimal joint utility, where the agents compute their next actions based only on their most recent observations of their teammates' actions. We show that compared to previous results in two-agent teams, in larger teams the agent might not be able to lead the team to the action with maximal joint utility. In these cases, the agent's optimal strategy leads the team to the best possible reachable cycle of joint actions. We describe a graphical model of the problem and a polynomial time algorithm for solving it. We then consider the problem of leading teams where the agents' base their actions on a longer history of past observations, showing that the an upper bound computation time exponential in the memory size is very likely to be tight.
Improvement of Multi-AUV Cooperation through Teammate Verification
Novitzky, Michael (The Georgia Institute of Technology)
Current methods for multi-AUV cooperation suffer in low communication environments. State of the art methods employ auctioneering or planning to determine a single AUV'task. These systems require communication to update models of teammates and tasks for efficient task selection. Most strategies assume a teammate is inoperable if a communication timeout is reached which reduces overall team efficiency. Including teammate prediction has been shown to mitigate efficiency degeneration due to low communication. However, there is no verification of a predicted teammate's task other than through eventual communication. A possible verification tool is behavior recognition. Current behavior recognition utilizes either overhead sensors or post mission analysis to track robot trajectories in order to infer their internal state. A system in which an AUV is capable of sensing a teammate, for example through a forward-looking sonar, and deducing it's behavior along with contextual information, such as location, will enable an AUV to determine that teammate's current task in the overall mission. This will allow for an accurate update of that teammate's model allowing the AUV to more efficiently determine its own next task rather than relying only on communication. This position paper posits that multi-AUV cooperation efficiency will improve in low communication environments with the combination of robust teammate prediction along with verification using behavior recognition.
Dynamic User Task Scheduling for Mobile Robots
Coltin, Brian (Carnegie Mellon University) | Veloso, Manuela (Carnegie Mellon University) | Ventura, Rodrigo (Institute Superior Tecnico)
We present our efforts to deploy mobile robots in office environments, focusing in particular on the challenge of planning a schedule for a robot to accomplish user-requested actions. We concretely aim to make our CoBot mobile robots available to execute navigational tasks requested by users, such as telepresence, and picking up and delivering messages or objects at different locations. We contribute an efficient web-based approach in which users can request and schedule the execution of specific tasks. The scheduling problem is converted to a mixed integer programming problem. The robot executes the scheduled tasks using a synthetic speech and touch-screen interface to interact with users, while allowing users to follow the task execution online. Our robot uses a robust Kinect-based safe navigation algorithm, moves fully autonomously without the need to be chaperoned by anyone, and is robust to the presence of moving humans, as well as non-trivial obstacles, such as legged chairs and tables. Our robots have already performed 15km of autonomous service tasks.
An Intelligent Load Balancing Algorithm Towards Efficient Cloud Computing
Xu, Yang (University of Electronic Science and Technology of China) | Wu, Lei (University of Electronic Science and Technology of China) | Guo, Liying (University of Electronic Science and Technology of China) | Chen, Zheng (University of Electronic Science and Technology of China) | Yang, Lai (Chinese Academy of Sciences) | Shi, Zhongzhi (Chinese Academy of Sciences)
MapReduce provided a novel computing model for complex job decomposition and sub-tasks management to support cloud computing with large distributed data sets. However, its performance is significantly influenced by the working data distributions over those data sets. In this paper, we put forward a novel model to balance data distribution to improve cloud computing performance in data-intensive applications, such as distributed data mining. By extending the classic MapReduce model with an agent-aid layer and abstracting working load requests for data blocks as tokens, the agents can reason from previously received tokens about where to send other tokens in order to balance the working tasks and improve system performance. Our key contribution lies in building an efficient token routing algorithm in spite of agents' unknowing to the global state of data distribution in cloud. We also built a prototype of our system, and the experimental results show that our approach can significantly improve the efficiency of cloud computing.
Mechanism Design for Aggregated Demand Prediction in the Smart Grid
Rose, Harry Thomas (University of Southampton) | Rogers, Alex (University of Southampton) | Gerding, Enrico H (University of Southampton)
This paper presents a novel scoring rule-based mechanism that encourages agents to produce costly estimates of future events and truthfully report them to a centre when the budget for payments to the agents is itself determined by their reports. This is applied to a model of aggregated demand prediction within a microgrid where, given estimates of future consumptions, an aggregator must optimally purchase electricity for a set of homes, each represented by self-interested, rational home agents. This in turn reduces the need for costly standby generation within the grid. The aggregator has prior information about the amount each home will consume, and determines the amount to pay each agent based on savings resulting from using the agents' reported information, over its own prior information. Agents use sensory information regarding their property and its occupants to generate these estimates, which they transmit to the aggregator using smart grid technology. The proposed mechanism is dominant strategy incentive compatible and empirical evaluation shows that it encourages agents to exert effort in producing precise estimates. We show that the mechanism is ex ante individually rational for the aggregator, and that it outperforms a simpler mechanism whereby savings are distributed evenly.
Leadership Games and their Application in Super-Peer Networks
Walsh, Thomas John (University of Arizona) | Taheri, Javad (University of Arizona) | Wright, Jeremy Bryan (University of Arizona) | Cohen, Paul (University of Arizona)
This paper considers a setting where a single ``leadership agent'' intervenes in a multi-agent system through actions that (perhaps subtly) change the dynamics of the system. We describe a number of forms this intervention can take and compare these situations to settings in previous work. We identify two important effects of leadership: faster system convergence, and convergence to a better equilibrium. Empirically, we first explore these properties in leadership of algorithms engaged in classical 2-player games. We then apply this general framework to the leadership of a super-peer file-sharing network. In these experiments the network contains some agents that make locally greedy decisions that hamper the network as a whole. We show that a leader acting based on a more global criteria can push the system to a better equilibrium point as well as speeding up convergence. We also show how a mathematical approximation of such super-peer networks can be used to aid a leader in determining a minimum-cost intervention strategy.