Agents
Online Asynchronous Distributed Regression
Distributed computing offers a high degree of flexibility to accommodate modern learning constraints and the ever increasing size of datasets involved in massive data issues. Drawing inspiration from the theory of distributed computation models developed in the context of gradient-type optimization algorithms, we present a consensus-based asynchronous distributed approach for nonparametric online regression and analyze some of its asymptotic properties. Substantial numerical evidence involving up to 28 parallel processors is provided on synthetic datasets to assess the excellent performance of our method, both in terms of computation time and prediction accuracy.
Allocation in Practice
How do we allocate scarce resources? How do we fairly allocate costs? These are two pressing challenges facing society today. I discuss two recent projects at NICTA concerning resource and cost allocation. In the first, we have been working with FoodBank Local, a social startup working in collaboration with food bank charities around the world to optimise the logistics of collecting and distributing donated food. Before we can distribute this food, we must decide how to allocate it to different charities and food kitchens. This gives rise to a fair division problem with several new dimensions, rarely considered in the literature. In the second, we have been looking at cost allocation within the distribution network of a large multinational company. This also has several new dimensions rarely considered in the literature.
Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs
Nguyen, Duc Thien (Singapore Management University) | Yeoh, William (New Mexico State University) | Lau, Hoong Chuin (Singapore Management University) | Zilberstein, Shlomo (University of Massachusetts, Amherst) | Zhang, Chongjie (Massachusetts Institute of Technology)
Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-arm bandit DCOP algorithm on dynamic DCOPs.
To Share or Not to Share? The Single Agent in a Team Decision Problem
Amir, Ofra (Harvard University) | Grosz, Barbara J. (Harvard University) | Stern, Roni (Ben-Gurion University of the Negev)
This paper defines the "Single Agent in a Team Decision" (SATD) problem. SATD differs from prior multi-agent communication problems in the assumptions it makes about teammates' knowledge of each other's plans and possible observations. The paper proposes a novel integrated logical-decision-theoretic approach to solving SATD problems, called MDP-PRT. Evaluation of MDP-PRT shows that it outperforms a previously proposed communication mechanism that did not consider the timing of communication and compares favorably with a coordinated Dec-POMDP solution that uses knowledge about all possible observations.
Accurate Household Occupant Behavior Modeling Based on Data Mining Techniques
Baptista, Márcia L. (Universidade de Lisboa) | Fang, Anjie (National Institute of Informatics / University of Bristol) | Prendinger, Helmut (National Institute of Informatics) | Prada, Rui (Universidade de Lisboa) | Yamaguchi, Yohei (Osaka University)
An important requirement of household energy simulation models is their accuracy in estimating energy demand and its fluctuations. Occupant behavior has a major impact upon energy demand. However, Markov chains, the traditional approach to model occupant behavior, (1) has limitations in accurately capturing the coordinated behavior of occupants and (2) is prone to over-fitting. To address these issues, we propose a novel approach that relies on a combination of data mining techniques. The core idea of our model is to determine the behavior of occupants based on nearest neighbor comparison over a database of sample data. Importantly, the model takes into account features related to the coordination of occupants' activities. We use a customized distance function suited for mixed categorical and numerical data. Further, association rule learning allows us to capture the coordination between occupants. Using real data from four households in Japan we are able to show that our model outperforms the traditional Markov chain model with respect to occupant coordination and generalization of behavior patterns.
Item Bidding for Combinatorial Public Projects
Markakis, Evangelos (Athens University of Economics and Business) | Telelis, Orestis (Athens University of Economics and Business)
We present and analyze a mechanism for the Combinatorial Public Project Problem (CPPP). The problem asks to select k out of m available items, so as to maximize the social welfare for autonomous agents with combinatorial preferences (valuation functions) over subsets of items. The CPPP constitutes an abstract model for decision making by autonomous agents and has been shown to present severe computational hardness, in the design of truthful approximation mechanisms. We study a non-truthful mechanism that is, however, practically relevant to multi-agent environments, by virtue of its natural simplicity. It employs an Item Bidding interface, wherein every agent issues a separate bid for the inclusion of each distinct item in the outcome; the k items with the highest sums of bids are chosen and agents are charged according to a VCG-based payment rule. For fairly expressive classes of the agents' valuation functions, we establish existence of socially optimal pure Nash and strong equilibria, that are resilient to coordinated deviations of subsets of agents. Subsequently we derive tight worst-case bounds on the approximation of the optimum social welfare achieved in equilibrium. We show that the mechanism's performance improves with the number of agents that can coordinate, and reaches half of the optimum welfare at strong equilibrium.
The Fisher Market Game: Equilibrium and Welfare
Brânzei, Simina (Aarhus University) | Chen, Yiling (Harvard University) | Deng, Xiaotie (Shanghai Jiao Tong University) | Filos-Ratsikas, Aris (Aarhus University) | Frederiksen, Søren Kristoffer Stiil (Aarhus University) | Zhang, Jie (University of Oxford)
The Fisher market model is one of the most fundamental resource allocation models in economics. In a Fisher market, the prices and allocations of goods are determined according to the preferences and budgets of buyers to clear the market. In a Fisher market game, however, buyers are strategic and report their preferences over goods; the market-clearing prices and allocations are then determined based on their reported preferences rather than their real preferences. We show that the Fisher market game always has a pure Nash equilibrium, for buyers with linear, Leontief, and Cobb-Douglas utility functions, which are three representative classes of utility functions in the important Constant Elasticity of Substitution (CES) family. Furthermore, to quantify the social efficiency, we prove Price of Anarchy bounds for the game when the utility functions of buyers fall into these three classes respectively.
On the Incompatibility of Efficiency and Strategyproofness in Randomized Social Choice
Aziz, Haris (NICTA and University of New South Wales) | Brandl, Florian (TU München) | Brandt, Felix (TU München)
Efficiency--no agent can be made better off without making another one worse off--and strategyproofness--no agent can obtain a more preferred outcome by misrepresenting his preferences--are two cornerstones of economics and ubiquitous in important areas such as voting, auctions, or matching markets. Within the context of random assignment, Bogomolnaia and Moulin have shown that two particular notions of efficiency and strategyproofness based on stochastic dominance are incompatible. However, there are various other possibilities of lifting preferences over alternatives to preferences over lotteries apart from stochastic dominance. In this paper, we give an overview of common preference extensions, propose two new ones, and show that the above-mentioned incompatibility can be extended to various other notions of strategyproofness and efficiency in randomized social choice.
TacTex'13: A Champion Adaptive Power Trading Agent
Urieli, Daniel (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
Sustainable energy systems of the future will no longer be able to rely on the current paradigm that energy supply follows demand. Many of the renewable energy resources do not produce power on demand, and therefore there is a need for new market structures that motivate sustainable behaviors by participants. The Power Trading Agent Competition (Power TAC) is a new annual competition that focuses on the design and operation of future retail power markets, specifically in smart grid environments with renewable energy production, smart metering, and autonomous agents acting on behalf of customers and retailers. It uses a rich, open-source simulation platform that is based on real-world data and state-of-the-art customer models. Its purpose is to help researchers understand the dynamics of customer and retailer decision-making, as well as the robustness of proposed market designs. This paper introduces TacTex'13, the champion agent from the inaugural competition in 2013. TacTex'13 learns and adapts to the environment in which it operates, by heavily relying on reinforcement learning and prediction methods. This paper describes the constituent components of TacTex'13 and examines its success through analysis of competition results and subsequent controlled experiments.
Coordination of Multiple Teams of Robots for an Optimal Global Plan
Saribatur, Zeynep Gozen (Sabanci University) | Erdem, Esra (Sabanci University) | Patoglu, Volkan (Sabanci University)
Also, we do many application domains, ranging from search and rescue not assume that all teams are in the same workspace, or all operations to exploration missions, service robotics to cognitive robots are of the same sort. Moreover, our goal is not to find factories. In these domains, the goal is for all teams any coordination of teams that would allow decoupling of to complete their tasks as soon as possible, and should the their local plans, but to find a coordination of teams for an need arise, teams help each other by lending robots.