 Agents


Preallocation and Planning Under Stochastic Resource Constraints

AAAI Conferences

Resource constraints frequently complicate multi-agent planning problems. Existing algorithms for resource-constrained, multi-agent planning problems rely on the assumption that the constraints are deterministic. However, resource constraints are frequently themselves subject to uncertainty from external influences. Uncertainty about constraints is especially challenging when agents must execute in an environment where communication is unreliable, making on-line coordination difficult. In those cases, it is a significant challenge to find coordinated allocations at plan time that depend on availability at run time. To address these limitations, we propose to extend algorithms for constrained multi-agent planning problems to handle stochastic resource constraints. We show how to factorize resource limit uncertainty and use this to develop novel algorithms to plan policies for stochastic constraints. We evaluate the algorithms on a search-and-rescue problem and on a power-constrained planning domain where the resource constraints are decided by nature. We show that plans taking into account all potential realizations of the constraint obtain significantly better utility than planning for the expectation, while causing fewer constraint violations.


An Ant-Based Algorithm to Solve Distributed Constraint Optimization Problems

AAAI Conferences

As an important population-based algorithm, ant colony optimization (ACO) has been successfully applied to various combinatorial optimization problems. However, much existing work in ACO focuses on solving centralized problems. In this paper, we present a novel algorithm, called ACO_DCOP, that harnesses the power of ants to solve Distributed Constraint Optimization Problems (DCOPs). In ACO_DCOP, a new mechanism that captures local benefits is proposed to compute heuristic factors, and a new method that considers the cost structure of DCOPs is proposed to compute pheromone deltas appropriately. Moreover, a pipelining technique is introduced to make full use of the computational capacity and improve efficiency. In our theoretical analysis, we prove that ACO_DCOP is an anytime algorithm. Our empirical evaluation indicates that ACO_DCOP is able to find solutions of equal or significantly higher quality than state-of-the-art DCOP algorithms.


Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception

AAAI Conferences

This paper presents novel Gaussian process decentralized data fusion algorithms exploiting the notion of agent-centric support sets for distributed cooperative perception of large-scale environmental phenomena. To overcome the limitations of scale in existing works, our proposed algorithms allow every mobile sensing agent to choose a different support set and dynamically switch to another during execution for encapsulating its own data into a local summary that, perhaps surprisingly, can still be assimilated with the other agents' local summaries (i.e., based on their current choices of support sets) into a globally consistent summary to be used for predicting the phenomenon. To achieve this, we propose a novel transfer learning mechanism for a team of agents capable of sharing and transferring information encapsulated in a summary based on a support set to that utilizing a different support set with some loss that can be theoretically bounded and analyzed. To alleviate the issue of information loss accumulating over multiple instances of transfer learning, we propose a new information sharing mechanism to be incorporated into our algorithms in order to achieve memory-efficient lazy transfer learning. Empirical evaluation on real-world datasets shows that our algorithms outperform the state-of-the-art methods.


On Value Function Representation of Long Horizon Problems

AAAI Conferences

In Reinforcement Learning, an intelligent agent has to make a sequence of decisions to accomplish a goal. If this sequence is long, then the agent has to plan over a long horizon. While learning the optimal policy and its value function is a well studied problem in Reinforcement Learning, this paper focuses on the structure of the optimal value function and how hard it is to represent the optimal value function. We show that the generalized Rademacher complexity of the hypothesis space of all optimal value functions is dependent on the planning horizon and independent of the state and action space size. Further, we present bounds on the action-gaps of action value functions and show that they can collapse if a long planning horizon is used. The theoretical results are verified empirically on randomly generated MDPs and on a grid-world fruit collection task using deep value function approximation. Our theoretical results highlight a connection between value function approximation and the Options framework and suggest that value functions should be decomposed along bottlenecks of the MDP's transition dynamics.
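As a small illustration of the action-gap notion mentioned in the abstract above, the sketch below uses one common definition: the difference between the best and second-best action values in a state. The function name `action_gap` and the example numbers are assumptions made for illustration; the paper's exact definition and bounds are not reproduced here.

```python
def action_gap(q_values):
    """Return the gap between the largest and second-largest action value.

    q_values: list of Q(s, a) for every action a in one fixed state s.
    This is one common definition of the action gap (an assumption here),
    not necessarily the exact quantity bounded in the paper.
    """
    if len(q_values) < 2:
        raise ValueError("need at least two actions to define a gap")
    ordered = sorted(q_values, reverse=True)
    return ordered[0] - ordered[1]


# Hypothetical Q-values for three actions in a single state.
example_gap = action_gap([1.0, 0.4, 0.9])
```

A small gap means the greedy action is easily flipped by approximation error, which is why a collapsing gap matters when value functions are approximated.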


Counterfactual Multi-Agent Policy Gradients

AAAI Conferences

Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
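The counterfactual baseline described above can be sketched for a single agent as follows. This is a minimal illustration, assuming the centralised critic has already produced one Q-value per candidate action of that agent with the other agents' actions held fixed; the function name `counterfactual_advantage` and the numbers are hypothetical and are not taken from the paper's implementation.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """Advantage of one agent's action against a counterfactual baseline.

    q_values[i]  : critic estimate Q(s, (u_others, i)), i.e. the joint-action
                   value when this agent plays action i and the other agents'
                   actions stay fixed.
    policy[i]    : this agent's probability of choosing action i.
    taken_action : index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline


# Hypothetical values: two actions, uniform policy, agent took action 1.
adv = counterfactual_advantage([1.0, 3.0], [0.5, 0.5], taken_action=1)
```

Because all entries of `q_values` refer to the same state and the same actions of the other agents, they can in principle come from a single critic evaluation that outputs one value per action of the agent in question.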


In Praise of Belief Bases: Doing Epistemic Logic Without Possible Worlds

AAAI Conferences

We introduce a new semantics for a logic of explicit and implicit beliefs based on the concept of multi-agent belief base. Unlike existing Kripke-style semantics for epistemic logic, in which the notions of possible world and doxastic/epistemic alternative are primitive, in our semantics they are non-primitive and are defined from the concept of belief base. We provide a complete axiomatization and a decidability result for our logic.


Towards Formal Definitions of Blameworthiness, Intention, and Moral Responsibility

AAAI Conferences

We provide formal definitions of degree of blameworthiness and intention relative to an epistemic state (a probability over causal models and a utility function on outcomes). These, together with a definition of actual causality, provide the key ingredients for moral responsibility judgments. We show that these definitions give insight into commonsense intuitions in a variety of puzzling cases from the literature.


Towards a Unified Framework for Syntactic Inconsistency Measures

AAAI Conferences

A number of proposals have been made to define inconsistency measures. Each has its rationale. But to date, it is not clear how to delineate the space of options for measures, nor is it clear how we can classify measures systematically. In this paper, we introduce a general framework for comparing syntactic inconsistency measures. It uses the construction of an inconsistency graph for each knowledgebase. We then introduce abstractions of the inconsistency graph and use the hierarchy of the abstractions to classify a range of inconsistency measures.


Adapting a Kidney Exchange Algorithm to Align With Human Values

AAAI Conferences

The efficient allocation of limited resources is a classical problem in economics and computer science. In kidney exchanges, a central market maker allocates living kidney donors to patients in need of an organ. Patients and donors in kidney exchanges are prioritized using ad-hoc weights decided on by committee and then fed into an allocation algorithm that determines who gets what, and who does not. In this paper, we provide an end-to-end methodology for estimating weights of individual participant profiles in a kidney exchange. We first elicit from human subjects a list of patient attributes they consider acceptable for the purpose of prioritizing patients (e.g., medical characteristics, lifestyle choices, and so on). Then, we ask subjects comparison queries between patient profiles and estimate weights in a principled way from their responses. We show how to use these weights in kidney exchange market clearing algorithms. We then evaluate the impact of the weights in simulations and find that the precise numerical values of the weights we computed matter little, other than the ordering of profiles that they imply. However, compared to not prioritizing patients at all, there is a significant effect, with certain classes of patients being (de)prioritized based on the human-elicited value judgments.


Partial Truthfulness in Minimal Peer Prediction Mechanisms With Limited Knowledge

AAAI Conferences

We study minimal single-task peer prediction mechanisms that have limited knowledge about agents' beliefs. Without knowing what agents' beliefs are or eliciting additional information, it is not possible to design a truthful mechanism in a Bayesian-Nash sense. We go beyond truthfulness and explore equilibrium strategy profiles that are only partially truthful. Using results from the multi-armed bandit literature, we characterize how inefficient these equilibria are compared to truthful reporting. We measure the inefficiency of such strategies by counting the number of dishonest reports that any minimal knowledge-bounded mechanism must have. We show that the order of this number is Θ(log n), where n is the number of agents, and we provide a peer prediction mechanism that achieves this bound in expectation.