Agents
Information-Theoretic Bounded Rationality
Ortega, Pedro A., Braun, Daniel A., Dyer, Justin, Kim, Kee-Eung, Tishby, Naftali
Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g.
On Voting and Facility Location
Feldman, Michal, Fiat, Amos, Golomb, Iddan
We study mechanisms for candidate selection that seek to minimize the social cost, where voters and candidates are associated with points in some underlying metric space. The social cost of a candidate is the sum of its distances to each voter. Some of our work assumes that these points can be modeled on a real line, but other results of ours are more general. A question closely related to candidate selection is that of minimizing the sum of distances for facility location. The difference is that in our setting there is a fixed set of candidates, whereas the large body of work on facility location seems to consider every point in the metric space to be a possible candidate. This gives rise to three types of mechanisms which differ in the granularity of their input space (voting, ranking and location mechanisms). We study the relationships between these three classes of mechanisms. While it may seem that Black's 1948 median algorithm is optimal for candidate selection on the line, this is not the case. We give matching upper and lower bounds for a variety of settings. In particular, when candidates and voters are on the line, our universally truthful spike mechanism gives a [tight] approximation of two. When assessing candidate selection mechanisms, we seek several desirable properties: (a) efficiency (minimizing the social cost) (b) truthfulness (dominant strategy incentive compatibility) and (c) simplicity (a smaller input space). We quantify the effect that truthfulness and simplicity impose on the efficiency.
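As a rough illustration of the social-cost objective defined above, the following sketch assumes voters and candidates are points on the real line. It computes each candidate's social cost and the approximation ratio of a given choice against the best fixed candidate. The function names and the example data are hypothetical, and this is not the paper's spike mechanism.

```python
def social_cost(candidate, voters):
    """Sum of distances from the candidate to every voter (points on a line)."""
    return sum(abs(candidate - v) for v in voters)


def best_candidate(candidates, voters):
    """Candidate with minimum social cost among the fixed candidate set."""
    return min(candidates, key=lambda c: social_cost(c, voters))


def approximation_ratio(chosen, candidates, voters):
    """Social cost of the chosen candidate relative to the best candidate.

    Assumes the best candidate has non-zero social cost.
    """
    optimum = social_cost(best_candidate(candidates, voters), voters)
    return social_cost(chosen, voters) / optimum


# Hypothetical instance: three voters, two candidates.
voters = [0, 10, 10]
candidates = [6, 13]
costs = {c: social_cost(c, voters) for c in candidates}
```

In this instance the candidate at 13 is closer to the median voter (at 10) than the candidate at 6, yet it has the higher social cost. Because only a fixed set of candidates is available, proximity to the median alone does not determine the best choice.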
Possible and Necessary Winners of Partial Tournaments
Aziz, Haris, Brill, Markus, Fischer, Felix, Harrenstein, Paul, Lang, Jerome, Seedig, Hans Georg
We study the problem of computing possible and necessary winners for partially specified weighted and unweighted tournaments. This problem arises naturally in elections with incompletely specified votes, partially completed sports competitions, and more generally in any scenario where the outcome of some pairwise comparisons is not yet fully known. We specifically consider a number of well-known solution concepts, including the uncovered set, Borda, ranked pairs, and maximin, and show that for most of them, possible and necessary winners can be identified in polynomial time. These positive algorithmic results stand in sharp contrast to earlier results concerning possible and necessary winners given partially specified preference profiles.
Bayesian Policy Reuse
Rosman, Benjamin, Hawasly, Majd, Ramamoorthy, Subramanian
A long-lived autonomous agent should be able to respond online to novel instances of tasks from a familiar domain. Acting online requires 'fast' responses, in terms of rapid convergence, especially when the task instance has a short duration, such as in applications involving interactions with humans. These requirements can be problematic for many established methods for learning to act. In domains where the agent knows that the task instance is drawn from a family of related tasks, albeit without access to the label of any given instance, it can choose to act through a process of policy reuse from a library, rather than policy learning from scratch. In policy reuse, the agent has prior knowledge of the class of tasks in the form of a library of policies that were learnt from sample task instances during an offline training phase. We formalise the problem of policy reuse, and present an algorithm for efficiently responding to a novel task instance by reusing a policy from the library of existing policies, where the choice is based on observed 'signals' which correlate to policy performance. We achieve this by posing the problem as a Bayesian choice problem with a corresponding notion of an optimal response, but the computation of that response is in many cases intractable. Therefore, to reduce the computation cost of the posterior, we follow a Bayesian optimisation approach and define a set of policy selection functions, which balance exploration in the policy library against exploitation of previously tried policies, together with a model of expected performance of the policy library on their corresponding task instances. We validate our method in several simulated domains of interactive, short-duration episodic tasks, showing rapid convergence in unknown task variations.
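The belief-update-then-select loop described above might be sketched as follows. This is a minimal illustration under strong assumptions, not the authors' algorithm: task types and signals are discrete, the signal likelihoods and expected utilities are given as lookup tables, and the policy is chosen greedily by expected utility rather than by one of the exploration-aware selection functions the abstract mentions. All names and numbers are hypothetical.

```python
def update_belief(belief, likelihood, policy, signal):
    """Bayes update of the belief over task types after observing a signal.

    belief: dict mapping task type -> probability.
    likelihood: dict mapping (task type, policy) -> {signal: probability}.
    """
    unnormalised = {
        task: prob * likelihood[(task, policy)].get(signal, 0.0)
        for task, prob in belief.items()
    }
    total = sum(unnormalised.values())
    if total == 0.0:
        # Signal impossible under every task type: keep the old belief.
        return dict(belief)
    return {task: weight / total for task, weight in unnormalised.items()}


def select_policy(belief, expected_utility, policies):
    """Greedy choice (a simplification): maximise expected utility under the belief."""
    def value(policy):
        return sum(belief[task] * expected_utility[(task, policy)] for task in belief)
    return max(policies, key=value)


# Hypothetical two-task, two-policy library.
policies = ["pi_A", "pi_B"]
prior = {"task_A": 0.5, "task_B": 0.5}
likelihood = {
    ("task_A", "pi_A"): {"good": 0.9, "bad": 0.1},
    ("task_B", "pi_A"): {"good": 0.2, "bad": 0.8},
    ("task_A", "pi_B"): {"good": 0.1, "bad": 0.9},
    ("task_B", "pi_B"): {"good": 0.8, "bad": 0.2},
}
expected_utility = {
    ("task_A", "pi_A"): 1.0, ("task_A", "pi_B"): 0.0,
    ("task_B", "pi_A"): 0.0, ("task_B", "pi_B"): 1.0,
}

posterior = update_belief(prior, likelihood, "pi_A", "good")
next_policy = select_policy(posterior, expected_utility, policies)
```

After one "good" signal under `pi_A`, the belief shifts towards `task_A`, so the greedy rule keeps `pi_A`. A selection function that also rewards exploration would replace `select_policy`.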
The Rationale behind the Concept of Goal
Governatori, Guido, Olivieri, Francesco, Scannapieco, Simone, Rotolo, Antonino, Cristani, Matteo
The paper proposes a fresh look at the concept of goal and advances that motivational attitudes like desire, goal and intention are just facets of the broader notion of (acceptable) outcome. We propose to encode the preferences of an agent as sequences of "alternative acceptable outcomes". We then study how the agent's beliefs and norms can be used to filter the mental attitudes out of the sequences of alternative acceptable outcomes. Finally, we formalise such intuitions in a novel Modal Defeasible Logic and we prove that the resulting formalisation is computationally feasible.
Welfare of Sequential Allocation Mechanisms for Indivisible Goods
Aziz, Haris, Kalinowski, Thomas, Walsh, Toby, Xia, Lirong
Sequential allocation is a simple and attractive mechanism for the allocation of indivisible goods. Agents take turns, according to a policy, to pick items. Sequential allocation is guaranteed to return an allocation which is efficient but may not have optimal social welfare. We therefore consider the relation between welfare and efficiency. We study the (computational) questions of what welfare is possible or necessary depending on the choice of policy. We also consider a novel control problem in which the chair chooses a policy to improve social welfare.
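The mechanism can be sketched in a few lines. The sketch assumes additive utilities, sincere picking (each agent takes its most valued remaining item) and utilitarian welfare (the sum of agents' utilities); the abstract fixes none of these details, so they are illustrative choices, as are the function names and the example data.

```python
def sequential_allocation(policy, utilities, items):
    """Run a picking sequence; each agent takes its most valued remaining item.

    policy: sequence of agent names giving the order of turns.
    utilities: dict mapping agent -> {item: value} (additive, assumed).
    """
    remaining = set(items)
    bundles = {agent: [] for agent in utilities}
    for agent in policy:
        if not remaining:
            break
        # Sorting first makes tie-breaking deterministic (lexicographic).
        pick = max(sorted(remaining), key=lambda item: utilities[agent][item])
        bundles[agent].append(pick)
        remaining.remove(pick)
    return bundles


def utilitarian_welfare(bundles, utilities):
    """Sum over agents of the value each agent assigns to its own bundle."""
    return sum(
        utilities[agent][item]
        for agent, bundle in bundles.items()
        for item in bundle
    )


# Hypothetical instance: both agents prefer x, but b values y far less than a does.
utilities = {
    "a": {"x": 10, "y": 9},
    "b": {"x": 10, "y": 1},
}
items = ["x", "y"]

welfare_ab = utilitarian_welfare(sequential_allocation(["a", "b"], utilities, items), utilities)
welfare_ba = utilitarian_welfare(sequential_allocation(["b", "a"], utilities, items), utilities)
```

Here the two policies give different welfare (11 when `a` picks first, 19 when `b` picks first), which is the kind of policy-dependent gap the control problem is concerned with.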
Constraining Information Sharing to Improve Cooperative Information Gathering
This paper considers the problem of cooperation between self-interested agents in acquiring better information regarding the nature of the different options and opportunities available to them. By sharing individual findings with others, the agents can potentially achieve a substantial improvement in overall and individual expected benefits. Unfortunately, it is well known that with self-interested agents equilibrium considerations often dictate solutions that are far from the fully cooperative ones, hence the agents do not manage to fully exploit the potential benefits encapsulated in such cooperation. In this paper we introduce, analyze and demonstrate the benefit of five methods aiming to improve cooperative information gathering. Common to all five is that they constrain and limit the information sharing process. Nevertheless, the decrease in benefit due to the limited sharing is outweighed by the resulting substantial improvement in the equilibrium individual information gathering strategies. The equilibrium analysis given in the paper, which is in itself an important contribution to the study of cooperation between self-interested agents, enables demonstrating that for a wide range of settings an improved individual expected benefit is achieved for all agents when applying each of the five methods.
Stick-Breaking Policy Learning in Dec-POMDPs
Liu, Miao, Amato, Christopher, Liao, Xuejun, Carin, Lawrence, How, Jonathan P.
Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
Learning Adversary Behavior in Security Games: A PAC Model Perspective
Sinha, Arunesh, Kar, Debarun, Tambe, Milind
Recent applications of Stackelberg Security Games (SSG), from wildlife crime to urban crime, have employed machine learning tools to learn and predict adversary behavior using available data about defender-adversary interactions. Given these recent developments, this paper commits to an approach of directly learning the response function of the adversary. Using the PAC model, this paper lays a firm theoretical foundation for learning in SSGs (e.g., theoretically answering questions about the number of samples required to learn adversary behavior) and provides utility guarantees when the learned adversary model is used to plan the defender's strategy. The paper also aims to answer practical questions such as how much more data is needed to improve an adversary model's accuracy. Additionally, we explain a recently observed phenomenon that prediction accuracy of learned adversary behavior is not enough to discover the utility maximizing defender strategy. We provide four main contributions: (1) a PAC model of learning adversary response functions in SSGs; (2) PAC-model analysis of the learning of key, existing bounded rationality models in SSGs; (3) an entirely new approach to adversary modeling based on a non-parametric class of response functions with PAC-model analysis; and (4) identification of conditions under which computing the best defender strategy against the learned adversary behavior is indeed the optimal strategy. Finally, we conduct experiments with real-world data from a national park in Uganda, showing the benefit of our new adversary modeling approach and verifying our PAC model predictions.
Crowd Behavior Analysis: A Review where Physics meets Biology
Kok, Ven Jyn, Lim, Mei Kuan, Chan, Chee Seng
Although the traits that emerge in a mass gathering are often non-deliberative, the act of mass impulse may lead to irrevocable crowd disasters. The two-fold increase in crowd carnage over the past two decades has spurred significant advances in the field of computer vision, towards effective and proactive crowd surveillance. Computer vision studies related to crowds are observed to resonate with the understanding of emergent behavior in physics (complex systems) and biology (animal swarms). These studies, which are inspired by biology and physics, share surprisingly common insights and interesting contradictions. However, this aspect of discussion has not been fully explored. Therefore, this survey provides readers with a review of the state-of-the-art methods in crowd behavior analysis from the physics and biologically inspired perspectives. We provide insights and comprehensive discussions for a broader understanding of the underlying prospect of blending physics and biology studies in computer vision.