Agents
L2E: Learning to Exploit Your Opponent
Wu, Zhe, Li, Kai, Zhao, Enmin, Xu, Hang, Zhang, Meng, Fu, Haobo, An, Bo, Xing, Junliang
Opponent modeling is essential to exploit sub-optimal opponents in strategic interactions. Most previous works focus on building explicit models to directly predict the opponents' styles or strategies, which require a large amount of data to train the model and lack adaptability to unknown opponents. In this work, we propose a novel Learning to Exploit (L2E) framework for implicit opponent modeling. L2E acquires the ability to exploit opponents by a few interactions with different opponents during training, thus can adapt to new opponents with unknown styles during testing quickly. We propose a novel opponent strategy generation algorithm that produces effective opponents for training automatically. We evaluate L2E on two poker games and one grid soccer game, which are the commonly used benchmarks for opponent modeling. Comprehensive experimental results indicate that L2E quickly adapts to diverse styles of unknown opponents.
Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State
Dong, Shi, Van Roy, Benjamin, Zhou, Zhengyuan
We design a simple reinforcement learning agent that, with a specification only of agent state dynamics and a reward function, can operate with some degree of competence in any environment. The agent maintains only visitation counts and value estimates for each agent-state-action pair. The value function is updated incrementally in response to temporal differences and optimistic boosts that encourage exploration. The agent executes actions that are greedy with respect to this value function. We establish a regret bound demonstrating convergence to near-optimal per-period performance, where the time taken to achieve near-optimality is polynomial in the number of agent states and actions, as well as the reward mixing time of the best policy within the reference policy class, which is comprised of those that depend on history only through agent state. Notably, there is no further dependence on the number of environment states or mixing times associated with other policies or statistics of history. Our result sheds light on the potential benefits of (deep) representation learning, which has demonstrated the capability to extract compact and relevant features from high-dimensional interaction histories.
Efficient Large-Scale Multi-Drone Delivery using Transit Networks
Choudhury, Shushman (Stanford University) | Solovey, Kiril | Kochenderfer, Mykel J. | Pavone, Marco
We consider the problem of routing a large fleet of drones to deliver packages simultaneously across broad urban areas. Besides flying directly, drones can use public transit vehicles such as buses and trams as temporary modes of transportation to conserve energy. Adding this capability to our formulation augments effective drone travel range and the space of possible deliveries but also increases problem input size due to the large transit networks. We present a comprehensive algorithmic framework that strives to minimize the maximum time to complete any delivery and addresses the multifaceted computational challenges of our problem through a two-layer approach. First, the upper layer assigns drones to package delivery sequences with an approximately optimal polynomial time allocation algorithm. Then, the lower layer executes the allocation by periodically routing the fleet over the transit network, using efficient, bounded suboptimal multi-agent pathfinding techniques tailored to our setting. We demonstrate the efficiency of our approach on simulations with up to 200 drones, 5000 packages, and transit networks with up to 8000 stops in San Francisco and the Washington DC Metropolitan Area. Our framework computes solutions for most settings within a few seconds on commodity hardware and enables drones to extend their effective range by a factor of nearly four using transit.
Two-facility Location Games with Minimum Distance Requirement
Xu, Xinping | Li, Bo (Department of Computer Science, University of Oxford) | Li, Minming (Department of Computer Science, City University of Hong Kong) | Duan, Lingjie (Singapore University of Technology and Design)
We study the mechanism design problem of a social planner for locating two facilities on a line interval [0, 1], where a set of n strategic agents report their locations and a mechanism determines the locations of the two facilities. We consider the requirement of a minimum distance 0 โค d โค 1 between the two facilities. Given the two facilities are heterogeneous, we model the cost/utility of an agent as the sum of his distances to both facilities. In the heterogeneous two-facility location game to minimize the social cost, we show that the optimal solution can be computed in polynomial time and prove that carefully choosing one optimal solution as output is strategyproof. We also design a strategyproof mechanism minimizing the maximum cost. Given the two facilities are homogeneous, we model the cost/utility of an agent as his distance to the closer facility. In the homogeneous two-facility location game for minimizing the social cost, we show that any deterministic strategyproof mechanism has unbounded approximation ratio. Moreover, in the obnoxious heterogeneous two-facility location game for maximizing the social utility, we propose new deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound (7 โ d)/6 for any deterministic strategyproof mechanism. We also design a strategyproof mechanism maximizing the minimum utility. In the obnoxious homogeneous two-facility location game for maximizing the social utility, we propose deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound 4/3. Besides, in the two-facility location game with triple-preference, where each facility may be favorable, obnoxious, indifferent for any agent, we further motivate agents to report both their locations and preferences towards the two facilities truthfully, and design a deterministic group strategyproof mechanism with an approximation ratio 4.
Understanding algorithmic collusion with experience replay
With the digitalization of the economy and the advances in data analytics, firms are increasingly handing key manual decisions such as product pricing over to computers (Fisher et al., 2018; Miklรณs-Thal and Tucker, 2019; Hansen et al., 2020). However, the sophistication and powerfulness of algorithms have also led to another prominent concern on the possibility of collusion. Pricing algorithms may be too advanced to learn that it is optimal to collude (Ezrachi and Stucke, 2016). Although many are skeptical that autonomous collusion is only science fiction, recent experimental research (Waltman and Kaymak, 2008; Klein, 2019; Calvano et al., 2020; Hansen et al., 2020) suggests that dynamic pricing algorithms can learn collusive strategies from scratch, even without human guidance or communication with each other.
Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking
Li, Jiachen, Ma, Hengbo, Zhang, Zhihao, Li, Jinning, Tomizuka, Masayoshi
An effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are indispensable for intelligent mobile systems (e.g. autonomous vehicles and social robots) to achieve safe and high-quality planning when they navigate in highly interactive and crowded scenarios. Due to the existence of frequent interactions and uncertainty in the scene evolution, it is desired for the prediction system to enable relational reasoning on different entities and provide a distribution of future trajectories for each agent. In this paper, we propose a generic generative neural system (called STG-DAT) for multi-agent trajectory prediction involving heterogeneous agents. The system takes a step forward to explicit interaction modeling by incorporating relational inductive biases with a dynamic graph representation and leverages both trajectory and scene context information. We also employ an efficient kinematic constraint layer applied to vehicle trajectory prediction. The constraint not only ensures physical feasibility but also enhances model performance. Moreover, the proposed prediction model can be easily adopted by multi-target tracking frameworks. The tracking accuracy proves to be improved by empirical results. The proposed system is evaluated on three public benchmark datasets for trajectory prediction, where the agents cover pedestrians, cyclists and on-road vehicles. The experimental results demonstrate that our model achieves better performance than various baseline approaches in terms of prediction and tracking accuracy.
Symmetry Breaking for k-Robust Multi-Agent Path Finding
Chen, Zhe, Harabor, Daniel, Li, Jiaoyang, Stuckey, Peter J.
During Multi-Agent Path Finding (MAPF) problems, agents can be delayed by unexpected events. To address such situations recent work describes k-Robust Conflict-BasedSearch (k-CBS): an algorithm that produces coordinated and collision-free plan that is robust for up to k delays. In this work we introducing a variety of pairwise symmetry breaking constraints, specific to k-robust planning, that can efficiently find compatible and optimal paths for pairs of conflicting agents. We give a thorough description of the new constraints and report large improvements to success rate ina range of domains including: (i) classic MAPF benchmarks;(ii) automated warehouse domains and; (iii) on maps from the 2019 Flatland Challenge, a recently introduced railway domain where k-robust planning can be fruitfully applied to schedule trains.
Dynamic neighbourhood optimisation for task allocation using multi-agent
Creech, Niall, Pacheco, Natalia Criado, Miles, Simon
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.
Moral Decision-Making in Medical Hybrid Intelligent Systems: A Team Design Patterns Approach to the Bias Mitigation and Data Sharing Design Problems
Increasing automation in the healthcare sector calls for a Hybrid Intelligence (HI) approach to closely study and design the collaboration of humans and autonomous machines. Ensuring that medical HI systems' decision-making is ethical is key. The use of Team Design Patterns (TDPs) can advance this goal by describing successful and reusable configurations of design problems in which decisions have a moral component, as well as through facilitating communication in multidisciplinary teams designing HI systems. For this research, TDPs were developed to describe a set of solutions for two design problems in a medical HI system: (1) mitigating harmful biases in machine learning algorithms and (2) sharing health and behavioral patient data with healthcare professionals and system developers. The Socio-Cognitive Engineering methodology was employed, integrating operational demands, human factors knowledge, and a technological analysis into a set of TDPs. A survey was created to assess the usability of the patterns on their understandability, effectiveness, and generalizability. The results showed that TDPs are a useful method to unambiguously describe solutions for diverse HI design problems with a moral component on varying abstraction levels, that are usable by a heterogeneous group of multidisciplinary researchers. Additionally, results indicated that the SCE approach and the developed questionnaire are suitable methods for creating and assessing TDPs. The study concludes with a set of proposed improvements to TDPs, including their integration with Interaction Design Patterns, the inclusion of several additional concepts, and a number of methodological improvements. Finally, the thesis recommends directions for future research.
A Qualitative Theory of Cognitive Attitudes and their Change
Since the seminal work of Hintikka on epistemic logic [28], of Von Wright on the logic of preference [55, 56] and of Cohen & Levesque on the logic of intention [19], many formal logics for reasoning about cognitive attitudes of agents such as knowledge and belief [24], preference [32, 48], desire [23], intention [44, 30] and their combination [38, 54] have been proposed. Generally speaking, these logics are nothing but formal models of rational agency relying on the idea that an agent endowed with cognitive attitudes makes decisions on the basis of what she believes and of what she desires or prefers. The idea of describing rational agents in terms of their epistemic and motivational attitudes is something that these logics share with classical decision theory and game theory. Classical decision theory and game theory provide a quantitative account of individual and strategic decision-making by assuming that agents' beliefs and desires can be respectively modeled by subjective probabilities and utilities. Qualitative approaches to individual and strategic decision-making have been proposed in AI [16, 22] to characterize criteria that a rational agent should adopt for making decisions when she cannot build a probability distribution over the set of possible events and her preference over the set of possible outcomes cannot be expressed by a utility function but only by a qualitative ordering over the outcomes.