Agents
Randomized Wagering Mechanisms
Chen, Yiling, Liu, Yang, Wang, Juntao
Wagering mechanisms are one-shot betting mechanisms that elicit agents' predictions of an event. For deterministic wagering mechanisms, an existing impossibility result has shown that some desirable theoretical properties are incompatible. In particular, Pareto optimality (no profitable side bet before allocation) cannot be achieved together with weak incentive compatibility, weak budget balance and individual rationality. In this paper, we expand the design space of wagering mechanisms to allow randomization and ask whether there are randomized wagering mechanisms that can achieve all previously considered desirable properties, including Pareto optimality. We answer this question positively with two classes of randomized wagering mechanisms: i) a simple randomized lottery-type implementation of existing deterministic wagering mechanisms, and ii) a family of simple randomized wagering mechanisms, which we call surrogate wagering mechanisms, that are robust to noisy ground truth. This family of mechanisms builds on the idea of learning with noisy labels (Natarajan et al. 2013) as well as a recent extension of this idea to the information elicitation without verification setting (Liu and Chen 2018). We show that a broad family of randomized wagering mechanisms satisfy all desirable theoretical properties.
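To make the idea of a lottery-type implementation concrete, here is a minimal sketch under stated assumptions. The abstract does not name the underlying deterministic mechanism or the scoring rule, so the sketch assumes a weighted-score wagering mechanism with a Brier score scaled to [0, 1], and assumes the lottery pays the whole pool to one agent drawn with probability proportional to her deterministic payout. All function names are hypothetical, and this is not claimed to be the paper's exact construction.

```python
import random


def brier_score(report, outcome):
    """Quadratic scoring rule scaled to [0, 1]; outcome is 0 or 1."""
    return 1.0 - (outcome - report) ** 2


def weighted_score_payouts(reports, wagers, outcome):
    """Deterministic payouts: wager * (1 + own score - wager-weighted mean score)."""
    total = sum(wagers)
    scores = [brier_score(p, outcome) for p in reports]
    mean_score = sum(w * s for w, s in zip(wagers, scores)) / total
    return [w * (1.0 + s - mean_score) for w, s in zip(wagers, scores)]


def lottery_payouts(reports, wagers, outcome, rng=random):
    """Assumed lottery variant: one agent wins the whole pool.

    The winner is drawn with probability proportional to her deterministic
    payout, so each agent's expected payout equals the deterministic one.
    """
    payouts = weighted_score_payouts(reports, wagers, outcome)
    winner = rng.choices(range(len(wagers)), weights=payouts, k=1)[0]
    pool = sum(wagers)
    return [pool if i == winner else 0.0 for i in range(len(wagers))]
```

In this sketch the deterministic payouts always sum to the total wagered, and the lottery keeps each agent's expected payout unchanged while making the realized allocation all-or-nothing.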
A Virtual Testbed for Critical Incident Investigation with Autonomous Remote Aerial Vehicle Surveying, Artificial Intelligence, and Decision Support
Ullah, Ihsan, Abinesh, Sai, Smyth, David L., Karimi, Nazli B., Drury, Brett, Glavin, Frank G., Madden, Michael G.
Autonomous robotics and artificial intelligence techniques can be used to support human personnel in the event of critical incidents. These incidents can pose great danger to human life. Some examples of such assistance include: multi-robot surveying of the scene; collection of sensor data and scene imagery; real-time risk assessment and analysis; object identification and anomaly detection; and retrieval of relevant supporting documentation such as standard operating procedures (SOPs). These incidents, although often rare, can involve chemical, biological, radiological/nuclear or explosive (CBRNE) substances and can be of high consequence. Real-world training and deployment of these systems can be costly and sometimes not feasible. For this reason, we have developed a realistic 3D model of a CBRNE scenario to act as a testbed for an initial set of assisting AI tools that we have developed.
Blameworthiness in Strategic Games
There are multiple notions of coalitional responsibility. The focus of this paper is on the blameworthiness defined through the principle of alternative possibilities: a coalition is blamable for a statement if the statement is true, but the coalition had a strategy to prevent it. The main technical result is a sound and complete bimodal logical system that describes properties of blameworthiness in one-shot games.
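The definition of blameworthiness above can be checked by brute force in a small one-shot game. The following is a minimal sketch, assuming a finite game given as a dictionary of actions per agent and an outcome function; the names `is_blamable`, `actions`, `outcome` and `statement` are hypothetical and not taken from the paper, which works with a logical system rather than code.

```python
from itertools import product


def is_blamable(coalition, agents, actions, outcome, statement, played):
    """Return True if `coalition` is blamable for `statement` under `played`.

    The statement must hold in the outcome of the played profile, and the
    coalition must have had a joint action that makes the statement false
    no matter what the remaining agents do.
    """
    if not statement(outcome(played)):
        return False
    others = [a for a in agents if a not in coalition]
    for joint in product(*(actions[a] for a in coalition)):
        fixed = dict(zip(coalition, joint))
        prevented = all(
            not statement(outcome({**fixed, **dict(zip(others, rest))}))
            for rest in product(*(actions[a] for a in others))
        )
        if prevented:
            return True
    return False


# Toy game (an assumption for illustration): a vase breaks only if both push.
agents = ["a", "b"]
actions = {"a": ["push", "wait"], "b": ["push", "wait"]}


def outcome(profile):
    return "broken" if all(act == "push" for act in profile.values()) else "intact"


def vase_broken(result):
    return result == "broken"
```

In the toy game, if both agents push, agent `a` alone is blamable for the broken vase because waiting would have prevented it regardless of what `b` did. The empty coalition is never blamable here, since it has no action that changes the outcome.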
CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning
Yang, Jiachen, Nakhaei, Alireza, Isele, David, Zha, Hongyuan, Fujimura, Kikuo
We propose CM3, a new deep reinforcement learning method for cooperative multi-agent problems where agents must coordinate for joint success in achieving different individual goals. We restructure multi-agent learning into a two-stage curriculum, consisting of a single-agent stage for learning to accomplish individual tasks, followed by a multi-agent stage for learning to cooperate in the presence of other agents. These two stages are bridged by modular augmentation of neural network policy and value functions. We further adapt the actor-critic framework to this curriculum by formulating local and global views of the policy gradient and learning via a double critic, consisting of a decentralized value function and a centralized action-value function. We evaluated CM3 on a new high-dimensional multi-agent environment with sparse rewards: negotiating lane changes among multiple autonomous vehicles in the Simulation of Urban Mobility (SUMO) traffic simulator. Detailed ablation experiments show the positive contribution of each component in CM3, and the overall synthesis converges significantly faster to higher performance policies than existing cooperative multi-agent methods.
Coordination-driven learning in multi-agent problem spaces
Barton, Sean L., Waytowich, Nicholas R., Asher, Derrik E.
We discuss the role of coordination as a direct learning objective in multi-agent reinforcement learning (MARL) domains. To this end, we present a novel means of quantifying coordination in multi-agent systems, and discuss the implications of using such a measure to optimize coordinated agent policies. This concept has important implications for adversary-aware RL, which we take to be a sub-domain of multi-agent learning.
Unity tweaks AI training tools, makes bid for academic respect
Unity Technologies on Monday released version 0.5 of its ML-Agents toolkit to make its Unity 3D game development platform better suited for developing and training autonomous agent code via machine learning. Initially rolled out a year ago in beta, version 0.5 comes with a few improvements. There's a wrapper for Gym (a toolkit for developing and testing reinforcement learning algorithms), support for letting agents make multiple action selections at once and for preventing agents from taking certain actions, and a refurbished set of environments called Marathon Environments. In these virtual spaces, AI researchers can teach software agents to perform certain tasks by rewarding them for correct actions. This sort of reinforcement learning can be limited to digital environments like video games or mapped to software-driven machines in the real world. Through its latest code update, Unity is making the case for Unity 3D as a key tool for AI research, a goal that company code boffins describe in a preprint paper titled, "Unity: A General Platform for Intelligent Agents."
The Convergence of Iterative Delegations in Liquid Democracy
Escoffier, Bruno, Gilbert, Hugo, Pass-Lanneau, Adèle
Liquid democracy is a collective decision making paradigm which lies between direct and representative democracy. One main feature of liquid democracy is the concept of transitive delegations. Indeed, in this setting each voter may decide to vote directly or to delegate her vote to a representative, also called proxy. In liquid democracy this proxy can in turn delegate her vote and the votes that have been delegated to her to another proxy. As a result, a voter who decides to vote has a weight corresponding to the number of people she represents, i.e., herself and the voters who directly or indirectly delegated to her.
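The weight of a direct voter under transitive delegation can be computed by following each delegation chain to its end. This is a minimal sketch, assuming delegations are given as a dictionary in which every proxy also appears as a key; the function name `voting_weights` is hypothetical, and dropping votes caught in a delegation cycle is an assumption of this sketch, not something the abstract specifies.

```python
def voting_weights(delegations):
    """Map each direct voter to the number of voters she represents.

    `delegations[v]` is the proxy that v delegates to, or None if v votes
    directly. A direct voter counts herself plus everyone who delegated to
    her, directly or indirectly.
    """
    weights = {v: 0 for v, proxy in delegations.items() if proxy is None}
    for voter in delegations:
        current, seen = voter, set()
        # Follow the chain of transitive delegations to its end.
        while delegations[current] is not None and current not in seen:
            seen.add(current)
            current = delegations[current]
        if delegations[current] is None:
            weights[current] += 1  # the vote reaches a direct voter
        # Otherwise the chain loops; this sketch drops the vote (assumption).
    return weights
```

For example, with `{"ann": None, "bob": "ann", "cat": "bob", "dan": None}`, `ann` carries her own vote plus those of `bob` and `cat`, while `dan` carries only his own.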
Coordinated Heterogeneous Distributed Perception based on Latent Space Representation
Korthals, Timo, Leitner, Jürgen, Rückert, Ulrich
We investigate a reinforcement learning approach for distributed sensing based on the latent space derived from multimodal deep generative models. Our contribution gives insight into the following benefits: Detections can be exchanged effectively between robots equipped with unimodal sensors due to a shared latent representation of information that is trained by a Variational Auto Encoder (VAE). Sensor fusion can be applied asynchronously due to the generative feature of the VAE. Deep Q-Networks (DQNs) are trained to minimize uncertainty in latent space by coordinating robots to a Point-of-Interest (PoI) where their sensor modality can provide beneficial information about the PoI. Additionally, we show that the decrease in uncertainty can be defined as the direct reward signal for training the DQN.
Detecting Intentions of Vulnerable Road Users Based on Collective Intelligence
Bieshaar, Maarten, Reitberger, Günther, Zernetsch, Stefan, Sick, Bernhard, Fuchs, Erich, Doll, Konrad
Vulnerable road users (VRUs, i.e. cyclists and pedestrians) will play an important role in future traffic. To avoid accidents and achieve a highly efficient traffic flow, it is important to detect VRUs and to predict their intentions. In this article, a holistic approach for detecting intentions of VRUs by cooperative methods is presented. The intention detection consists of basic movement primitive prediction, e.g. standing, moving, turning, and a forecast of the future trajectory. Vehicles equipped with sensors, data processing systems and communication abilities, referred to as intelligent vehicles, acquire and maintain a local model of their surrounding traffic environment, e.g. crossing cyclists. Heterogeneous, open sets of agents (cooperating and interacting vehicles, infrastructure, e.g. cameras and laser scanners, and VRUs equipped with smart devices and body-worn sensors) exchange information, forming a multi-modal sensor system with the goal of reliably and robustly detecting VRUs and their intentions while taking real-time requirements and uncertainties into account. The resulting model extends the perceptual horizon of individual agents beyond their own sensory capabilities, enabling a longer forecast horizon. Concealments, implausibilities and inconsistencies are resolved by the collective intelligence of cooperating agents. Novel techniques of signal processing and modelling, in combination with analytical and learning-based approaches to pattern and activity recognition, are used for detection as well as intention prediction of VRUs. Cooperation, by means of probabilistic sensor and knowledge fusion, takes place on the level of perception and intention recognition. Based on the communication requirements of the cooperative approach, a new strategy for an ad hoc network is proposed.
A Continuous Information Gain Measure to Find the Most Discriminatory Problems for AI Benchmarking
Stephenson, Matthew, Anderson, Damien, Khalifa, Ahmed, Levine, John, Renz, Jochen, Togelius, Julian, Salge, Christoph
This paper introduces an information-theoretic method for selecting a small subset of problems which gives us the most information about a group of problem-solving algorithms. This method was tested on the games in the General Video Game AI (GVGAI) framework, allowing us to identify a smaller set of games that still gives a large amount of information about the game-playing agents. This approach can be used to make agent testing more efficient in the future. We can achieve almost as good discriminatory accuracy when testing on only a handful of games as when testing on more than a hundred games, something which is often computationally infeasible. Furthermore, this method can be extended to study the dimensions of effective variance in game design between these games, allowing us to identify which games differentiate between agents in the most complementary ways. As a side effect of this investigation, we provide an up-to-date comparison on agent performance for all GVGAI games, and an analysis of correlations between scores and win-rates across both games and agents.