Agents
Empirical validation of network learning with taxi GPS data from Wuhan, China
Xu, Susan Jia, Xie, Qian, Chow, Joseph Y. J., Liu, Xintao
Many studies have illustrated the importance of accurately and precisely measuring the attributes of an urban transport system. With the rise of Big Data and the Internet of Things, numerous machine learning methods are now available to measure attributes of the transport system. Chow (1) provides an overview of these techniques, including several applications: Allahviranloo and Recker (2) for activity pattern prediction; Cai et al. (3) for short-term traffic forecasting; Luque-Baena et al. (4) for vehicle detection; Lv et al. (5) for traffic flow prediction; and Ma et al. (6) for network congestion prediction. However, generic machine learning techniques are not specifically designed to exploit the unique structure of urban transport networks. As a result, in recent years a theory of inverse problems (see 7) has emerged to capture network structure, dubbed "inverse transportation problems" by Xu et al. (8).
A perspective on multi-agent communication for information fusion
Saha, Homagni, Venkataraman, Vijay, Speranzon, Alberto, Sarkar, Soumik
Collaborative decision making in multi-agent systems typically requires a predefined communication protocol among agents. Usually, agent-level observations are processed locally and information is exchanged using the predefined protocol, enabling the team to perform more efficiently than each agent operating in isolation. In this work, we consider the situation where agents with complementary sensing modalities must cooperate to achieve a common goal or task by learning an efficient communication protocol. We frame the problem within an actor-critic scheme, where the agents learn optimal policies in a centralized fashion while taking actions in a distributed manner. We provide an interpretation of the emergent communication between the agents. We observe that the information exchanged is not just an encoding of the raw sensor data but rather a specific set of directive actions that depend on the overall task. Simulation results demonstrate the interpretability of the learnt communication in a variety of tasks.
Duty to Warn in Strategic Games
The paper investigates the second-order blameworthiness or duty to warn modality "one coalition knew how another coalition could have prevented an outcome". The main technical result is a sound and complete logical system that describes the interplay between the distributed knowledge and the duty to warn modalities.
Bringing autonomous systems to engineers: Taking a leap from the digital world of games to the real world - The Official Microsoft Blog
Imagine an autonomous vehicle navigating a smoke-filled mine looking for survivors, personal belongings or any other clues to find anyone who might be alive. It identifies objects it sees and decides which paths to take first. As it reaches the limit of where it can explore, a drone sitting on the vehicle flies off to explore the hard-to-reach corners of the mine. All of this is done without any communication with the outside world. Team Explorer from Carnegie Mellon University and Oregon State University did exactly this to win the first event of the Defense Advanced Research Projects Agency (DARPA) Subterranean Challenge.
How Do We Know What Superintelligent AI Will Do?
Unpredictability is an intuitively familiar concept. We can usually predict the outcome of common physical processes without knowing specific behavior of particular atoms, just as we can typically predict the overall behavior of the intelligent system without knowing specific intermediate steps. Rahwan and Cebrian observe that "… complex AI agents often exhibit inherent unpredictability: they demonstrate emergent behaviors that are impossible to predict with precision--even by their own programmers. These behaviors manifest themselves only through interaction with the world and with other agents in the environment… In fact, Alan Turing and Alonzo Church showed the fundamental impossibility of ensuring an algorithm fulfills certain properties without actually running said algorithm. There are fundamental theoretical limits to our ability to verify that a particular piece of code will always satisfy desirable properties unless we execute the code, and observe its behavior."
Convergence of Learning Dynamics in Stackelberg Games
Fiez, Tanner, Chasnov, Benjamin, Ratliff, Lillian J.
This paper investigates the convergence of learning dynamics in Stackelberg games. In the class of games we consider, there is a hierarchical game being played between a leader and a follower with continuous action spaces. We establish a number of connections between the Nash and Stackelberg equilibrium concepts and characterize conditions under which attracting critical points of simultaneous gradient descent are Stackelberg equilibria in zero-sum games. Moreover, we show that the only stable critical points of the Stackelberg gradient dynamics are Stackelberg equilibria in zero-sum games. Using this insight, we develop a gradient-based update for the leader while the follower employs a best response strategy for which each stable critical point is guaranteed to be a Stackelberg equilibrium in zero-sum games. As a result, the learning rule provably converges to a Stackelberg equilibrium given an initialization in the region of attraction of a stable critical point. We then consider a follower employing a gradient-play update rule instead of a best response strategy and propose a two-timescale algorithm with similar asymptotic convergence guarantees. For this algorithm, we also provide finite-time high-probability bounds for local convergence to a neighborhood of a stable Stackelberg equilibrium in general-sum games. Finally, we present extensive numerical results that validate our theory, provide insights into the optimization landscape of generative adversarial networks, and demonstrate that the learning dynamics we propose can effectively train generative adversarial networks.
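As a rough illustration of a gradient-based leader update against a best-responding follower, the sketch below uses a toy quadratic zero-sum game. The game f(x, y) = 0.5*a*x^2 + b*x*y - 0.5*c*y^2, its coefficients, the step size, and all function names are assumptions chosen for illustration; they are not taken from the paper. The leader minimizes f, the follower maximizes it, and the leader descends the total derivative of f along the follower's best-response curve.

```python
# Toy zero-sum game (hypothetical, not from the paper):
#   f(x, y) = 0.5*a*x**2 + b*x*y - 0.5*c*y**2, with c > 0.
# The leader chooses x to minimize f; the follower chooses y to maximize f.

def follower_best_response(x, b=2.0, c=1.0):
    """Maximizer of f over y for a fixed x: solves df/dy = b*x - c*y = 0."""
    return b * x / c

def leader_total_gradient(x, a=-0.5, b=2.0, c=1.0):
    """Total derivative of f(x, y*(x)) with respect to x (chain rule)."""
    y = follower_best_response(x, b, c)
    partial_x = a * x + b * y   # df/dx at (x, y)
    partial_y = b * x - c * y   # df/dy at (x, y); zero at the best response
    dy_dx = b / c               # slope of the best-response map
    return partial_x + partial_y * dy_dx

def run_leader_dynamics(x0=1.0, lr=0.1, steps=200):
    """Gradient descent for the leader while the follower best-responds."""
    x = x0
    for _ in range(steps):
        x -= lr * leader_total_gradient(x)
    return x, follower_best_response(x)

if __name__ == "__main__":
    print(run_leader_dynamics())
```

With these assumed coefficients the leader's own curvature a is negative, so the origin is not a Nash equilibrium, yet the curvature along the best-response curve, a + b^2/c, is positive, so the iteration contracts toward the origin, which is the Stackelberg equilibrium of this toy game.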
To Populate is To Regulate
We examine the effects of instantiating Lewis signaling games within a population of speaker and listener agents with the aim of producing a set of general and robust representations of unstructured pixel data. Preliminary experiments suggest that the set of representations associated with languages generated within a population outperforms those generated between a single speaker-listener pair on this objective, making a case for the adoption of population-based approaches in emergent communication studies. Furthermore, post-hoc analysis reveals that population-based learning introduces a number of novel factors to the conventional emergent communication setup, inviting a wide range of future research questions regarding communication dynamics and the flow of information within them.
Robot navigation and target capturing using nature-inspired approaches in a dynamic environment
Verma, Devansh, Saxena, Priyansh, Tiwari, Ritu
Path planning and target searching in a three-dimensional environment are challenging tasks in the field of robotics. Path planning is an optimization problem, as the path from source to destination has to be optimal. This paper aims to generate a collision-free trajectory in a dynamic environment. The path planning problem is of extreme importance in military, search and rescue, and other life-saving missions. During its operation, the unmanned air vehicle operates in a hostile environment, and faster replanning is needed to reach the target as optimally as possible. This paper presents a novel approach of hierarchical planning using multiresolution abstract levels for faster replanning. Economic constraints such as path length, total path planning time and the number of turns are taken into consideration, which mandates the use of cost functions. Experimental results show that the hierarchical version of GSO gives better performance compared to BBO, IWO and their hierarchical versions.
OntoScene, A Logic-based Scene Interpreter: Implementation and Application in the Rock Art Domain
Briola, Daniela, Mascardi, Viviana, Gioseffi, Massimiliano
OntoScene exploits ontologies for representing knowledge and Prolog for specifying the interpretation rules that domain experts may adopt, and for implementing the SceneInterpreter engine. Ontologies allow the designer to formalize the domain in a reusable way, and make the system modular and interoperable with existing multiagent systems, while Prolog provides a solid basis to define complex rules of interpretation in a way that can be affordable even for people with no background in Computational Logics. The domain selected for experimenting with OntoScene is that of prehistoric rock art, which provides us with a fascinating and challenging testbed. Under consideration in Theory and Practice of Logic Programming (TPLP). Keywords: Prolog; Ontologies; Multiagent Systems; Visual Languages; Scene Interpretation. 1 Introduction: Human perception of complex visual scenes has been studied for a long time in psychology and neuroscience (Kondo et al. 2017). According to the seminal work on "high-level scene perception" (Henderson and Hollingworth 1999), besides low-level or early vision, concerned with extraction of physical properties such as depth, color, and texture from an image (Marr 1982), and intermediate-level vision, concerned with extraction of shape and spatial relations that can be determined without regard to meaning (Ullman 1996), a further level of vision is required to perceive and understand a scene: high-level vision concerns the mapping from visual representations to meaning and includes [...] the identification of objects and scenes. In their recent studies, Kveraga, Bar, and Baldassano (Kveraga and Bar 2014; Baldassano 2015) demonstrate that the brain has regions related to higher-order properties like overall geometry, [...] (arXiv:1911.04863v1)
Experience Sharing Between Cooperative Reinforcement Learning Agents
Souza, Lucas Oliveira, Ramos, Gabriel de Oliveira, Ralha, Celia Ghedini
The idea of experience sharing between cooperative agents naturally emerges from our understanding of how humans learn. Our evolution as a species is tightly linked to the ability to exchange learned knowledge with one another. It follows that experience sharing (ES) between autonomous and independent agents could become the key to accelerate learning in cooperative multiagent settings. We investigate if randomly selecting experiences to share can increase the performance of deep reinforcement learning agents, and propose three new methods for selecting experiences to accelerate the learning process. Firstly, we introduce Focused ES, which prioritizes unexplored regions of the state space. Secondly, we present Prioritized ES, in which temporal-difference error is used as a measure of priority. Finally, we devise Focused Prioritized ES, which combines both previous approaches. The methods are empirically validated in a control problem. While sharing randomly selected experiences between two Deep Q-Network agents shows no improvement over a single agent baseline, we show that the proposed ES methods can successfully outperform the baseline. In particular, the Focused ES accelerates learning by a factor of 2, reducing by 51% the number of episodes required to complete the task.
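To make the idea of using temporal-difference error as a sharing priority concrete, here is a minimal tabular sketch. The abstract describes Deep Q-Network agents and does not specify the selection rule, so the dictionary-based Q-table, the top-k selection, and every name below (`Experience`, `td_error`, `select_experiences_to_share`) are assumptions for illustration only.

```python
from collections import namedtuple

# Hypothetical transition record; field names are illustrative only.
Experience = namedtuple("Experience", "state action reward next_state done")

def td_error(q, exp, n_actions, gamma=0.99):
    """One-step Q-learning TD error; q maps (state, action) to a value."""
    if exp.done:
        best_next = 0.0  # no bootstrapping from terminal states
    else:
        best_next = max(q.get((exp.next_state, a), 0.0) for a in range(n_actions))
    target = exp.reward + gamma * best_next
    return target - q.get((exp.state, exp.action), 0.0)

def select_experiences_to_share(q, buffer, n_actions, k, gamma=0.99):
    """Return the k experiences with the largest absolute TD error."""
    ranked = sorted(
        buffer,
        key=lambda e: abs(td_error(q, e, n_actions, gamma)),
        reverse=True,
    )
    return ranked[:k]

if __name__ == "__main__":
    buffer = [
        Experience(0, 0, 0.0, 1, False),
        Experience(1, 0, 1.0, 2, True),
        Experience(2, 1, -0.5, 0, False),
    ]
    # With an empty Q-table, the TD error equals the immediate reward.
    print(select_experiences_to_share({}, buffer, n_actions=2, k=2))
```

A sampling scheme proportional to priority, as in prioritized experience replay, would be an equally plausible reading; the deterministic top-k rule is used here only because it is simple to follow.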