Agents
Tree of Knowledge: an Online Platform for Learning the Behaviour of Complex Systems
Many social sciences such as psychology and economics try to learn the behaviour of complex agents such as humans, organisations and countries. The current statistical methods used for learning this behaviour try to infer generally valid behaviour, but can only learn from one type of study at a time. Furthermore, only data from carefully designed studies can be used, as the phenomenon of interest has to be isolated and confounding factors accounted for. These restrictions limit the robustness and accuracy of insights that can be gained from social/economic systems. Here we present the online platform TreeOfKnowledge which implements a new methodology specifically designed for learning complex behaviours from complex systems: agent-based behaviour learning. With agent-based behaviour learning it is possible to gain more accurate and robust insights, as it is not subject to the restrictions of conventional statistics. It learns agent behaviour from many heterogeneous datasets and can learn from these datasets even if the phenomenon of interest is not directly observed, but appears deep within complex systems. This new methodology shows how the internet and advances in computational power allow for more accurate and powerful mathematical models.
Cognitive Homeostatic Agents
The human brain has been used as an inspiration for building autonomous agents, but it is not obvious what level of computational description of the brain one should use. This has led to overly opinionated symbolic approaches and overly unstructured connectionist approaches. We propose that using homeostasis as the computational description provides a good compromise. Similar to how physiological homeostasis is the regulation of certain homeostatic variables, cognition can be interpreted as the regulation of certain 'cognitive homeostatic variables'. We present an outline of a Cognitive Homeostatic Agent, built as a hierarchy of physiological and cognitive homeostatic subsystems, and describe structures and processes to guide future exploration. We expect this to be a fruitful line of investigation towards building sophisticated artificial agents that can act flexibly in complex environments, and produce behaviors indicating planning, thinking and feelings.
Global Cooperation & Guidelines Will Let Countries Use AI For Good
Yoshua Bengio is one of the world's leading experts in artificial intelligence and deep learning. Also known as the father of deep learning, he says that for AI to change the world for the better, organizations and governments need a global shift in how they share their research. In many countries, private companies, government entities, and academic institutions conduct AI research. These institutions must foster a global culture of open science. They also need to rethink how to encourage the development of impactful artificial intelligence.
Scalable Multiagent Driving Policies For Reducing Traffic Congestion
Cui, Jiaxun, Macke, William, Yedidsion, Harel, Goyal, Aastha, Urielli, Daniel, Stone, Peter
Traffic congestion is a major challenge in modern urban settings. The industry-wide development of autonomous and automated vehicles (AVs) motivates the question of how AVs can contribute to congestion reduction. Past research has shown that in small-scale mixed traffic scenarios with both AVs and human-driven vehicles, a small fraction of AVs executing a controlled multiagent driving policy can mitigate congestion. In this paper, we scale up existing approaches and develop new multiagent driving policies for AVs in scenarios with greater complexity. We start by showing that a congestion metric used by past research is manipulable in open road network scenarios where vehicles dynamically join and leave the road. We then propose using a different metric that is robust to manipulation and reflects open network traffic efficiency. Next, we propose a modular transfer reinforcement learning approach, and use it to scale up a multiagent driving policy to outperform human-like traffic and existing approaches in a simulated realistic scenario, which is an order of magnitude larger than past scenarios (hundreds instead of tens of vehicles). Additionally, our modular transfer learning approach saves up to 80% of the training time in our experiments, by focusing its data collection on key locations in the network. Finally, we show for the first time a distributed multiagent policy that improves congestion over human-driven traffic. The distributed approach is more realistic and practical, as it relies solely on existing sensing and actuation capabilities, and does not require adding new communication infrastructure.
Russell Wilson wants to play for Seahawks but would be willing to be dealt to these teams, agent says
Russell Wilson still wants to play for the Seattle Seahawks, his agent said Thursday amid a new report detailing a potential growing fracture between the two sides. Wilson's agent Mark Rodgers told ESPN if there was a trade coming down the line Wilson would only want to play for a handful of teams. Rodgers told ESPN that Wilson's trade list would include the Dallas Cowboys, New Orleans Saints, Las Vegas Raiders and Chicago Bears.
AGENT: A Benchmark for Core Psychological Reasoning
Shu, Tianmin, Bhandwaldar, Abhishek, Gan, Chuang, Smith, Kevin A., Liu, Shari, Gutfreund, Dan, Spelke, Elizabeth, Tenenbaum, Joshua B., Ullman, Tomer D.
For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that drive observable actions, comes naturally to people: even pre-verbal infants can tell agents from objects, expecting agents to act efficiently to achieve goals given constraints. Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning. Inspired by cognitive development studies on intuitive psychology, we present a benchmark consisting of a large dataset of procedurally generated 3D animations, AGENT (Action, Goal, Efficiency, coNstraint, uTility), structured around four scenarios (goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs) that probe key concepts of core intuitive psychology. We validate AGENT with human ratings, propose an evaluation protocol emphasizing generalization, and compare two strong baselines built on Bayesian inverse planning and a Theory of Mind neural network. Our results suggest that to pass the designed tests of core intuitive psychology at human levels, a model must acquire or have built-in representations of how agents plan, combining utility computations and core knowledge of objects and physics.
Multi-Agent Deep Reinforcement Learning in 13 Lines of Code Using PettingZoo
This tutorial provides a simple introduction to using multi-agent reinforcement learning, assuming a little experience in machine learning and knowledge of Python. Reinforcement learning stems from using machine learning to optimally control an agent in an environment. It works by learning a policy, a function that maps an observation obtained from its environment to an action. Policy functions are typically deep neural networks, which gives rise to the name "deep reinforcement learning." The goal of reinforcement learning is to learn an optimal policy, a policy that achieves the maximum expected reward from the environment when acting.
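As a rough illustration of the terms in this summary, the sketch below treats a policy as a plain Python function from observation to action and estimates its expected reward by averaging over episodes. It does not use PettingZoo or a neural network; the `LineWorld` environment, its reward values, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk along positions 0..size-1; the goal is the last one."""

    def __init__(self, size=5, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos  # the observation is simply the current position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the line
        self.pos = min(max(self.pos + action, 0), self.size - 1)
        self.steps += 1
        at_goal = self.pos == self.size - 1
        reward = 1.0 if at_goal else -0.1  # assumed reward scheme
        done = at_goal or self.steps >= self.max_steps
        return self.pos, reward, done


def always_right(obs):
    """A fixed policy: ignore the observation and always move right."""
    return 1


def make_random_policy(seed):
    """A baseline policy that picks a direction uniformly at random."""
    rng = random.Random(seed)
    return lambda obs: rng.choice([-1, 1])


def episode_return(env, policy):
    """Run one episode and return the sum of rewards collected."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


def average_return(env, policy, episodes=200):
    """Monte Carlo estimate of the policy's expected reward per episode."""
    return sum(episode_return(env, policy) for _ in range(episodes)) / episodes


env = LineWorld()
best = average_return(env, always_right)
baseline = average_return(env, make_random_policy(seed=0))
print(f"always-right policy: {best:.2f}, random policy: {baseline:.2f}")
```

In this toy setting the always-right policy is optimal because it reaches the goal in the fewest steps; a learning algorithm would have to discover such a policy from the reward signal instead of having it hard-coded.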
A Sufficient Statistic for Influence in Structured Multiagent Environments
Oliehoek, Frans (Delft University of Technology) | Witwicki, Stefan (Nissan) | Kaelbling, Leslie (MIT)
Making decisions in complex environments is a key challenge in artificial intelligence (AI). Situations involving multiple decision makers are particularly complex, leading to computational intractability of principled solution methods. A body of work in AI has tried to mitigate this problem by trying to distill interaction to its essence: how does the policy of one agent influence another agent? If we can find more compact representations of such influence, this can help us deal with the complexity, for instance by searching the space of influences rather than the space of policies. However, so far these notions of influence have been restricted in their applicability to special cases of interaction. In this paper we formalize influence-based abstraction (IBA), which facilitates the elimination of latent state factors without any loss in value, for a very general class of problems described as factored partially observable stochastic games (fPOSGs). On the one hand, this generalizes existing descriptions of influence, and thus can serve as the foundation for improvements in scalability and other insights in decision making in complex multiagent settings. On the other hand, since the presence of other agents can be seen as a generalization of single agent settings, our formulation of IBA also provides a sufficient statistic for decision making under abstraction for a single agent. We also give a detailed discussion of the relations to such previous works, identifying new insights and interpretations of these approaches. In these ways, this paper deepens our understanding of abstraction in a wide range of sequential decision making settings, providing the basis for new approaches and algorithms for a large class of problems.
Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning
Shao, Jianzhun, Zhang, Hongchang, Jiang, Yuhang, He, Shuncheng, Ji, Xiangyang
Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning. To take full advantage of global information, which exploits the states from all agents and the related environment for decomposing Q values into individual credits, we propose a general meta-learning-based Mixing Network with Meta Policy Gradient (MNMPG) framework to distill the global hierarchy for delicate reward decomposition. The excitation signal for learning global hierarchy is deduced from the difference in episode reward before and after "exercise updates" through the utility network. Our method is generally applicable to CTDE methods that use a monotonic mixing network. Experiments on the StarCraft II micromanagement benchmark demonstrate that our method, with just a simple utility network, is able to outperform the current state-of-the-art MARL algorithms on 4 of 5 super hard scenarios. Better performance can be further achieved when combined with a role-based utility network.
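The abstract refers to a monotonic mixing network without describing its architecture. As a minimal sketch of what monotonicity means here (this is not the MNMPG method), the example below mixes per-agent Q values with state-dependent non-negative weights and checks that each agent's own greedy action matches the joint action that maximizes the mixed value. The single linear layer, the use of `abs()` to enforce non-negative weights, and all names and numbers are assumptions made for illustration.

```python
from itertools import product


def mix(agent_qs, state, weight_params, bias_params):
    """Combine per-agent Q-values into a joint value that is monotonic in each of them."""
    # State-dependent weights; abs() keeps them non-negative, which gives monotonicity.
    weights = [abs(sum(p * s for p, s in zip(row, state))) for row in weight_params]
    bias = sum(p * s for p, s in zip(bias_params, state))
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


# Made-up numbers for two agents with three and two actions respectively.
state = [1.0, -2.0]
weight_params = [[0.5, 1.0], [-1.5, 0.25]]
bias_params = [0.3, 0.1]
q_tables = [[0.2, 1.0, -0.5], [0.7, 0.1]]

# Decentralized execution: each agent picks its own greedy action.
decentralized = tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def joint_value(actions):
    """Mixed value of a joint action, looked up from the per-agent tables."""
    chosen = [q[a] for q, a in zip(q_tables, actions)]
    return mix(chosen, state, weight_params, bias_params)


# Centralized check: brute-force the joint action that maximizes the mixed value.
centralized = max(product(*(range(len(q)) for q in q_tables)), key=joint_value)
print(decentralized, centralized)
```

Because the mixed value never decreases when any single agent's Q value increases, the per-agent greedy choices and the joint maximizer agree, which is the property that lets training be centralized while execution stays decentralized.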
Learning Emergent Discrete Message Communication for Cooperative Reinforcement Learning
Li, Sheng, Zhou, Yutai, Allen, Ross, Kochenderfer, Mykel J.
Communication is an important factor that enables agents to work cooperatively in multi-agent reinforcement learning (MARL). Most previous work uses continuous message communication whose high representational capacity comes at the expense of interpretability. Allowing agents to learn their own discrete message communication protocol that emerges from a variety of domains can increase the interpretability for human designers and other agents. This paper proposes a method to generate discrete messages analogous to human languages, and achieve communication by a broadcast-and-listen mechanism based on self-attention. We show that discrete message communication has performance comparable to continuous message communication but with a much smaller vocabulary size. Furthermore, we propose an approach that allows humans to interactively send discrete messages to agents.