Agents
Learning Multi-agent Implicit Communication Through Actions: A Case Study in Contract Bridge, a Collaborative Imperfect-Information Game
Tian, Zheng, Zou, Shihao, Warr, Tim, Wu, Lisheng, Wang, Jun
In situations where explicit communication is limited, a human collaborator is typically able to learn to: (i) infer the meaning behind their partner's actions and (ii) balance actions that are exploitative given their current understanding of the state against those that can convey private information about the state to their partner. The first component of this learning process has been well studied in multi-agent systems, whereas the second, which is equally crucial for a successful collaboration, has not. In this work, we complete the learning process and introduce our novel algorithm, Policy-Belief-Iteration ("P-BIT"), which mimics both components mentioned above. A belief module models the other agent's private information by observing their actions, whilst a policy module makes use of the inferred private information to return a distribution over actions. The two modules are mutually reinforced with an EM-like algorithm. We use a novel auxiliary reward to encourage information exchange through actions. We evaluate our approach on the non-competitive bidding problem from contract bridge and show that, through self-play, agents learn to collaborate effectively via implicit communication, and that P-BIT outperforms several meaningful baselines.
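The belief side of this loop (inferring a partner's private information from their actions) can be illustrated with an exact Bayesian update. This is a minimal sketch, not P-BIT: the paper's belief module is learned, whereas here we assume the partner's policy is known as a table, and the hand types, actions, probabilities and function names (`update_belief`, `entropy_bits`) are hypothetical.

```python
import math

def update_belief(prior, partner_policy, observed_action):
    """Posterior over the partner's private hand type after one observed action.

    prior: dict hand_type -> P(hand_type)
    partner_policy: dict hand_type -> dict action -> P(action | hand_type)
    """
    # Bayes' rule: P(h | a) is proportional to P(a | h) * P(h)
    unnormalised = {
        h: p * partner_policy[h].get(observed_action, 0.0) for h, p in prior.items()
    }
    z = sum(unnormalised.values())
    if z == 0.0:
        raise ValueError("observed action has zero probability under the partner model")
    return {h: v / z for h, v in unnormalised.items()}

def entropy_bits(dist):
    """Shannon entropy (in bits) of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical two-type example: a partner who mostly bids with strong hands.
prior = {"weak": 0.5, "strong": 0.5}
policy = {
    "weak": {"pass": 0.8, "bid": 0.2},
    "strong": {"pass": 0.1, "bid": 0.9},
}
posterior = update_belief(prior, policy, "bid")
print(posterior, entropy_bits(prior), entropy_bits(posterior))
```

The entropy drop shows why the second component matters: if both hand types bid with the same probabilities, the posterior equals the prior and the action conveys nothing to the partner.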
Tieto joins European AI Alliance to shape the era of artificial intelligence
Tieto announced today that it is one of the first Nordic companies to join the European AI Alliance, a newly formed forum that brings artificial intelligence (AI) stakeholders together to boost European competitiveness in AI research and development and in its impact on industry and society. The AI Alliance, established by the European Commission, brings together a diverse set of leading AI actors, including companies, consumer organizations, trade unions and other representatives of civil society bodies across Europe, to share best practices. The AI Alliance aims to contribute directly to the European debate on AI and to influence the Commission's AI policy-making. To achieve that, it works in close collaboration with the High-Level Expert Group on Artificial Intelligence (AI HLEG), which the Commission has also established, with 52 members from academia, business and civil society, such as Bayer, BMW, Bosch, Fraunhofer Institute, Google, IBM, Nokia, Siemens, Telenor and University of Oxford. The AI HLEG advises the Commission on AI's opportunities and challenges and supports it in implementing the European strategy on AI.
A Distributed Reinforcement Learning Solution With Knowledge Transfer Capability for a Bike Rebalancing Problem
Rebalancing is a critical service bottleneck for many transportation services, such as Citi Bike. Citi Bike relies on manual orchestration of bike rebalancing between dispatchers and field agents. Motivated by this problem and the lack of smart autonomous solutions in this area, this project explored a new RL architecture called Distributed RL (DiRL) with Transfer Learning (TL) capability. The DiRL solution adapts to changing traffic dynamics while keeping bike stock under control at minimum cost. DiRL achieved a 350% improvement in autonomous bike rebalancing, and TL offered a 62.4% performance boost in managing an entire bike network. Lastly, a field trip to the dispatch office of Chariot, a ride-sharing service, provided insights into overcoming the challenges of deploying an RL solution in the real world.
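The abstract does not describe DiRL's internals. As a hedged illustration only, one way to frame rebalancing as RL is a per-station agent whose reward trades the cost of moving bikes against keeping stock within bounds, with transfer learning approximated by warm-starting a new station's Q-table from a trained one. The action set, stock bounds, costs and all function names below (`station_step`, `q_update`, `transfer`) are hypothetical.

```python
import copy

# Hypothetical action set: bikes removed (<0) or added (>0) at one station.
ACTIONS = (-2, -1, 0, 1, 2)

def station_step(stock, action, net_flow, low=5, high=15,
                 move_cost=0.1, penalty=1.0):
    """One hypothetical transition for a single station.

    net_flow is the (returns - rentals) change during the period. The reward
    charges a cost per bike moved plus a penalty if stock leaves [low, high].
    """
    new_stock = max(0, stock + action + net_flow)
    reward = -move_cost * abs(action)
    if not low <= new_stock <= high:
        reward -= penalty
    return new_stock, reward

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Tabular Q-learning update; q maps state -> {action: value}."""
    q.setdefault(s, {b: 0.0 for b in ACTIONS})
    q.setdefault(s_next, {b: 0.0 for b in ACTIONS})
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])

def transfer(q_source):
    """Warm-start a new station's Q-table from a trained station's table."""
    return copy.deepcopy(q_source)
```

Deep-copying keeps the source station's table unchanged while the new station fine-tunes its own copy; a real distributed system would also need to handle stations whose demand patterns differ from the source.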
Investigating Enactive Learning for Autonomous Intelligent Agents
The enactive approach to cognition is typically proposed as a viable alternative to traditional cognitive science. Enactive cognition shifts the explanatory focus from the agent's internal representations to its direct sensorimotor interaction with the environment. In this paper, we investigate enactive learning by means of artificial agent simulations. We compare the performance of the enactive agent with that of an agent based on classical reinforcement learning in foraging tasks within maze environments. The characteristics of the agents are analysed in terms of the accessibility of the environmental states, goals, and exploration/exploitation tradeoffs. We confirm that the enactive agent can successfully interact with its environment and learn to avoid unfavourable interactions using intrinsically defined goals. The performance of the enactive agent is shown to be limited by the number of affordable actions.
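For reference, the classical reinforcement-learning side of such a comparison can be as small as tabular Q-learning. The sketch below is our own toy setup, not the paper's environment: a hypothetical one-dimensional corridor "maze" with food at the far end, where the corridor length, rewards and hyperparameters are assumptions. Unlike the enactive agent's intrinsically defined goals, the goal here is fixed by an external reward.

```python
import random

def train_q_corridor(length=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.3, seed=0, max_steps=200):
    """Tabular Q-learning in a hypothetical 1-D corridor 'maze'.

    States 0..length-1; the agent starts at 0 and food sits at length-1.
    Actions: 0 = left, 1 = right (moves are clipped at the walls).
    Reaching the food yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = length - 1
    q = [[0.0, 0.0] for _ in range(length)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy exploration, with random tie-breaking
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, min(goal, s + (1 if a == 1 else -1)))
            r = 1.0 if s_next == goal else 0.0
            # The terminal (food) state contributes no future value
            future = 0.0 if s_next == goal else max(q[s_next])
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = s_next
            if s == goal:
                break
    return q

q = train_q_corridor()
greedy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
print(greedy)
```

Raising `epsilon` trades exploitation of the learned values for more exploration, which is the tradeoff the abstract analyses.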
Multi-agent Deep Reinforcement Learning for Zero Energy Communities
Advances in renewable energy generation and the introduction of government targets to improve energy efficiency gave rise to the concept of a Zero Energy Building (ZEB). A ZEB is a building whose net energy usage over a year is zero, i.e., its energy use over the year does not exceed its overall renewable generation. A collection of ZEBs forms a Zero Energy Community (ZEC). This paper addresses the problem of energy sharing in such a community. This differs from previous work on energy sharing between buildings: our focus is on improving the community's energy status, whereas earlier research focused on reducing transmission and storage losses or on achieving economic gains. We model this problem as a multi-agent environment and propose a Deep Reinforcement Learning (DRL) based solution. Each building is represented by an intelligent agent that learns over time the appropriate behaviour for sharing energy. We evaluated the proposed solution in a multi-agent simulation built using osBrain. Results indicate that, over time, agents learn to collaborate and learn a policy comparable to the optimal policy, which in turn improves the ZEC's energy status. Buildings with no renewables preferred to request energy from their neighbours rather than from the supply grid.
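The ZEB condition and the effect of sharing within a community can be made concrete with a few lines of accounting. This is a minimal sketch that assumes a perfect, lossless sharing rule; it is not the paper's DRL agent, and the function names (`is_zero_energy`, `grid_exchange`) and example numbers are hypothetical.

```python
def is_zero_energy(consumption, generation):
    """True if total consumption over the period does not exceed total
    renewable generation (the ZEB condition; also usable for a whole ZEC)."""
    return sum(consumption) <= sum(generation)

def grid_exchange(balances, share=True):
    """Grid import/export for one time step of a community.

    balances[i] = generation - consumption of building i (e.g. in kWh).
    With share=True, deficits are covered from neighbours' surplus first;
    with share=False, every building trades with the grid on its own.
    Returns (grid_import, grid_export).
    """
    surplus = sum(b for b in balances if b > 0)
    deficit = -sum(b for b in balances if b < 0)
    shared = min(surplus, deficit) if share else 0.0
    return deficit - shared, surplus - shared

print(grid_exchange([3.0, -2.0, -4.0]))               # with sharing
print(grid_exchange([3.0, -2.0, -4.0], share=False))  # without sharing
```

Sharing never changes the community's net balance (import minus export); it only reduces the gross energy exchanged with the grid.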
Solving Large Sequential Games with the Excessive Gap Technique
Kroer, Christian, Farina, Gabriele, Sandholm, Tuomas
There has been tremendous recent progress on equilibrium-finding algorithms for zero-sum imperfect-information extensive-form games, but there has been a puzzling gap between theory and practice. First-order methods have significantly better theoretical convergence rates than any counterfactual-regret minimization (CFR) variant (O(1/T) rather than O(1/√T) in the number of iterations T). Despite this, CFR variants have been favored in practice. Experiments with first-order methods have only been conducted on small- and medium-sized games, because those methods are complicated to implement in this setting, and because CFR variants, having been enhanced extensively for over a decade, perform well in practice. In this paper we show that a particular first-order method, a state-of-the-art variant of the excessive gap technique instantiated with the dilated entropy distance function, can efficiently solve large real-world problems competitively with CFR and its variants. We show this on large endgames encountered by the Libratus poker AI, which recently beat top human poker professionals at no-limit Texas hold'em. We show experimental results on our variant of the excessive gap technique as well as a prior version. We introduce a numerically friendly implementation of the smoothed best response computation associated with first-order methods for extensive-form game solving. We present, to our knowledge, the first GPU implementation of a first-order method for extensive-form games. We present comparisons of several excessive gap technique and CFR variants.
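The smoothed best response is easiest to see on a single probability simplex, i.e. one decision point. The sketch below assumes the plain (non-dilated) entropy regularizer on one simplex; with the dilated entropy distance function the same operation is, as we understand it, applied information set by information set, bottom-up, over the sequence-form strategy space. It illustrates the numerical idea only and is not the authors' implementation; `smoothed_best_response` is a hypothetical name.

```python
import math

def smoothed_best_response(g, mu):
    """Entropy-smoothed best response on a single probability simplex.

    Solves  max_x <g, x> - mu * sum_i x_i log x_i  over the simplex.
    The maximiser is softmax(g / mu) and the optimal value is
    mu * log(sum_i exp(g_i / mu)). Subtracting the maximum before
    exponentiating (the log-sum-exp trick) avoids overflow for small mu.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    scaled = [gi / mu for gi in g]
    m = max(scaled)
    weights = [math.exp(si - m) for si in scaled]
    total = sum(weights)
    x = [w / total for w in weights]
    value = mu * (m + math.log(total))
    return x, value

x, v = smoothed_best_response([1000.0, 1001.0], mu=0.01)
print(x, v)
```

A naive `math.exp(1001.0 / 0.01)` would overflow; the shifted version keeps every exponent at or below zero, which is the kind of numerical care a large-game solver needs when the smoothing parameter becomes small.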
Economics of Artificial Intelligence
An NBER conference on the Economics of Artificial Intelligence took place in Toronto on September 13-14, 2018. Research Associates Ajay K. Agrawal, Joshua S. Gans and Avi Goldfarb of the University of Toronto and Catherine Tucker of MIT organized the meeting, which was sponsored by the Alfred P. Sloan Foundation, CIFAR, and the Creative Destruction Lab. The following papers were presented and discussed:
Emilio Calvano, Vincenzo Denicolò, and Sergio Pastorello, University of Bologna, and Giacomo Calzolari, European University Institute, "Q-Learning to Cooperate"
AI algorithms are increasingly replacing human decision making in real marketplaces. To inform the debate on potential consequences, Calvano, Calzolari, Denicolò, and Pastorello run experiments with AI agents powered by reinforcement learning in controlled environments (computer simulations). In particular, the researchers study multi-agent interaction in the context of a workhorse oligopoly model: price competition with logit demand and constant marginal costs.
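The workhorse model in the last sentence is compact enough to write down. Below is a minimal sketch of logit demand with constant marginal costs; the parameter values (qualities 2, costs 1, outside-good index 0, differentiation mu = 0.25) and the function names are illustrative assumptions, not taken from this summary. In experiments of the kind described, reinforcement-learning agents would repeatedly choose prices and receive these profits as rewards.

```python
import math

def logit_demand(prices, qualities, a0=0.0, mu=0.25):
    """Logit demand shares for n differentiated products.

    q_i = exp((a_i - p_i) / mu) / (sum_j exp((a_j - p_j) / mu) + exp(a0 / mu))
    a_i are product 'qualities', a0 indexes the outside good, and mu controls
    horizontal differentiation (mu -> 0 approaches perfect substitutes).
    """
    terms = [math.exp((a - p) / mu) for a, p in zip(qualities, prices)]
    denom = sum(terms) + math.exp(a0 / mu)
    return [t / denom for t in terms]

def profits(prices, costs, qualities, a0=0.0, mu=0.25):
    """Per-firm profit (p_i - c_i) * q_i with constant marginal cost c_i."""
    q = logit_demand(prices, qualities, a0, mu)
    return [(p - c) * qi for p, c, qi in zip(prices, costs, q)]

# Hypothetical symmetric duopoly, for illustration only.
shares = logit_demand([1.5, 1.5], qualities=[2.0, 2.0])
pi = profits([1.5, 1.5], costs=[1.0, 1.0], qualities=[2.0, 2.0])
print(shares, pi)
```

Because the outside good always keeps some demand, the firms' shares sum to less than one, and pricing at marginal cost yields zero profit.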
Cranfield takes leading research role in autonomous systems & AI
Cranfield University has announced plans for a world-leading Professorship in Autonomous Systems and Artificial Intelligence at the University, sponsored by BAE Systems, a technology leader in this field. The Professorship will be a research leadership role at the University, bringing together research in UAVs, space and artificial intelligence (AI) and adding to Cranfield's leading reputation in the fields of autonomous systems and AI. Advances in machine learning, high-performance computing, data science, multimodal sensing, and control are converging to create enormous opportunities for intelligent, autonomous, or semi-autonomous systems. Such artificial intelligence systems are starting to achieve cognitive abilities such as language, attention, and creativity, promising to improve the safety and efficiency of space technology and of increasingly autonomous systems in aerospace and aviation. Julia Sutcliffe, Chief Technology Officer, BAE Systems Air, said: "Autonomous systems and artificial intelligence have the potential to provide a substantial positive impact upon product, service, and industrial capabilities. This prestigious appointment, in a growing and highly disruptive field, will enable BAE Systems to exploit the latest technologies in these areas to continuously improve our engineering and manufacturing processes, and give our customers a differentiating capability in the field."
Grounding the Experience of a Visual Field through Sensorimotor Contingencies
Artificial perception is traditionally handled by hand-designing task-specific algorithms. However, a truly autonomous robot should develop perceptive abilities on its own, by interacting with its environment and adapting to new situations. The sensorimotor contingencies theory proposes to ground the development of those perceptive abilities in the way the agent can actively transform its sensory inputs. We propose a sensorimotor approach, inspired by this theory, in which the agent explores the world and discovers its properties by capturing the sensorimotor regularities they induce. This work presents an application of this approach to the discovery of a so-called visual field, defined as the set of regularities that a visual sensor imposes on a naive agent's experience. A formalism is proposed to describe how those regularities can be captured in a sensorimotor predictive model. Finally, the approach is evaluated on a simulated system coarsely inspired by the human retina.
Keywords: autonomous systems, developmental robotics, sensorimotor contingencies, predictive processing, sensorimotor learning, humanlike vision
Corresponding author: Alban Laflaquière (alaflaquiere@aldebaran.com). Preprint submitted to Neurocomputing, October 5, 2018.
1. Introduction
Autonomy in robotics relies on sensory data processing to capture information about the world and adapt to it. Although machine learning has grown increasingly influential over the last decades, traditional approaches to this data-processing problem involve significant manual design by the engineers who build the robot. Consequently, the resulting techniques for artificial perception appear rigid and constrained for tractability. Each of these specialized algorithms is applicable to only a small set of tasks, with potentially limiting biases built in by the designer. […] autonomy in a robot.
Instead, an autonomous robot must be able to cope with the complexity of its world, build its own way to perceive it and adapt to its variations. To address this issue, the field of developmental robotics takes inspiration from biological and cognitive development in children [4]. It proposes that an agent learns to interact with its environment, autonomously and on an ontogenic timescale.
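As a hedged illustration of capturing sensorimotor regularities in a predictive model, the toy below is our own construction, not the paper's formalism or retina simulation. A one-dimensional binary "retina" whose receptor layout is unknown to the agent looks at a random circular world. By moving and counting which receptor before a move predicts which receptor after it, the agent recovers the receptors' spatial neighbourhood relations, a crude analogue of a visual field. The function name, world size and sample counts are assumptions.

```python
import random

def discover_shift_contingencies(offsets, shift=1, world_size=200,
                                 samples=2000, seed=0):
    """Toy discovery of a sensorimotor regularity of a 1-D 'retina'.

    Receptor i sees world[(pos + offsets[i]) % world_size]; the agent does
    NOT know the offsets. It issues a motor command that moves the retina
    by `shift` and learns, for each receptor i, which receptor j at time t
    best predicts receptor i at time t+1 (by agreement frequency).
    """
    rng = random.Random(seed)
    world = [rng.randrange(2) for _ in range(world_size)]
    n = len(offsets)

    def sense(pos):
        return [world[(pos + o) % world_size] for o in offsets]

    agree = [[0] * n for _ in range(n)]
    for _ in range(samples):
        pos = rng.randrange(world_size)
        before, after = sense(pos), sense(pos + shift)
        for i in range(n):
            for j in range(n):
                agree[i][j] += int(after[i] == before[j])
    # Predictive model: receptor i after the move ~ receptor best_j[i] before it
    return [max(range(n), key=lambda j: agree[i][j]) for i in range(n)]

# Receptors listed in a scrambled order (spatial offsets 3, 0, 4, 1, 2).
print(discover_shift_contingencies([3, 0, 4, 1, 2]))
```

For every receptor except the one at the leading edge, the learned predictor agrees on every sample; the edge receptor has no perfect predictor, so its entry is only the best-scoring guess.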
Grounding Perception: A Developmental Approach to Sensorimotor Contingencies
Laflaquière, Alban, Hemion, Nikolas, Ortiz, Michaël Garcia, Baillie, Jean-Christophe
To date, no clear formalism for those mechanisms has arisen in the developmental robotics community. We propose predictive modeling [16], [17] as such a computational mechanism to learn sensorimotor contingencies, and thus acquire perceptive skills. In the context of the sensorimotor contingencies theory (SMCT), predictive models can be autonomously estimated by the agent to capture structure in the way motor commands actively transform sensory inputs, namely sensorimotor contingencies. Predictive modeling allows the incremental acquisition of skills required in developmental robotics, while providing a computational implementation of the concept of sensorimotor contingencies. Our current implementation of the formalism proposed in this paper uses a method that clusters state transition graphs to discover densely connected subgraphs. Note that similar methods have already been proposed by others, for example in navigation tasks for the segmentation of location data into rooms [18], or for sub-goal discovery in hierarchical reinforcement learning (e.g.