Agents


Correcting Experience Replay for Multi-Agent Communication

arXiv.org Artificial Intelligence

We consider the problem of learning to communicate using multi-agent reinforcement learning (MARL). A common approach is to learn off-policy, using data sampled from a replay buffer. However, messages received in the past may not accurately reflect the current communication policy of each agent, and this complicates learning. We therefore introduce a 'communication correction' which accounts for the non-stationarity of observed communication induced by multi-agent learning. It works by relabelling the received message so that it is likely under the communicator's current policy and therefore better reflects the receiver's current environment. To account for cases in which agents are both senders and receivers, we introduce an ordered relabelling scheme. Our correction is computationally efficient and can be integrated with a range of off-policy algorithms. It substantially improves the ability of communicating MARL systems to learn across a variety of cooperative and competitive tasks.
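To make the relabelling idea concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: messages and observations are plain floats, message policies are deterministic functions, and the names `Transition`, `relabel_messages`, `order`, and `senders_to` are hypothetical. Each agent's stored message is regenerated from its stored observation using its *current* policy; agents are processed in a fixed order, and a receiver uses a sender's relabelled message when one already exists, otherwise the message stored in the buffer.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Sequence

# Hypothetical message policy: (own observation, received messages) -> message.
# Deterministic for simplicity; learned communication policies may be stochastic.
MessagePolicy = Callable[[float, Sequence[float]], float]


@dataclass(frozen=True)
class Transition:
    observations: Dict[str, float]  # each agent's observation stored in the buffer
    messages: Dict[str, float]      # each agent's message as originally sent


def relabel_messages(
    t: Transition,
    policies: Dict[str, MessagePolicy],
    order: List[str],
    senders_to: Dict[str, List[str]],
) -> Transition:
    """Return a copy of `t` whose messages are regenerated by the current policies."""
    new_msgs: Dict[str, float] = {}
    for agent in order:
        # Prefer a sender's already-relabelled message; otherwise fall back
        # to the (possibly stale) message stored in the replay buffer.
        incoming = [new_msgs.get(s, t.messages[s]) for s in senders_to.get(agent, [])]
        new_msgs[agent] = policies[agent](t.observations[agent], incoming)
    return replace(t, messages=new_msgs)
```

With policies `A: 2 * obs` and `B: obs + sum(incoming)`, where A sends to B, relabelling in the order `["A", "B"]` lets B condition on A's fresh message, while the order `["B", "A"]` forces B to use A's stale stored message. That is why an ordering matters once agents both send and receive.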


Reinforcement Learning of Simple Indirect Mechanisms

arXiv.org Artificial Intelligence

Over the last fifty years, a large body of research in microeconomics has introduced many different mechanisms for resource allocation. Despite the wide variety of available options, "simple" mechanisms such as posted price and serial dictatorship are often preferred for practical applications, including housing allocation [Abdulkadiroğlu and Sönmez, 1998], online procurement [Badanidiyuru et al., 2012], and the allocation of medical appointments [Klaus and Nichifor, 2019]. There has been considerable interest in formalizing different notions of simplicity. Li [2017] identifies mechanisms that are particularly simple from a strategic perspective, introducing the concept of obviously strategyproof mechanisms. Under an obviously strategyproof mechanism, it is obvious that an agent cannot profit by trying to game the system, because even the worst possible final outcome from behaving truthfully is at least as good as the best possible outcome from any other strategy. Pycia and Troyan [2019] introduce the still stronger concept of strongly obviously strategyproof (SOSP) mechanisms, and show that this class can essentially be identified with sequential price mechanisms, where agents are visited in turn and offered a choice from a menu of options (which may or may not include transfers). SOSP mechanisms are ones in which an agent does not even need to consider her own future (truthful) actions to see that the mechanism is obviously strategyproof.
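A sequential price mechanism of the kind described above can be sketched as follows. This is a simplified illustration under our own assumptions: unit-demand agents with quasi-linear utility, and a fixed, hypothetical price per item set in advance (in general, the menu offered to a later agent may depend on earlier choices). Each agent, when visited, simply takes the menu option with the highest utility, or nothing if no option gives positive utility; truthful behaviour here means just picking one's favourite remaining option.

```python
from typing import Dict, List, Optional, Tuple


def sequential_posted_price(
    agents: List[Dict[str, float]],  # agent i's value for each item
    prices: Dict[str, float],        # hypothetical fixed posted price per item
) -> List[Tuple[int, Optional[str], float]]:
    """Visit agents in turn; each takes the utility-maximizing remaining item.

    Returns (agent index, item or None, price paid) for every agent.
    """
    remaining = dict(prices)
    outcome: List[Tuple[int, Optional[str], float]] = []
    for idx, values in enumerate(agents):
        best_item: Optional[str] = None
        best_util = 0.0  # taking nothing yields zero utility
        for item, price in remaining.items():
            util = values.get(item, 0.0) - price
            if util > best_util:
                best_item, best_util = item, util
        if best_item is None:
            outcome.append((idx, None, 0.0))
        else:
            # Remove the chosen item from the menu offered to later agents.
            outcome.append((idx, best_item, remaining.pop(best_item)))
    return outcome
```

For example, with prices `{"a": 5.0, "b": 3.0}` and agents valuing `{"a": 10, "b": 4}`, `{"a": 8, "b": 7}`, and `{"b": 1}`, the first agent buys `a`, the second buys `b`, and the third receives nothing.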


Public Announcement Logic in HOL

arXiv.org Artificial Intelligence

A shallow semantical embedding for public announcement logic with relativized common knowledge is presented. This embedding enables the first-time automation of this logic with off-the-shelf theorem provers for classical higher-order logic. It is demonstrated (i) how meta-theoretical studies can be automated this way, and (ii) how non-trivial reasoning in the target logic (public announcement logic), required e.g. to obtain a convincing encoding and automation of the wise men puzzle, can be realized. Key to the presented semantical embedding -- in contrast, e.g., to related work on the semantical embedding of normal modal logics -- is that evaluation domains are modeled explicitly and treated as an additional parameter in the encodings of the constituents of the embedded target logic, whereas they were previously shared implicitly between meta logic and target logic.
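The role of an explicit evaluation domain can be illustrated outside HOL with a toy model checker. This is a Python sketch of our own, not the paper's embedding: formulas are nested tuples (a representation chosen here), and common knowledge is omitted. Every formula is evaluated relative to a `domain` of surviving worlds, and a public announcement `[!f]g` is evaluated by shrinking that domain to the worlds where `f` held before the announcement.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Formulas as nested tuples (representation assumed for this sketch):
#   ("atom", "p"), ("not", f), ("and", f, g),
#   ("K", agent, f): the agent knows f
#   ("ann", f, g):   [!f]g, i.e. after truthfully announcing f, g holds
Formula = tuple


class KripkeModel:
    def __init__(self, worlds: Set[str], valuation: Dict[str, Set[str]],
                 relations: Dict[str, Set[Tuple[str, str]]]):
        self.worlds = frozenset(worlds)
        self.valuation = valuation  # world -> atoms true at that world
        self.relations = relations  # agent -> accessibility pairs (w, v)

    def holds(self, domain: FrozenSet[str], w: str, f: Formula) -> bool:
        """Evaluate f at world w relative to an explicit evaluation domain."""
        op = f[0]
        if op == "atom":
            return f[1] in self.valuation[w]
        if op == "not":
            return not self.holds(domain, w, f[1])
        if op == "and":
            return self.holds(domain, w, f[1]) and self.holds(domain, w, f[2])
        if op == "K":
            agent, body = f[1], f[2]
            # Only worlds still inside the domain count as epistemic alternatives.
            return all(self.holds(domain, v, body)
                       for v in domain if (w, v) in self.relations[agent])
        if op == "ann":
            pre, post = f[1], f[2]
            if not self.holds(domain, w, pre):
                return True  # a false announcement cannot be made: vacuously true
            shrunk = frozenset(v for v in domain if self.holds(domain, v, pre))
            return self.holds(shrunk, w, post)
        raise ValueError(f"unknown operator {op!r}")

    def check(self, w: str, f: Formula) -> bool:
        return self.holds(self.worlds, w, f)
```

In a two-world model where agent `a` cannot tell `w1` (where `p` holds) from `w2` (where it does not), `K_a p` is false at `w1`, but `[!p] K_a p` is true there, because the announcement removes `w2` from the domain.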


AI improves customer experience, call center efficiency

#artificialintelligence

Artificial intelligence serves a number of purposes in contact centers. It can automate routine processes, provide live chat in the form of virtual agents or chatbots to address customer queries, help deliver personalized experiences and provide predictive analytics, among other things. Each of these AI functions helps enhance both the employee and customer experience, and many CX leaders are taking a closer look at how AI can help in their companies. The driving force behind this movement is the rise in contact center interaction volume: with COVID-19 showing no sign of ending, customers have moved from physical to virtual buying patterns and increasingly need help from live customer service agents. CX leaders have reported many changes in their contact centers, including the need for more agents to support customer demand over both the phone and digital channels, and the replacement of in-person experiences with virtual ones, which drives even more traffic to the contact center.


Student-Initiated Action Advising via Advice Novelty

arXiv.org Machine Learning

Action advising is a knowledge exchange mechanism between peers, namely a student and a teacher, that can help tackle exploration and sample inefficiency problems in deep reinforcement learning. Due to practical limitations in peer-to-peer communication and the negative implications of over-advising, the peer responsible for initiating these interactions should do so only when exchanging advice is most appropriate. Most recently, student-initiated techniques that use state novelty and uncertainty estimates have obtained promising results. However, these estimates have several weaknesses, such as carrying no information about the characteristics of convergence and being subject to delays caused by experience replay dynamics. We propose a student-initiated action advising algorithm that alleviates these shortcomings. Specifically, we employ Random Network Distillation (RND) to measure the novelty of a piece of advice, which the student uses to decide whether to proceed with the request; furthermore, we perform RND updates only for the advised states, so that the student's convergence does not prevent it from using the teacher's knowledge at any stage of learning. Experiments in GridWorld and simplified versions of five Atari games show that our approach performs on par with the state of the art and demonstrates significant advantages in scenarios where existing methods are prone to fail.
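The RND-gated request rule can be sketched in a few lines. This toy version is ours, not the authors' implementation: the "networks" are a fixed random linear map followed by `tanh` (target) and a trainable linear map (predictor), novelty is the squared prediction error on a state, and `ToyRND`, `maybe_request_advice`, `threshold`, and `ask_teacher` are hypothetical names. The key detail from the abstract is preserved: the predictor is trained only on states for which advice was actually requested.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


class ToyRND:
    """Toy Random Network Distillation: fixed random target, trainable predictor."""

    def __init__(self, in_dim: int, out_dim: int, lr: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        # Fixed random target weights (never trained).
        self.target_w = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
        # Linear predictor weights, trained to imitate the target.
        self.pred_w = [[0.0] * in_dim for _ in range(out_dim)]
        self.lr = lr

    @staticmethod
    def _linear(w: List[List[float]], x: Sequence[float]) -> List[float]:
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

    def _target(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(v) for v in self._linear(self.target_w, x)]

    def novelty(self, x: Sequence[float]) -> float:
        """Squared prediction error: large for states the predictor has not fit."""
        pred, targ = self._linear(self.pred_w, x), self._target(x)
        return sum((p - t) ** 2 for p, t in zip(pred, targ))

    def update(self, x: Sequence[float]) -> None:
        """One SGD step on the squared prediction error for state x."""
        pred, targ = self._linear(self.pred_w, x), self._target(x)
        for row, p, t in zip(self.pred_w, pred, targ):
            grad_scale = 2.0 * (p - t)
            for j, xj in enumerate(x):
                row[j] -= self.lr * grad_scale * xj


def maybe_request_advice(rnd: ToyRND, state: Sequence[float], threshold: float,
                         ask_teacher: Callable[[Sequence[float]], int]) -> Optional[int]:
    """Request advice only for novel states; train RND only on advised states."""
    if rnd.novelty(state) <= threshold:
        return None
    action = ask_teacher(state)
    rnd.update(state)
    return action
```

Because the predictor is never updated on states the student handles on its own, novelty for a state drops only as advice for it is collected. This mirrors the stated goal of keeping the student's own convergence from shutting off advice requests.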


Multi-agent Social Reinforcement Learning Improves Generalization

arXiv.org Artificial Intelligence

Social learning is a key component of human and animal intelligence. By taking cues from the behavior of experts in their environment, social learners can acquire sophisticated behavior and rapidly adapt to new circumstances. This paper investigates whether independent reinforcement learning (RL) agents in a multi-agent environment can use social learning to improve their performance using cues from other agents. We find that in most circumstances, vanilla model-free RL agents do not use social learning, even in environments in which individual exploration is expensive. We analyze the reasons for this deficiency, and show that by introducing a model-based auxiliary loss we are able to train agents to leverage cues from experts to solve hard exploration tasks. The generalized social learning policy learned by these agents allows them not only to outperform the experts with which they trained, but also to achieve better zero-shot transfer performance than solo learners when deployed to novel environments with experts. In contrast, agents that have not learned to rely on social learning generalize poorly and do not succeed in the transfer task. Further, we find that by mixing multi-agent and solo training, we can obtain agents that use social learning to outperform agents trained alone, even when experts are not available. This demonstrates that social learning has helped improve agents' representation of the task itself. Our results indicate that social learning can enable RL agents not only to improve performance on the task at hand, but also to improve generalization to novel environments.
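As a hedged illustration only, one generic way to write a model-based auxiliary objective is to add a weighted next-observation prediction term to the usual RL loss. The weight $\beta$, the predicted observation $\hat{o}_{t+1}$, and the squared-error form are assumptions introduced here, not the paper's stated loss:

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{RL}}(\theta)
  + \beta \, \mathbb{E}_{t}\!\left[ \left\lVert \hat{o}_{t+1}(o_t, a_t; \theta) - o_{t+1} \right\rVert_2^2 \right]
```

The intuition is that a predictive term forces the agent's representation to encode what other agents are doing, since their behaviour affects the next observation, which a purely model-free loss need not do.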


Mediating Artificial Intelligence Developments through Negative and Positive Incentives

arXiv.org Artificial Intelligence

The field of Artificial Intelligence (AI) is going through a period of great expectations, introducing a certain level of anxiety in research, business and also policy. This anxiety is further energised by an AI race narrative that makes people believe they might be missing out. Whether real or not, a belief in this narrative may be detrimental, as some stakeholders will feel obliged to cut corners on safety precautions, or ignore societal consequences just to "win". Starting from a baseline model that describes a broad class of technology races where winners draw a significant benefit compared to others (such as AI advances, patent races and pharmaceutical technologies), we investigate how positive (rewards) and negative (punishments) incentives may beneficially influence the outcomes. We uncover conditions in which punishment is either capable of reducing the development speed of unsafe participants or has the capacity to reduce innovation through over-regulation. Alternatively, we show that, in several scenarios, rewarding those that follow safety measures may increase the development speed while ensuring safe choices. Moreover, in the latter regimes, rewards do not suffer from the over-regulation issue that affects punishment. Overall, our findings provide valuable insights into the nature and kinds of regulatory actions most suitable to improve safety compliance in the contexts of both smooth and sudden technological shifts.


Entropy Regularization for Mean Field Games with Learning

arXiv.org Machine Learning

Entropy regularization has been extensively adopted to improve the efficiency, stability, and convergence of algorithms in reinforcement learning. This paper analyzes, both quantitatively and qualitatively, the impact of entropy regularization on Mean Field Games (MFGs) with learning over a finite time horizon. Our study provides a theoretical justification that entropy regularization yields time-dependent policies and, furthermore, helps stabilize and accelerate convergence to the game equilibrium. In addition, this study leads to a policy-gradient algorithm for exploration in MFGs. Under this algorithm, agents are able to learn the optimal exploration scheduling, with stable and fast convergence to the game equilibrium.
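A standard fact behind entropy regularization, not specific to this paper, is that for a single decision the policy maximizing $\mathbb{E}_{\pi}[Q] + \lambda H(\pi)$ is the softmax $\pi(a) \propto \exp(Q(a)/\lambda)$. The short sketch below computes that policy for hypothetical action values `q` and a temperature `lam`, and shows that a larger temperature yields a higher-entropy (more exploratory) policy. The paper's finite-horizon MFG setting, where the regularized policy also depends on time, is not modeled here.

```python
import math
from typing import List, Sequence


def entropy_regularized_policy(q: Sequence[float], lam: float) -> List[float]:
    """Softmax(q / lam): maximizer of E_pi[q] + lam * H(pi) over distributions pi."""
    if lam <= 0:
        raise ValueError("temperature lam must be positive")
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((x - m) / lam) for x in q]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, ignoring zero-probability actions."""
    return -sum(x * math.log(x) for x in p if x > 0)


q = [1.0, 0.0, -0.5]
cold = entropy_regularized_policy(q, 0.1)   # nearly greedy
hot = entropy_regularized_policy(q, 10.0)   # nearly uniform, more exploration
```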


Emotion in Future Intelligent Machines

arXiv.org Artificial Intelligence

Over the past decades, research in cognitive and affective neuroscience has emphasized that emotion is crucial for human intelligence and in fact inseparable from cognition. Concurrently, interest in simulating and modeling emotion in robots and artificial agents has grown significantly. Yet, existing models of emotion and their integration in cognitive architectures remain quite limited and frequently disconnected from neuroscientific evidence. We argue that a stronger integration of emotion in robot models is critical for the design of intelligent machines capable of tackling real-world problems. Drawing from current neuroscientific knowledge, we provide a set of guidelines for future research in artificial emotion and intelligent machines more generally. Emotion is critical for the flexible, intelligent behavior of biological organisms. Accordingly, multiple attempts to model emotion in robots and artificial agents have been described in the last decades. Yet, how emotion is modeled and how it interfaces with "cognitive architectures" remains poorly developed.


Complexity and Algorithms for Exploiting Quantal Opponents in Large Two-Player Games

arXiv.org Artificial Intelligence

Solution concepts of traditional game theory assume entirely rational players; therefore, their ability to exploit subrational opponents is limited. One type of subrationality that describes human behavior well is the quantal response. While there exist algorithms for computing solutions against quantal opponents, they either do not scale or may provide strategies that are even worse than the entirely-rational Nash strategies. This paper aims to analyze and propose scalable algorithms for computing effective and robust strategies against a quantal opponent in normal-form and extensive-form games. Our contributions are: (1) we define two different solution concepts related to exploiting quantal opponents and analyze their properties; (2) we prove that computing these solutions is computationally hard; (3) therefore, we evaluate several heuristic approximations based on scalable counterfactual regret minimization (CFR); and (4) we identify a CFR variant that exploits the bounded opponents better than the previously used variants while being less exploitable by the worst-case perfectly-rational opponent.
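The quantal response model can be made concrete for a small normal-form game. This sketch assumes the common logit form, in which the opponent plays each column with probability proportional to $\exp(\lambda \cdot \text{expected payoff})$; the rationality parameter `lam`, the payoff matrices `A` and `B`, and the helper names are ours. The brute-force grid search over a two-row strategy is only an illustration of "exploiting" a quantal opponent: it does not reproduce the paper's solution concepts, and it does not scale, which is consistent with the hardness results that motivate the CFR-based heuristics.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def logit_quantal_response(x: Sequence[float], B: Matrix, lam: float) -> List[float]:
    """Opponent's logit QR to row strategy x, where B[i][j] is the opponent's payoff."""
    cols = len(B[0])
    u = [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(cols)]
    m = max(u)  # shift for numerical stability
    w = [math.exp(lam * (uj - m)) for uj in u]
    z = sum(w)
    return [wj / z for wj in w]


def expected_utility(x: Sequence[float], y: Sequence[float], A: Matrix) -> float:
    """Row player's expected payoff when rows follow x and columns follow y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def best_commitment_two_rows(A: Matrix, B: Matrix, lam: float,
                             steps: int = 100) -> Tuple[float, float]:
    """Grid-search p = Pr[row 0] that maximizes utility against a logit QR opponent."""
    best_p, best_u = 0.0, -math.inf
    for k in range(steps + 1):
        p = k / steps
        x = [p, 1.0 - p]
        u = expected_utility(x, logit_quantal_response(x, B, lam), A)
        if u > best_u:
            best_p, best_u = p, u
    return best_p, best_u
```

With `lam = 0` the opponent plays uniformly at random, and as `lam` grows the response approaches a best response, so `lam` interpolates between a fully random and a perfectly rational opponent.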