 Agents


Synthesis and Properties of Optimally Value-Aligned Normative Systems

Journal of Artificial Intelligence Research

The value alignment problem is concerned with the design of systems that provably abide by our human values. One approach to this challenge is to leverage prescriptive norms that, if carefully designed, are able to steer a multiagent system away from harmful outcomes and towards more beneficial ones. In this work, we first present a general methodology for the automated synthesis of value-aligned normative systems, based on a consequentialist view of values. In the second part, we provide analytical tools to examine such value-aligned normative systems, namely the Shapley value of individual norms and the compatibility of several values under a fixed set of norms. We illustrate all of our contributions with a running example of a society of agents where taxes are collected and redistributed according to a set of parametrised norms.
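To make the "Shapley value of individual norms" concrete, here is a minimal sketch, assuming each subset of norms can be scored by a single alignment number. The norm names and the scores in `ALIGNMENT` are hypothetical illustrations and are not taken from the paper; the function computes the standard Shapley value by averaging marginal contributions over all orderings.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical alignment scores for each subset of two illustrative norms.
ALIGNMENT = {
    frozenset(): 0.0,
    frozenset({"collect"}): 0.2,
    frozenset({"redistribute"}): 0.1,
    frozenset({"collect", "redistribute"}): 0.9,
}

phi = shapley_values(["collect", "redistribute"], ALIGNMENT.__getitem__)
print(phi)
```

With these made-up scores, the two norms are worth far more together than apart, and the Shapley values split the joint score of 0.9 between them. Exact enumeration is only practical for a handful of norms, since the number of orderings grows factorially.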


Computational Aspects of Cooperative Game Theory (Synthesis Lectures on Artificial Intelligence and Machine Learning): Chalkiadakis, Georgios, Elkind, Edith, Wooldridge, Michael: 9781608456529: Amazon.com: Books

#artificialintelligence

This manuscript was a pleasure to discover, and a pleasure to read -- a broad, but succinct, overview of work in computational cooperative game theory. I will certainly use this text with my own students, both within courses and to provide comprehensive background for students in my research group. The authors have made a substantial contribution to the multiagent systems and algorithmic game theory communities.


Intelligent Physical Attack Against Mobile Robots With Obstacle-Avoidance

arXiv.org Artificial Intelligence

The security issue of mobile robots has attracted considerable attention in recent years. In this paper, we propose an intelligent physical attack to trap mobile robots into a preset position by learning the obstacle-avoidance mechanism from external observation. The salient novelty of our work lies in revealing the possibility that physical-based attacks with intelligent and advanced design can present real threats, even without prior knowledge of the system dynamics or access to the internal system. This kind of attack cannot be handled by countermeasures in traditional cyberspace security. In practice, the cornerstone of the proposed attack is to actively explore the complex interaction characteristics of the victim robot with the environment, and learn the obstacle-avoidance knowledge exhibited in the limited observations of its behaviors. Then, we propose shortest-path and hands-off attack algorithms to find efficient attack paths from the tremendous motion space, achieving the driving-to-trap goal with low costs in terms of path length and activity period, respectively. The convergence of the algorithms is proved and the attack performance bounds are further derived. Extensive simulations and real-life experiments illustrate the effectiveness of the proposed attack, beckoning future investigation for the new physical threats and defense on robotic systems.


Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits

arXiv.org Artificial Intelligence

We study the Improving Multi-Armed Bandit (IMAB) problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. In many of these applications, the time horizon is unknown to the decision-maker beforehand, which motivates the study of the IMAB problem in the technically more challenging horizon-unaware setting. We study the tension that arises between two seemingly conflicting objectives in the horizon-unaware setting: a) maximizing the cumulative reward at any time based on current rewards of the arms, and b) ensuring that arms with better long-term rewards get sufficient opportunities even if they initially have low rewards. We show that, surprisingly, the two objectives are aligned with each other in this setting. Our main contribution is an anytime algorithm for the IMAB problem that achieves the best possible cumulative reward while ensuring that the arms reach their true potential given sufficient time. Our algorithm mitigates the initial disparity due to lack of opportunity and continues pulling an arm till it stops improving. We prove the optimality of our algorithm by showing that a) any algorithm for the IMAB problem, no matter how utilitarian, must suffer $\Omega(T)$ policy regret and $\Omega(k)$ competitive ratio with respect to the optimal offline policy, and b) the competitive ratio of our algorithm is $O(k)$.


Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center

arXiv.org Artificial Intelligence

This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible to the changing workload distributions and arrival rates, with a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system from moderate- to large-scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has a superior performance over different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.


Towards Informed Design and Validation Assistance in Computer Games Using Imitation Learning

arXiv.org Artificial Intelligence

In games, as in many other domains, design validation and testing is a huge challenge as systems are growing in size and manual testing is becoming infeasible. This paper proposes a new approach to automated game validation and testing. Our method leverages a data-driven imitation learning technique, which requires little effort and time and no knowledge of machine learning or programming, that designers can use to efficiently train game testing agents. We investigate the validity of our approach through a user study with industry experts. The survey results show that our method is indeed a valid approach to game validation and that data-driven programming would be a useful aid to reducing effort and increasing quality of modern playtesting. The survey also highlights several open challenges. With the help of the most recent literature, we analyze the identified challenges and propose future research directions suitable for supporting and maximizing the utility of our approach.


Learning in Stackelberg Games with Non-myopic Agents

arXiv.org Artificial Intelligence

Stackelberg games are a canonical model for strategic principal-agent interactions. Consider a defense system that distributes its security resources across high-risk targets prior to attacks being executed; or a tax policymaker who sets rules on when audits are triggered prior to seeing filed tax reports; or a seller who chooses a price prior to knowing a customer's proclivity to buy. In each of these scenarios, a principal first selects an action $x \in X$ and then an agent reacts with an action $y \in Y$, where $X$ and $Y$ are the principal's and agent's action spaces, respectively. In the examples above, agent actions correspond to which target to attack, how much tax to pay to evade an audit, and how much to purchase, respectively. Typically, the principal wants an $x$ that maximizes their payoff when the agent plays a best response $y = \mathrm{br}(x)$; such a pair $(x, y)$ is a Stackelberg equilibrium. By committing to a strategy, the principal can guarantee they achieve a higher payoff than in the fixed point equilibrium of the corresponding simultaneous-play game. However, finding such a strategy requires knowledge of the agent's payoff function. When faced with unknown agent payoffs, the principal can attempt to learn a best response via repeated interactions with the agent. If a (naïve) agent is unaware that such learning occurs and always plays a best response, the principal can use classical online learning approaches to optimize their own payoff in the stage game.
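The commitment advantage described above can be illustrated with a small sketch. It assumes known payoffs, finite action sets, pure-strategy commitment, and ties broken in the principal's favour. These are simplifying assumptions for illustration, not the learning setting the paper studies, and the payoff matrices are hypothetical.

```python
def stackelberg_pure(principal_payoff, agent_payoff):
    """Pure-strategy Stackelberg solution for a small bimatrix game.

    Both arguments are indexed as payoff[x][y]. The agent best-responds to
    the committed action x; ties are broken in the principal's favour
    (an assumption of this sketch).
    """
    best = None
    for x, agent_row in enumerate(agent_payoff):
        top = max(agent_row)
        # All agent actions that are best responses to x.
        responses = [y for y, u in enumerate(agent_row) if u == top]
        y = max(responses, key=lambda j: principal_payoff[x][j])
        payoff = principal_payoff[x][y]
        if best is None or payoff > best[2]:
            best = (x, y, payoff)
    return best


# Hypothetical payoffs: action 0 strictly dominates for the principal, so
# simultaneous play ends at (0, 0) with principal payoff 2.
PRINCIPAL = [[2, 4],
             [1, 3]]
AGENT = [[1, 0],
         [0, 1]]

x, y, payoff = stackelberg_pure(PRINCIPAL, AGENT)
print(x, y, payoff)  # 1 1 3
```

In this toy game the principal does better by committing to the dominated action 1, because the commitment changes the agent's best response from 0 to 1.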


Using Affect as a Communication Modality to Improve Human-Robot Communication in Robot-Assisted Search and Rescue Scenarios

arXiv.org Artificial Intelligence

Emotions can provide a natural communication modality to complement the existing multi-modal capabilities of social robots, such as text and speech, in many domains. We conducted three online studies with 112, 223 and 151 participants to investigate the benefits of using emotions as a communication modality for Search And Rescue (SAR) robots. In the first experiment, we investigated the feasibility of conveying information related to SAR situations through robots' emotions, resulting in mappings from SAR situations to emotions. The second study used Affect Control Theory as an alternative method for deriving such mappings. This method is more flexible, e.g. In the third experiment, we created affective expressions for an appearance-constrained outdoor field research robot using LEDs as an expressive channel. Using these affective expressions in a variety of simulated SAR situations, we evaluated the effect of these expressions on participants' (adopting the role of rescue workers) situational awareness. Our results and proposed methodologies provide (a) insights on how emotions could help convey messages in the context of SAR, and (b) evidence on the effectiveness of adding emotions as a communication modality in a (simulated) SAR communication context.

These situations may happen due to natural or man-made [2] causes and require an immediate response, as time is a key element for the success of SAR operations [3]. Therefore, improving communication efficiency in SAR teams can be beneficial for the success of time-critical [...]

The member composition of SAR teams has been changing over time. First, rescue dogs were included to help SAR teams by taking advantage of dogs' strong sense of smell, which can help find victims faster [4]. More recently, rescue robots have become a part of SAR teams.

[...] robots to target SAR areas might be more time-efficient than deploying human rescue workers (thus increasing the operation's speed of progress); and (c) the limited number of human rescue workers since training human rescue workers requires a lot of time and effort [17].

Although rescue robots have been used in SAR operations since early 2000s [14], they still need external help to operate appropriately. To the best of our knowledge, to date, there are no fully autonomous rescue robots or robot teams that can operate in unstructured and cluttered [...]


Balancing Consumer and Business Value of Recommender Systems: A Simulation-based Analysis

arXiv.org Artificial Intelligence

Automated recommendations can nowadays be found on many e-commerce platforms, and such recommendations can create substantial value for consumers and providers. Often, however, not all recommendable items have the same profit margin, and providers might thus be tempted to promote items that maximize their profit. In the short run, consumers might accept non-optimal recommendations, but they may lose their trust in the long run. Ultimately, this leads to the problem of designing balanced recommendation strategies, which consider both consumer and provider value and lead to sustained business success. This work proposes a simulation framework based on agent-based modeling designed to help providers explore longitudinal dynamics of different recommendation strategies. In our model, consumer agents receive recommendations from providers, and the perceived quality of the recommendations influences the consumers' trust over time. We design several recommendation strategies which give more weight either to provider profit or to consumer utility. Our simulations show that a hybrid strategy that puts more weight on consumer utility but without ignoring profitability considerations leads to the highest cumulative profit in the long run. This hybrid strategy results in a profit increase of about 20% compared to purely consumer-oriented or profit-oriented strategies. We also find that social media can reinforce the observed phenomena. When consumers rely heavily on social media, the cumulative profit of the best strategy further increases. To ensure reproducibility and foster future research, we publicly share our flexible simulation framework.


Safety Assessment for Autonomous Systems' Perception Capabilities

arXiv.org Artificial Intelligence

Autonomous Systems (AS) are increasingly proposed, or used, in Safety Critical (SC) applications. Many such systems make use of sophisticated sensor suites and processing to provide scene understanding which informs the AS' decision-making. The sensor processing typically makes use of Machine Learning (ML) and has to work in challenging environments; furthermore, the ML algorithms have known limitations, e.g., the possibility of false negatives or false positives in object classification. The well-established safety-analysis methods developed for conventional SC systems are not well-matched to AS, ML, or the sensing systems used by AS. This paper proposes an adaptation of well-established safety-analysis methods to address the specifics of perception systems for AS, including addressing environmental effects and the potential failure modes of ML, and provides a rationale for choosing particular sets of guidewords, or prompts, for safety analysis. It goes on to show how the results of the analysis can be used to inform the design and verification of the AS and illustrates the new method by presenting a partial analysis of a road vehicle. Illustrations in the paper are primarily based on optical sensing; however, the paper discusses the applicability of the method to other sensing modalities and its role in a wider safety process addressing the overall capabilities of AS.