 Agents


Fair Allocation with Diminishing Differences

Journal of Artificial Intelligence Research

Ranking alternatives is a natural way for humans to explain their preferences. It is used in many settings, such as school choice, course allocation and residency matching. Without any information on the underlying cardinal utilities, reasoning about the fairness of allocations requires extending the ordinal item ranking to an ordinal bundle ranking. The most commonly used such extension is stochastic dominance (SD), where a bundle X is preferred over a bundle Y if its score is better according to all additive score functions. SD is a very conservative extension, by which few allocations are necessarily fair while many allocations are possibly fair. We propose to make a natural assumption on the underlying cardinal utilities of the players, namely that the difference between two items at the top is larger than the difference between two items at the bottom. This assumption implies a preference extension which we call diminishing differences (DD), where X is preferred over Y if its score is better according to all additive score functions satisfying the DD assumption. We give a full characterization of allocations that are necessarily-proportional or possibly-proportional according to this assumption. Based on this characterization, we present a polynomial-time algorithm for finding a necessarily-DD-proportional allocation whenever it exists. Using simulations, we compare the various fairness criteria in terms of their probability of existence, and their probability of being fair by the underlying cardinal valuations. We find that necessary-DD-proportionality fares well in both measures. We also consider envy-freeness and Pareto optimality under diminishing differences, as well as chore allocation under the analogous condition, increasing differences.
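To make the difference between the SD and DD extensions concrete, here is a minimal Python sketch. It is an illustration only, not the paper's algorithm. It assumes that "diminishing differences" means the gaps between consecutively ranked items are positive and weakly decreasing from the top of the ranking to the bottom; the function names and the score vectors are hypothetical.

```python
from typing import Sequence


def satisfies_dd(scores: Sequence[float]) -> bool:
    """scores[i] is the score of the item ranked i-th (index 0 = best).

    Assumed reading of the DD condition: every consecutive gap is positive
    and the gaps never grow as we move down the ranking.
    """
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    positive = all(g > 0 for g in gaps)
    diminishing = all(gaps[i] >= gaps[i + 1] for i in range(len(gaps) - 1))
    return positive and diminishing


def bundle_score(bundle: Sequence[int], scores: Sequence[float]) -> float:
    """Additive score of a bundle given as a collection of rank positions."""
    return sum(scores[rank] for rank in bundle)


# Four items ranked 0 (best) to 3 (worst).
x = [0, 3]  # best and worst item
y = [1, 2]  # the two middle items

borda = [4, 3, 2, 1]        # gaps 1, 1, 1: satisfies DD
top_heavy = [8, 4, 2, 1]    # gaps 4, 2, 1: satisfies DD
bottom_heavy = [10, 9, 8, 0]  # gaps 1, 1, 8: additive but violates DD

for name, scores in [("borda", borda), ("top_heavy", top_heavy),
                     ("bottom_heavy", bottom_heavy)]:
    print(name, satisfies_dd(scores),
          bundle_score(x, scores), bundle_score(y, scores))
```

Under any score function passing `satisfies_dd`, the top gap is at least the bottom gap, so bundle `x` scores at least as much as bundle `y`. The `bottom_heavy` scores reverse the comparison, which is why the two bundles are incomparable under SD but ordered under DD.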


Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

arXiv.org Artificial Intelligence

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.


JS-son -- A Lean, Extensible JavaScript Agent Programming Library

arXiv.org Artificial Intelligence

A multitude of agent-oriented software engineering frameworks exist, most of which are developed by the academic multi-agent systems community. However, these frameworks often impose programming paradigms on their users that are challenging to learn for engineers who are used to modern high-level programming languages such as JavaScript and Python. To show how the adoption of agent-oriented programming by the software engineering mainstream can be facilitated, we provide a lean JavaScript library prototype for implementing reasoning-loop agents. The library focuses on core agent programming concepts and refrains from imposing further restrictions on the programming approach. To illustrate its usefulness, we show how the library can be applied to multi-agent systems simulations on the web, deployed to cloud-hosted function-as-a-service environments, and embedded in Python-based data science tools.


PLOP: Probabilistic poLynomial Objects trajectory Planning for autonomous driving

arXiv.org Artificial Intelligence

To navigate safely in an urban environment, an autonomous vehicle (ego vehicle) needs to understand and anticipate its surroundings, in particular the behavior of other road users (neighbors). However, multiple choices are often acceptable (e.g. turning right or left, or different ways of avoiding an obstacle). We focus here on predicting multiple feasible future trajectories for both the ego vehicle and its neighbors through a probabilistic framework. We use a conditional imitation learning algorithm, conditioned by a navigation command for the ego vehicle (e.g. "turn right"). It takes as input the ego car's front camera image, a Lidar point cloud in a bird's-eye view grid, and present and past object detections. It outputs possible trajectories for the ego vehicle and its neighbors, and also produces a semantic segmentation that serves as an auxiliary loss. We evaluate our method on the publicly available nuScenes dataset, showing state-of-the-art performance and investigating the impact of our architecture choices.


"Other-Play" for Zero-Shot Coordination

arXiv.org Artificial Intelligence

We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP), which enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.
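The idea of exploiting known symmetries can be illustrated with a toy coordination game. The Python sketch below is an assumption-laden illustration, not the authors' implementation. It assumes a hypothetical game in which two players each pull one of four levers and are paid only if they pull the same one, and it assumes the other-play objective is the expected return when the partner's policy is relabeled by a uniformly random symmetry of the game. The payoffs and function names are invented for the example.

```python
from itertools import permutations

# Hypothetical payoffs: levers 0-2 are interchangeable (a known symmetry),
# lever 3 pays slightly less but is uniquely identifiable.
PAYOFFS = [1.0, 1.0, 1.0, 0.9]
SYMMETRIC = [0, 1, 2]


def self_play_return(lever: int) -> float:
    """Both copies of the same agent pull the same lever and always match."""
    return PAYOFFS[lever]


def other_play_return(lever: int) -> float:
    """Average return when the partner's labels are permuted by a random symmetry."""
    perms = list(permutations(SYMMETRIC))
    total = 0.0
    for perm in perms:
        relabel = dict(zip(SYMMETRIC, perm))
        partner_lever = relabel.get(lever, lever)  # lever 3 is never relabeled
        if partner_lever == lever:
            total += PAYOFFS[lever]
    return total / len(perms)


best_sp = max(range(len(PAYOFFS)), key=self_play_return)
best_op = max(range(len(PAYOFFS)), key=other_play_return)
print("self-play picks lever", best_sp)
print("other-play picks lever", best_op)
```

In this toy setting, self-play settles on an arbitrary one of the interchangeable levers, a convention a new partner has no way to guess. The symmetry-averaged objective instead favors the distinct lever, which any independently trained partner can also identify.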


On the Robustness of Cooperative Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

In cooperative multi-agent reinforcement learning (c-MARL), agents learn to cooperatively take actions as a team to maximize a total team reward. We analyze the robustness of c-MARL to adversaries capable of attacking one of the agents on a team. Through the ability to manipulate this agent's observations, the adversary seeks to decrease the total team reward. Attacking c-MARL is challenging for three reasons: first, it is difficult to estimate team rewards or how they are impacted by an agent mispredicting; second, models are non-differentiable; and third, the feature space is low-dimensional. Thus, we introduce a novel attack. The attacker first trains a policy network with reinforcement learning to find a wrong action it should encourage the victim agent to take. Then, the adversary uses targeted adversarial examples to force the victim to take this action. Our results on the StarCraft II multi-agent benchmark demonstrate that c-MARL teams are highly vulnerable to perturbations applied to one of their agents' observations. By attacking a single agent, our attack method has a highly negative impact on the overall team reward, reducing it from 20 to 9.4. This causes the team's winning rate to drop from 98.9% to 0%.


FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis

arXiv.org Machine Learning

Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents' behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.


A Comparative Study on Parameter Estimation in Software Reliability Modeling using Swarm Intelligence

arXiv.org Artificial Intelligence

This work compares the performance of two well-known swarm algorithms, Cuckoo Search (CS) and Firefly Algorithm (FA), in estimating the parameters of Software Reliability Growth Models (SRGMs). The study is extended with Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO). All algorithms are evaluated on real software failure data, and the results are compared to show the performance of each algorithm. Furthermore, CS and FA are also compared with each other on the basis of execution time and iteration number. Experimental results show that CS is more efficient in estimating the parameters of SRGMs, and that it outperforms FA, PSO and ACO for the selected data sets and employed models.


A Crash Course in Game Theory for Machine Learning: Classic and New Ideas

#artificialintelligence

Game theory is one of the most fascinating areas of mathematics, and it has influenced diverse fields such as economics, social sciences, biology and, obviously, computer science. Games are playing a key role in the evolution of artificial intelligence (AI). For starters, game environments are becoming a popular training mechanism in areas such as reinforcement learning or imitation learning. In theory, any multi-agent AI system can be subjected to gamified interactions between its participants. The branch of mathematics that formulates the principles of games is known as game theory.


Generating Emotionally Aligned Responses in Dialogues using Affect Control Theory

arXiv.org Artificial Intelligence

State-of-the-art neural dialogue systems excel at syntactic and semantic modelling of language, but often have a hard time establishing emotional alignment with the human interactant during a conversation. In this work, we bring Affect Control Theory (ACT), a socio-mathematical model of emotions for human-human interactions, to the neural dialogue generation setting. ACT makes predictions about how humans respond to emotional stimuli in social situations. Due to this property, ACT and its derivative probabilistic models have been successfully deployed in several applications of Human-Computer Interaction, including empathetic tutoring systems, assistive healthcare devices and two-person social dilemma games. We investigate how ACT can be used to develop affect-aware conversational agents, which produce emotionally aligned responses to prompts and take into consideration the affective identities of the interactants.