 Agents


Neufeld

AAAI Conferences

Intelligent autonomous agents that are acting in dynamic environments in real-time are often required to follow long-term strategies while also remaining reactive and being able to act deliberately. In order to create intelligent behaviors for video game characters, there are two common approaches: planners are used for long-term strategic planning, whereas Behavior Trees allow for reactive acting. Although both methodologies have their advantages, when used on their own, they fail to fully achieve both requirements described above. In this work, we propose a hybrid approach combining a Hierarchical Task Network planner for high-level planning while delegating low-level decision making and acting to Behavior Trees. Furthermore, we compare this approach with a pure planner in a multi-agent environment.
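
The division of labor described in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a toy domain: the task names, the state flags, and the `METHODS` and `BEHAVIOR_TREES` tables are all hypothetical. A small HTN-style planner decomposes a compound task into primitive tasks, and each primitive task is then executed by a small behavior tree. For brevity, the planner checks preconditions only against the current state and does not simulate task effects.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Node = Callable[[State], str]

SUCCESS, FAILURE = "success", "failure"


# --- Low level: a tiny behavior tree built from closures ---
def sequence(*children: Node) -> Node:
    """Succeeds only if every child succeeds, ticking them in order."""
    def tick(state: State) -> str:
        for child in children:
            if child(state) == FAILURE:
                return FAILURE
        return SUCCESS
    return tick


def selector(*children: Node) -> Node:
    """Succeeds as soon as one child succeeds, ticking them in order."""
    def tick(state: State) -> str:
        for child in children:
            if child(state) == SUCCESS:
                return SUCCESS
        return FAILURE
    return tick


def check(key: str) -> Node:
    """Condition leaf: succeeds if the flag is set in the state."""
    return lambda state: SUCCESS if state.get(key) else FAILURE


def set_flag(key: str) -> Node:
    """Action leaf: sets a flag and always succeeds (hypothetical effect)."""
    def tick(state: State) -> str:
        state[key] = True
        return SUCCESS
    return tick


# Each primitive task is delegated to one behavior tree (hypothetical tasks).
BEHAVIOR_TREES: Dict[str, Node] = {
    "get_weapon": selector(check("has_weapon"), set_flag("has_weapon")),
    "attack": sequence(check("has_weapon"), set_flag("enemy_defeated")),
    "flee": set_flag("safe"),
}

# --- High level: HTN-style methods, tried in order (hypothetical domain) ---
Method = Tuple[Callable[[State], bool], List[str]]
METHODS: Dict[str, List[Method]] = {
    "handle_enemy": [
        (lambda s: s.get("healthy", False), ["get_weapon", "attack"]),
        (lambda s: True, ["flee"]),
    ],
}


def plan(task: str, state: State) -> List[str]:
    """Recursively decompose a task into a list of primitive tasks."""
    if task in BEHAVIOR_TREES:
        return [task]
    for precondition, subtasks in METHODS[task]:
        if precondition(state):
            primitives: List[str] = []
            for sub in subtasks:
                primitives.extend(plan(sub, state))
            return primitives
    raise ValueError(f"no applicable method for task {task!r}")


def run(task: str, state: State) -> List[str]:
    """Plan once, then execute each primitive task via its behavior tree."""
    executed: List[str] = []
    for primitive in plan(task, state):
        if BEHAVIOR_TREES[primitive](state) == FAILURE:
            break  # a fuller system would replan here
        executed.append(primitive)
    return executed
```

Calling `run("handle_enemy", {"healthy": True})` plans the two-step attack branch, while an unhealthy agent falls through to the `flee` method.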


De Mesentier Silva

AAAI Conferences

The process of play-testing a game is subjective, expensive and incomplete. In this paper, we present a play-testing approach that explores the game space with automated agents and collects data to answer questions posed by the designers. Rather than have agents interact with an actual game client, this approach recreates the bare-bones mechanics of the game as a separate system. Our agent is able to play in minutes what would take testers days of organic gameplay. The analysis of thousands of game simulations exposed imbalances in game actions, identified inconsequential rewards and evaluated the effectiveness of optional strategic choices. Our test case game, The Sims Mobile, was recently released and the findings shown here influenced design changes that resulted in improved player experience.


Eger

AAAI Conferences

Social deduction games present a unique challenge for AI agents, because communication plays a central role in most of them, and deception plays a key role in game play. To be successful in such games, players need to come up with convincing stories, but also discern the truth of statements of other players and adapt to the information learned from them. In this paper we present an approach for virtual agents that have to determine how long to stick to their story in the light of information obtained from other players. We apply this approach to a particular social deduction game, One Night Ultimate Werewolf, and demonstrate the effect of different levels of commitment to an agent's story.


Reward-Respecting Subtasks for Model-Based Reinforcement Learning

arXiv.org Artificial Intelligence

To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress in state abstraction, but, although the theory of time abstraction has been extensively developed based on the options framework, in practice options have rarely been used in planning. One reason for this is that the space of possible options is immense and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks such as reaching a bottleneck state, or maximizing a sensory signal other than the reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. The subtasks proposed in most previous work ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option stops. We show that options and option models obtained from such reward-respecting subtasks are much more likely to be useful in planning and can be learned online and off-policy using existing learning algorithms. Reward respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how the algorithms for learning values, policies, options, and models can be unified using general value functions.
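
As a minimal sketch of the subtask objective described in this abstract, the return of a reward-respecting subtask can be written as the discounted sum of the original rewards collected until the option stops, plus a bonus tied to one state feature at the stopping time. The function name, the discount factor `gamma`, the `bonus_weight` parameter, and the linear form of the bonus are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def subtask_return(rewards: Sequence[float],
                   stop_feature: float,
                   bonus_weight: float,
                   gamma: float = 1.0) -> float:
    """Discounted original rewards until the option stops, plus a stopping bonus.

    rewards      -- original-task rewards received while the option runs
    stop_feature -- value of the chosen state feature when the option stops
    bonus_weight -- hypothetical scale of the bonus (an assumption)
    gamma        -- discount factor (an assumption)
    """
    total = 0.0
    discount = 1.0
    for r in rewards:
        total += discount * r
        discount *= gamma
    # The bonus is discounted by the number of steps the option ran.
    return total + discount * bonus_weight * stop_feature
```

Under this form, an option that reaches the feature quickly but pays a large cost in original reward scores worse than one that reaches it cheaply, which is the sense in which the subtask "respects" the reward.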


Reward is not enough: can we liberate AI from the reinforcement learning paradigm?

arXiv.org Artificial Intelligence

I present arguments against the hypothesis put forward by Silver, Singh, Precup, and Sutton (https://www.sciencedirect.com/science/article/pii/S0004370221000862): reward maximization is not enough to explain many activities associated with natural and artificial intelligence including knowledge, learning, perception, social intelligence, evolution, language, generalisation and imitation. I show such reductio ad lucrum has its intellectual origins in the political economy of Homo economicus and substantially overlaps with the radical version of behaviourism. I show why the reinforcement learning paradigm, despite its demonstrable usefulness in some practical applications, is an incomplete framework for intelligence -- natural and artificial. Complexities of intelligent behaviour are not simply second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.


How values-driven artificial intelligence can reshape the way we communicate

#artificialintelligence

Mike Ananny walked his dog this morning. He did so with no expectation of privacy. "I know that I was subject to a wide variety of cameras, whether it's Ring doorbells, cars driving along, or even city traffic cameras," he said. "I didn't choose to participate in this whole variety of video surveillance systems. I just took my dog for a walk." Ananny understands that, wherever he goes, data about him is being collected, analyzed and monetized by artificial intelligence (AI). Kate Crawford drove a van deep into the arid Nevada landscape to get a good look at the evaporating brine ponds of the Silver Peak Lithium Mine.


Boolean Observation Games

arXiv.org Artificial Intelligence

We introduce Boolean Observation Games, a subclass of multi-player finite strategic games with incomplete information and qualitative objectives. In Boolean observation games, each player is associated with a finite set of propositional variables of which only it can observe the value, and it controls whether and to whom it can reveal that value. It does not control the given, fixed value of the variables. Boolean observation games are a generalization of Boolean games, a well-studied subclass of strategic games with complete information in which each player controls the value of its variables. In Boolean observation games, player goals describe multi-agent knowledge of variables. As in classical strategic games, players choose their strategies simultaneously, and therefore observation games capture aspects of both imperfect and incomplete information. They require reasoning about sets of outcomes given sets of indistinguishable valuations of variables. What a Nash equilibrium is depends on an outcome relation between such sets. We present various outcome relations, including a qualitative variant of ex-post equilibrium. We identify conditions under which, given an outcome relation, Nash equilibria are guaranteed to exist. We also study the complexity of checking for the existence of Nash equilibria and of verifying if a strategy profile is a Nash equilibrium. We further study the subclass of Boolean observation games with 'knowing whether' goal formulas, for which the satisfaction does not depend on the value of variables. We show that each such Boolean observation game corresponds to a Boolean game and vice versa, by a different correspondence, and that both correspondences are precise in terms of existence of Nash equilibria.


GrASP: Gradient-Based Affordance Selection for Planning

arXiv.org Artificial Intelligence

Planning with a learned model is arguably a key component of intelligence. There are several challenges in realizing such a component in large-scale reinforcement learning (RL) problems. One such challenge is dealing effectively with continuous action spaces when using tree-search planning (e.g., it is not feasible to consider every action even at just the root node of the tree). In this paper we present a method for selecting affordances useful for planning -- for learning which small number of actions/options from a continuous space of actions/options to consider in the tree-expansion process during planning. We consider affordances that are goal-and-state-conditional mappings to actions/options as well as unconditional affordances that simply select actions/options available in all states. Our selection method is gradient based: we compute gradients through the planning procedure to update the parameters of the function that represents affordances. Our empirical work shows that it is feasible to learn to select both primitive-action and option affordances, and that simultaneously learning to select affordances and planning with a learned value-equivalent model can outperform model-free RL.


Backdoor Detection in Reinforcement Learning

arXiv.org Artificial Intelligence

While real-world applications of reinforcement learning (RL) are becoming popular, the safety and robustness of RL systems require more attention. Recent work reveals that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. trojan agent), which can result in a catastrophic failure as soon as it sees the backdoor trigger action. We propose the problem of RL Backdoor Detection, aiming to address this safety vulnerability. An interesting observation we drew from extensive empirical studies is a trigger smoothness property, where normal actions similar to the backdoor trigger actions can also trigger low performance of the trojan agent. Inspired by this observation, we propose a reinforcement learning solution, TrojanSeeker, to find approximate trigger actions for the trojan agents, and further propose an efficient approach to mitigate the trojan agents based on machine unlearning. Experiments show that our approach can correctly distinguish and mitigate all the trojan agents across various types of agents and environments.


Evaluating Robustness of Cooperative MARL: A Model-based Approach

arXiv.org Artificial Intelligence

In recent years, a proliferation of methods has been developed for cooperative multi-agent reinforcement learning (c-MARL). However, the robustness of c-MARL agents against adversarial attacks has rarely been explored. In this paper, we propose to evaluate the robustness of c-MARL agents via a model-based approach. Our proposed formulation can craft stronger adversarial state perturbations of c-MARL agent(s) that lower total team rewards more than existing model-free approaches. In addition, we propose the first victim-agent selection strategy, which allows us to develop even stronger adversarial attacks. Numerical experiments on multi-agent MuJoCo benchmarks illustrate the advantage of our approach over other baselines. The proposed model-based attack consistently outperforms other baselines in all tested environments.