 Agents


Memory Management in Resource-Bounded Agents

arXiv.org Artificial Intelligence

Memory in an agent system is a process of reasoning: it is the learning process of strengthening a concept. The interaction between an agent and the environment can play an important role in constructing its memory and may affect its future behaviour. In fact, through memory an agent is potentially able to recall and to learn from experiences so that its beliefs and its future course of action are grounded in these experiences. In computational logic, [2] introduces DLEK (Dynamic Logic of Explicit beliefs and Knowledge) as a logical formalization of short-term and long-term memory. The underlying idea is to represent reasoning about the formation of beliefs through perception and inference in non-omniscient resource-bounded agents. DLEK, however, has no notion of time, while agents' actual perceptions are inherently timed and so are many of the inferences drawn from such perceptions. In this paper we present an extension of LEK/DLEK to T-LEK/T-DLEK ("Timed LEK" and "Timed DLEK") obtained by introducing a special function which associates with each belief its arrival time and controls timed inferences. Through this function it is easier to keep the evolution of the surrounding world under control and the representation is more complete. This abstract is an evolved version of [3], where we have introduced explicit time instants and time intervals in formulas, and it is extracted from [4].
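To make the idea of a function that attaches an arrival time to each belief more concrete, here is a minimal Python sketch. It is not the T-LEK/T-DLEK semantics from the paper: the class `TimedBeliefStore`, its method names, and the `window` rule for timed inference are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TimedBeliefStore:
    """Hypothetical working memory: maps each belief to its arrival time."""
    arrival: dict[str, int] = field(default_factory=dict)

    def perceive(self, belief: str, now: int) -> None:
        # Record the first arrival time only; later arrivals do not overwrite it.
        self.arrival.setdefault(belief, now)

    def infer(self, premises: list[str], conclusion: str, now: int, window: int) -> bool:
        # Assumed rule (not from the paper): the inference fires only if every
        # premise is held and arrived at most `window` time units before `now`.
        times = [self.arrival.get(p) for p in premises]
        if any(t is None or now - t > window for t in times):
            return False
        self.perceive(conclusion, now)
        return True


store = TimedBeliefStore()
store.perceive("smoke", now=1)
store.perceive("heat", now=2)
fresh = store.infer(["smoke", "heat"], "fire", now=3, window=5)   # premises are recent
stale = store.infer(["smoke"], "alarm", now=10, window=5)          # premise is too old
```

In this sketch the arrival-time map plays the role of the timing function, and the `window` check is one simple way an inference could be made to depend on when its premises were perceived.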


Agent Prioritization for Autonomous Navigation

arXiv.org Artificial Intelligence

In autonomous navigation, a planning system reasons about other agents to plan a safe and plausible trajectory. Before planning starts, agents are typically processed with computationally intensive models for recognition, tracking, motion estimation and prediction. With limited computational resources and a large number of agents to process in real time, it becomes important to efficiently rank agents according to their impact on the decision making process. This allows spending more time processing the most important agents. We propose a system to rank agents around an autonomous vehicle (AV) in real time. We automatically generate a ranking data set by running the planner in simulation on real-world logged data, where we can afford to run more accurate and expensive models on all the agents. The causes of various planner actions are logged and used for assigning ground truth importance scores. The generated data set can be used to learn ranking models. In particular, we show the utility of combining learned features, via a convolutional neural network, with engineered features designed to capture domain knowledge. We show the benefits of various design choices experimentally. When tested on real AVs, our system demonstrates the capability of understanding complex driving situations.
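The budgeted selection step described above can be sketched in a few lines of Python. This assumes importance scores have already been produced by some ranking model; the function name `select_for_expensive_models`, the agent identifiers, and the score values are hypothetical and are not taken from the paper.

```python
import heapq


def select_for_expensive_models(scores: dict[str, float], budget: int) -> list[str]:
    """Return the ids of the `budget` highest-scoring agents, best first."""
    # Iterating a dict yields its keys; `key=scores.get` ranks them by score.
    return heapq.nlargest(budget, scores, key=scores.get)


# Hypothetical importance scores for agents around the vehicle.
scores = {"pedestrian_7": 0.91, "parked_car_2": 0.05, "cyclist_3": 0.64, "truck_1": 0.30}
prioritized = select_for_expensive_models(scores, budget=2)
```

Agents outside the selected set would then be handled by cheaper models, which is the trade-off the abstract motivates.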


Multi-Robot Deep Reinforcement Learning with Macro-Actions

arXiv.org Artificial Intelligence

A. MacDec-POMDPs. Decentralized fully collaborative multi-agent decision-making under uncertainty can be modeled as a decentralized POMDP (Dec-POMDP) [14]. Due to the assumption of synchronous actions that require the same amount of time for each agent, Dec-POMDPs are not applicable to multi-robot planning and learning scenarios in the real world. MacDec-POMDPs, formalized by introducing macro-actions into Dec-POMDPs, inherently allow asynchronous execution among robots with temporally extended macro-actions that can begin and end at different times for each agent. Formally, a MacDec-POMDP is defined as a tuple $\langle I, S, A, \Omega, M, \zeta, O, T, Z, R \rangle$, where $I$ is a finite set of agents; $S$ is a finite set of environment states; $A = \times_i A_i$ and $\Omega = \times_i \Omega_i$ are the spaces of joint-primitive-action and joint-primitive-observation respectively; $M = \times_i M_i$ is the joint set of each agent's finite macro-action space $M_i$; $\zeta = \times_i \zeta_i$ is the set of joint macro-observations over agents' finite macro-observation space $\zeta_i$. Given a macro-action-based policy, each agent $i$ is allowed to asynchronously choose a macro-action $m_i = \langle \beta_m, I_m, \pi_m \rangle_i$ that depends on individual macro-action-observation histories, where $\beta_m : H^A_i \to [0, 1]$ is the stochastic termination condition and $I_m \subseteq H^M_i$ is the initiation set of the corresponding macro-action $m_i$, respectively depending on the primitive-action-observation history space $H^A_i$ and macro-action-observation history space $H^M_i$ of agent $i$; $\pi_m : H^A_i \to A_i$ denotes the low-level policy to achieve the macro-action $m$. During execution, each agent's primitive-observation $o_i \in \Omega_i$ is generated according to the observation probability function $O_i(o_i, a_i, s) = \Pr(o_i \mid a_i, s)$, and a shared immediate reward $r(s, \vec{a})$, where $\vec{a} \in \times_i A_i$, is issued according to the reward function $R : S \times A \to \mathbb{R}$.
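As an illustration of the macro-action triple (termination condition, initiation set, low-level policy), the following Python sketch represents one macro-action as three callables and runs it until its termination condition fires. The names `MacroAction`, `run_macro_action`, `observe`, and the toy "go to door" example are assumptions made for this sketch; they are not the authors' implementation, and the environment is reduced to a single observation callback.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Primitive action-observation history of one agent: (action, observation) pairs.
History = List[Tuple[str, str]]


@dataclass
class MacroAction:
    name: str
    termination: Callable[[History], float]  # beta_m: history -> probability of stopping
    can_start: Callable[[List[str]], bool]   # membership test for the initiation set I_m
    policy: Callable[[History], str]         # pi_m: history -> primitive action


def run_macro_action(m: MacroAction, observe: Callable[[str], str],
                     rng: random.Random, max_steps: int = 100) -> History:
    """Execute primitive actions until the termination condition is sampled."""
    history: History = []
    for _ in range(max_steps):
        action = m.policy(history)
        history.append((action, observe(action)))
        if rng.random() < m.termination(history):
            break
    return history


# Toy example: keep moving forward until a door is observed.
go_to_door = MacroAction(
    name="go_to_door",
    termination=lambda h: 1.0 if h and h[-1][1] == "door" else 0.0,
    can_start=lambda macro_history: True,
    policy=lambda h: "forward",
)

steps = {"n": 0}


def observe(action: str) -> str:
    # Hypothetical environment: the door comes into view on the third step.
    steps["n"] += 1
    return "door" if steps["n"] >= 3 else "corridor"


trace = run_macro_action(go_to_door, observe, random.Random(0))
```

Because each macro-action ends when its own termination condition is sampled, two agents running different macro-actions would generally finish after different numbers of primitive steps, which is the asynchrony the formalism is meant to capture.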


Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games

arXiv.org Artificial Intelligence

This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward could depend on the uncertain opponent type (private information known only to the opponent itself and its ally). In order to maximize the reward, the protagonist agent has to infer the opponent type through agent modeling. We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategy interaction and reasoning between agents. However, agent policies learned from self-play can suffer from mutual overfitting. Ensemble training methods can be used to improve the robustness of the agent policy against different opponents, but they also significantly increase the computational overhead. In order to achieve a good trade-off between the robustness of the learned policy and the computational complexity, we propose to train a separate opponent policy against the protagonist agent for evaluation purposes. The reward achieved by this opponent is a noisy measure of the robustness of the protagonist agent policy due to the intrinsic stochastic nature of a reinforcement learner. To handle this stochasticity, we apply a stochastic optimization scheme to dynamically update the opponent ensemble to optimize an objective function that strikes a balance between robustness and computational complexity. We empirically show that, under the same limited computational budget, the proposed method results in more robust policy learning than standard ensemble training.


Segregation Dynamics with Reinforcement Learning and Agent Based Modeling

arXiv.org Artificial Intelligence

Societies are complex. Properties of social systems can be explained by the interplay and weaving of individual actions. Incentives are key to understanding people's choices and decisions. For instance, individual preferences of where to live may lead to the emergence of social segregation. In this paper, we combine Reinforcement Learning (RL) with Agent Based Models (ABM) in order to address the self-organizing dynamics of social segregation and explore the space of possibilities that emerge from considering different types of incentives. Our model promotes the creation of interdependencies and interactions among multiple agents of two different kinds that want to segregate from each other. For this purpose, agents use Deep Q-Networks to make decisions based on the rules of the Schelling Segregation model and the Predator-Prey model. Despite the segregation incentive, our experiments show that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also reveal that segregated areas are more likely to host older people than diverse areas, which attract younger ones. Through this work, we show that the combination of RL and ABMs can create an artificial environment for policy makers to observe potential and existing behaviors associated with incentives.
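For readers unfamiliar with the Schelling model mentioned above, the following Python sketch shows the classic satisfaction rule: an agent is content when the share of like neighbours reaches a tolerance threshold. This is the textbook rule only, not the reward used by the paper's Deep Q-Network agents; the function names, the 8-cell neighbourhood, the 0.5 default threshold, and the convention that an agent with no neighbours counts as satisfied are assumptions of this sketch.

```python
from typing import List, Optional

Grid = List[List[Optional[str]]]  # each cell holds an agent kind or None if empty


def like_fraction(grid: Grid, row: int, col: int) -> float:
    """Fraction of occupied neighbouring cells sharing the kind at (row, col)."""
    kind = grid[row][col]
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr, dc) == (0, 0) or not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue  # skip the cell itself and anything off the grid
            if grid[r][c] is not None:
                total += 1
                same += grid[r][c] == kind
    # Assumed convention: an agent with no neighbours is treated as satisfied.
    return same / total if total else 1.0


def is_satisfied(grid: Grid, row: int, col: int, threshold: float = 0.5) -> bool:
    return like_fraction(grid, row, col) >= threshold


grid: Grid = [
    ["A", "A", None],
    ["B", "A", "B"],
    [None, "B", "B"],
]
centre_ok = is_satisfied(grid, 1, 1)  # the "A" in the middle is outnumbered
corner_ok = is_satisfied(grid, 0, 0)  # the "A" in the corner has mostly "A" neighbours
```

In the classic model, unsatisfied agents relocate to an empty cell; in an RL variant such as the one described, a quantity like `like_fraction` could instead feed into the reward signal.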


Design of a Solver for Multi-Agent Epistemic Planning

arXiv.org Artificial Intelligence

The proliferation of agent-based and IoT technologies has enabled the development of novel applications involving hundreds of agents. Considering that self-driving cars and other autonomous devices that can control several aspects of our daily life are going to be available en masse in just a few years, it will not be long until massive systems of autonomous agents, each acting upon its own knowledge and beliefs to achieve its own (or group) goals, become available and widely deployed. To maximize the potential of such autonomous systems, multi-agent planning and scheduling research [1, 8-10, 24, 28] will need to keep pace. Moreover, creating a plan for multiple agents to achieve a goal will need to take into consideration agents' knowledge and beliefs, to account for aspects like trust, dishonesty, deception, and incomplete knowledge. The planning problem in this new setting is referred to as epistemic planning in the literature; that is, epistemic planners are interested not only in the state of the world but also in the knowledge or beliefs of the agents. Nevertheless, reasoning about knowledge and beliefs is not as direct as reasoning about the "physical" state of the world. That is because expressing, for example, belief relations between a group of agents often requires considering nested and group beliefs that are not easily extracted from the state description by a human reader. For these reasons it is necessary to develop a complete and accessible action language to model multi-agent epistemic domains [2] and also to advance the study of epistemic solvers [4, 19, 23, 26, 34].


A Temporal Module for Logical Frameworks

arXiv.org Artificial Intelligence

In the literature, different kinds of timed logical frameworks exist, where time is specified directly using hybrid logics (cf., e.g., [2]), temporal epistemic logic (cf., e.g., [4]) or simply by using Linear Temporal Logic. We propose a temporal module which can be adopted to "temporalize" many logical frameworks. This module is in practice a particular kind of function that assigns a "timing" to atoms. We have exploited this T function in two different settings. The first one is the formalization of the reasoning on the formation of beliefs and the interaction with background knowledge in non-omniscient agents' memory.


Towards Ethical Machines Via Logic Programming

arXiv.org Artificial Intelligence

However, the overall aim is not only important for equipping machines with capabilities of moral reasoning, but also for helping us to better understand morality through creating and testing computational models of ethical machines that follow a set of ideal ethical principles. Since the beginning of this century there have been several attempts to implement ethical decision making in intelligent autonomous agents using different approaches. Yet no fully descriptive and widely accepted model of moral judgment and decision-making exists. In this work we propose a hybrid logic-based approach for modeling ethical machines, particularly ethical chatbots. As a matter of fact, the potential of logic programming (LP) to model moral machines was envisioned by Pereira and Saptawijaya [15].


The Animal-AI Environment: Training and Testing Animal-Like Artificial Cognition

arXiv.org Artificial Intelligence

Recent advances in artificial intelligence have been strongly driven by the use of game environments for training and evaluating agents. Games are often accessible and versatile, with well-defined state transitions and goals allowing for intensive training and experimentation. However, agents trained in a particular environment are usually tested on the same or slightly varied distributions, and solutions do not necessarily imply any understanding. If we want AI systems that can model and understand their environment, we need environments that explicitly test for this. Inspired by the extensive literature on animal cognition, we present an environment that keeps all the positive elements of standard gaming environments, but is explicitly designed for the testing of animal-like artificial cognition. All source code is publicly available (see appendix).


Multi-Agent Hide and Seek

#artificialintelligence

We've observed agents discovering progressively more complex tool use while playing a simple game of hide-and-seek. Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported. The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.