Agents


The Foreseeable Future: Self-Supervised Learning to Predict Dynamic Scenes for Indoor Navigation

arXiv.org Artificial Intelligence

Abstract: We present a method for generating, predicting, and using Spatiotemporal Occupancy Grid Maps (SOGMs), which embed future semantic information of real dynamic scenes. We present an auto-labeling process that creates SOGMs from noisy real navigation data. We use a 3D-2D feedforward architecture, trained to predict the future time steps of SOGMs given 3D lidar frames as input. Our pipeline is entirely self-supervised, thus enabling lifelong learning for real robots. The network is composed of a 3D back-end that extracts rich features and enables the semantic segmentation of the lidar frames, and a 2D front-end that predicts the future information embedded in the SOGM representation, potentially capturing the complexities and uncertainties of real-world multi-agent, multi-future interactions. We also design a navigation system that uses these predicted SOGMs within planning, after they have been transformed into Spatiotemporal Risk Maps (SRMs). We verify our navigation system's abilities in simulation, validate it on a real robot, study SOGM predictions on real data in various circumstances, and …

[Figure caption, partial: … Time is represented as a color, from red (now) to yellow (future).]

Predicting the future has always fascinated humanity. … curiosity for the unknown has never faded. But we tend to forget that we already predict the future constantly in our daily lives, only over a short horizon. Walking in the street, catching a falling object, or driving a car: all these actions require a certain level of anticipation. … can become quite good at predicting what might happen in the next few seconds in many situations; what about robots? We study this question in the context of a concrete example: a robot learning on its own to navigate among humans or dynamic objects in an indoor space. Our approach allows the …

In this paper, we provide a detailed description of the collection of algorithms required for these various tasks, for a complete view of the overall approach, as illustrated in Figure 2. Some of the algorithms we use have already been introduced … In the first one [1], we described how to automatically annotate 3D lidar points, and train a … In the second one [2], our system learned to predict the future of dynamic scenes as SOGMs. Until now, we only evaluated results in a simulated environment.
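The auto-labeling idea, turning future observations into a time-indexed occupancy target, can be pictured with a small sketch. This is not the authors' pipeline. It assumes that every lidar point has already been labeled as dynamic, that future frames are already registered into the current map frame, and that the SOGM is reduced to a single boolean occupancy channel. The function `build_sogm` and its parameters are hypothetical.

```python
import numpy as np


def build_sogm(dynamic_points_per_step, origin, resolution, shape):
    """Stack per-time-step 2D occupancy grids into a (T, H, W) boolean array.

    dynamic_points_per_step: list of (N_t, 2) arrays holding the x, y coordinates
        (metres) of points labeled dynamic at future step t, already expressed in
        the map frame of the current step (assumed input format).
    origin: (x_min, y_min) of the grid in metres.
    resolution: cell size in metres.
    shape: (H, W) grid size in cells.
    """
    height, width = shape
    sogm = np.zeros((len(dynamic_points_per_step), height, width), dtype=bool)
    for t, pts in enumerate(dynamic_points_per_step):
        if len(pts) == 0:
            continue
        # Convert metric coordinates to integer cell indices.
        cols = np.floor((pts[:, 0] - origin[0]) / resolution).astype(int)
        rows = np.floor((pts[:, 1] - origin[1]) / resolution).astype(int)
        # Ignore points that fall outside the grid.
        inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
        sogm[t, rows[inside], cols[inside]] = True
    return sogm
```

The first axis is the time dimension. A network trained on such labels would usually output a per-cell probability for each future step rather than a boolean.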


Controlling Robot Swarm Aggregation through a Minority of Informed Robots

arXiv.org Artificial Intelligence

Self-organized aggregation is a well-studied behavior in swarm robotics, as it is the precondition for the development of more advanced group-level responses. In this paper, we investigate the design of decentralized algorithms for a swarm of heterogeneous robots that self-aggregate over distinct target sites. A previous study has shown that including a number of informed robots in the swarm can steer the dynamics of the aggregation process toward a desirable distribution of the swarm across the available aggregation sites. We have replicated the results of the previous study using a simplified approach: we removed constraints related to the robots' communication protocol and simplified the control mechanisms that regulate transitions between the states of the probabilistic controller. The results show that the performance obtained with the previous, more complex controller can be replicated with our simplified approach, which offers clear advantages in portability to physical robots and in flexibility. That is, our simplified approach can generate self-organized aggregation responses under a larger set of operating conditions than the complex controller can.
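The abstract does not specify the controller, so what follows is only a hypothetical sketch of a probabilistic aggregation controller of the general kind it describes. Exploring robots join a site with a probability that grows with the number of robots already there. Aggregated robots leave with a probability that shrinks with that number. Informed robots stop only at their assigned site and, as an assumption of this sketch, never leave. All probabilities, parameter values, and function names are illustrative choices, not taken from the paper.

```python
import random

# Hypothetical probabilistic aggregation controller (illustrative parameters).
# states[i] is None while robot i explores, or the index of the site it rests at.
# informed_target[i] is None for a non-informed robot, else its assigned site.


def step(states, informed_target, n_sites, rng, p_find=0.2,
         join_base=0.1, join_gain=0.05, leave_base=0.2, leave_decay=0.5):
    """One synchronous update of every robot's state."""
    counts = [sum(1 for s in states if s == k) for k in range(n_sites)]
    new_states = list(states)
    for i, s in enumerate(states):
        target = informed_target[i]
        if s is None:
            # An exploring robot reaches a uniformly random site with probability p_find.
            if rng.random() >= p_find:
                continue
            site = rng.randrange(n_sites)
            if target is not None and site != target:
                continue  # informed robots only stop at their own site
            # Joining is more likely at sites that already hold many robots.
            p_join = min(1.0, join_base + join_gain * counts[site])
            if rng.random() < p_join:
                new_states[i] = site
        elif target is None:
            # Non-informed robots leave less often when the site is crowded;
            # informed robots never leave (an assumption of this sketch).
            p_leave = leave_base * leave_decay ** counts[s]
            if rng.random() < p_leave:
                new_states[i] = None
    return new_states


def simulate(n_robots=50, n_informed=10, n_sites=2, n_steps=2000, seed=0):
    """Run the controller and return how many robots rest at each site."""
    rng = random.Random(seed)
    informed = [0 if i < n_informed else None for i in range(n_robots)]
    states = [None] * n_robots
    for _ in range(n_steps):
        states = step(states, informed, n_sites, rng)
    return [sum(1 for s in states if s == k) for k in range(n_sites)]
```

Because joining and staying are both reinforced by crowding, the robots committed to site 0 tend to bias the non-informed majority toward that site. The strength of this effect depends entirely on the illustrative parameters above.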


Multi-agent reinforcement learning for intent-based service assurance in cellular networks

arXiv.org Artificial Intelligence

Recently, intent-based management has received considerable attention in telecom networks, owing to the stringent performance requirements of many use cases. Several approaches in the literature employ traditional closed-loop methods to fulfill the intents on the KPIs. However, these methods treat each closed loop independently of the others, which degrades their combined performance. Such existing methods are also not easily scalable. Multi-agent reinforcement learning (MARL) techniques have shown significant promise in many areas where traditional closed-loop control falls short, typically those requiring complex coordination and conflict management among loops. In this work, we propose a MARL-based method that achieves intent-based management without requiring a model of the underlying system. Moreover, when intents conflict, the MARL agents can implicitly incentivize the loops to cooperate and make trade-offs, without human interaction, by prioritizing the important KPIs. Experiments were performed on a network emulator, optimizing the KPIs of three services. The results show that the proposed system fulfills all existing intents when there are enough resources, and prioritizes the important KPIs when resources are scarce.


Play with Emotion: Affect-Driven Reinforcement Learning

arXiv.org Artificial Intelligence

This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning (RL) process. Under the proposed paradigm, RL agents learn a policy (i.e., affective interaction) by attempting to maximize a set of rewards (i.e., behavioral and affective patterns) through their experience with their environment (i.e., context). Our first hypothesis is that RL is an effective paradigm for interweaving affect elicitation and manifestation with behavioral and affective demonstrations. Importantly, our second hypothesis, building on Damasio's somatic marker hypothesis, is that emotion can facilitate decision-making. We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior. Go-Blend is a modified version of the Go-Explore algorithm, which has recently shown outstanding performance in hard-exploration tasks. We first vary the arousal-based reward function and observe agents that can effectively display a palette of affective and behavioral patterns according to the specified reward. We then use arousal-based state selection mechanisms to bias the strategies that Go-Blend explores. Our findings suggest that Go-Blend is not only an efficient affect modeling paradigm but, more importantly, that affect-driven RL improves exploration and yields higher-performing agents, validating Damasio's hypothesis in the domain of games.
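The following is a minimal sketch of the two ideas in the abstract, built around a toy Go-Explore-style archive. The first idea is a reward that blends a behavioral score with an arousal score through a weight `lam`, assumed here to be a convex combination. The second is a cell-selection rule that can be biased toward high-arousal cells. The `Cell` fields, the weighting formulas, and all function names are hypothetical. The sketch also omits the parts of real Go-Explore that restore simulator states and explore from the selected cell.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    behavior_score: float  # e.g. normalised race progress (hypothetical)
    arousal_score: float   # agreement with human arousal traces, in [0, 1] (hypothetical)
    visits: int = 0


def blended_reward(behavior, arousal, lam):
    """Convex blend of behavioral and affective reward; lam=1 is pure behavior."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * behavior + (1.0 - lam) * arousal


def select_cell(archive, rng, arousal_bias=True):
    """Pick a cell to return to; optionally weight the choice by arousal score."""
    cells = list(archive.values())
    if arousal_bias:
        # Small constant keeps zero-arousal cells selectable.
        weights = [c.arousal_score + 1e-6 for c in cells]
    else:
        # Go-Explore-style count-based weighting: favour rarely visited cells.
        weights = [1.0 / (c.visits + 1) ** 0.5 for c in cells]
    chosen = rng.choices(cells, weights=weights, k=1)[0]
    chosen.visits += 1
    return chosen


def update_archive(archive, key, cell, lam):
    """Keep, for each archive key, the cell with the higher blended reward."""
    old = archive.get(key)
    new_r = blended_reward(cell.behavior_score, cell.arousal_score, lam)
    if old is None or new_r > blended_reward(old.behavior_score, old.arousal_score, lam):
        archive[key] = cell
```

Setting `lam` to 1 recovers a purely behavioral objective, and setting it to 0 a purely affective one. Toggling `arousal_bias` switches between affect-driven and count-based exploration.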


BITS: Bi-level Imitation for Traffic Simulation

arXiv.org Artificial Intelligence

Simulation is key to scaling up validation and verification for robotic systems such as autonomous vehicles. Despite advances in high-fidelity physics and sensor simulation, a critical gap remains in simulating realistic behaviors of road users. This is because, unlike simulating physics and graphics, devising first-principles models of human-like behavior is generally infeasible. In this work, we take a data-driven approach and propose a method that learns to generate traffic behaviors from real-world driving logs. The method achieves high sample efficiency and behavior diversity by exploiting the bi-level hierarchy of driving behaviors, decoupling the traffic simulation problem into high-level intent inference and low-level driving behavior imitation. The method also incorporates a planning module to obtain stable long-horizon behaviors. We empirically validate our method, named Bi-level Imitation for Traffic Simulation (BITS), on scenarios from two large-scale driving datasets and show that BITS achieves balanced traffic simulation performance in realism, diversity, and long-horizon stability. We also explore ways to evaluate behavior realism and introduce a suite of evaluation metrics for traffic simulation. Finally, as part of our core contributions, we develop and open-source a software tool that unifies data formats across different driving datasets and converts scenes from existing datasets into interactive simulation environments. For additional information and videos, see https://sites.google.com/view/nvr-bits2022/home
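A toy sketch can illustrate how the bi-level decomposition combines with planning. A high level proposes candidate goals (intents), a low level rolls out a trajectory toward each goal, and a planning step keeps the rollout with the lowest cost. In this sketch the low-level policy is a hypothetical straight-line controller standing in for a learned imitation policy. The cost simply counts near-collisions with static obstacles. BITS itself learns both levels from driving logs, and the functions below are not its API.

```python
import math


def rollout(start, goal, speed, horizon):
    """Stand-in low-level policy: move straight toward goal at constant speed."""
    x, y = start
    traj = [(x, y)]
    for _ in range(horizon):
        dx, dy = goal[0] - x, goal[1] - y
        dist = math.hypot(dx, dy)
        step = min(speed, dist)  # do not overshoot the goal
        if dist > 0:
            x += step * dx / dist
            y += step * dy / dist
        traj.append((x, y))
    return traj


def trajectory_cost(traj, obstacles, safe_radius):
    """Count (point, obstacle) pairs closer than safe_radius."""
    return sum(1 for p in traj for o in obstacles if math.dist(p, o) < safe_radius)


def plan(start, candidate_goals, obstacles, speed=1.0, horizon=10, safe_radius=1.0):
    """High level proposes goals, low level rolls each out, keep the cheapest."""
    best = None
    for goal in candidate_goals:
        traj = rollout(start, goal, speed, horizon)
        cost = trajectory_cost(traj, obstacles, safe_radius)
        if best is None or cost < best[0]:
            best = (cost, goal, traj)
    return best
```

For example, from `(0.0, 0.0)` with candidate goals `(5.0, 0.0)` and `(0.0, 5.0)` and an obstacle at `(3.0, 0.0)`, `plan` keeps the goal `(0.0, 5.0)` because its rollout never comes near the obstacle.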


Emergence of group hierarchy

arXiv.org Artificial Intelligence

Indeed, in most opinion dynamics models ([12, 13, 7, 4, 14, 11]; for a recent review, see [9]), apart from a few exceptions [1, 3, 10], opinions about the agents themselves are not considered to deserve any specific attention. However, the opinions about agents determine the social network of positive or negative connections, and hence, in some respect, the social structure. Moreover, it is generally recognised that this social structure strongly influences the agents' opinions. This suggests that opinions about agents do matter. Several opinion dynamics models include such a structure, and in some cases the structure evolves over time. This is, for instance, the case for some versions of the social impact model [16, 15]. Other research proposes models of social structure dynamics, for instance hierarchies resulting from fights between primates [2]. In both cases, the social structure is generated by processes that differ from opinion dynamics. In this paper, we assume that the agents of the model proposed in [5, 8] belong to different groups.


Linear Quadratic Mean-Field Games with Communication Constraints

arXiv.org Artificial Intelligence

In this paper, we study a large-population game with heterogeneous dynamics and cost functions that solves a consensus problem. Moreover, the agents face communication constraints, which appear as (1) an additive white Gaussian noise (AWGN) channel and (2) asynchronous data transmission via a fixed scheduling policy. Since the complexity of solving the game increases with the number of agents, we use the Mean-Field Game (MFG) paradigm to solve it. Under standard assumptions on the information structure of the agents, we prove that the control of the agent in the MFG setting is free of the dual effect. This allows us to obtain an equilibrium control policy for the generic agent that is a function of only the agent's local observation. Furthermore, the equilibrium mean-field trajectory is shown to follow linear dynamics, which makes it computable. We show that in the finite-population game, the equilibrium control policy prescribed by the MFG analysis constitutes an $\epsilon$-Nash equilibrium, where $\epsilon$ tends to zero as the number of agents goes to infinity. The paper concludes with simulations demonstrating the performance of the equilibrium control policy.
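The following scalar sketch illustrates why a linear mean-field trajectory makes the equilibrium computable. It deliberately ignores the paper's communication constraints: there is no AWGN channel, no scheduling, and the state is observed perfectly. It assumes dynamics $x_{t+1} = a x_t + b u_t + w_t$ with zero-mean noise and a consensus-style cost $q(x_t - z_t)^2 + r u_t^2$ that tracks a mean-field trajectory $z$. The tracking problem is solved with a backward Riccati recursion. The code then iterates on $z$ until it matches the population mean that the resulting policy induces. All function names are hypothetical.

```python
def lq_tracking_gains(a, b, q, r, z):
    """Backward recursion for x_{t+1} = a x_t + b u_t + w_t with stage cost
    q (x_t - z_t)^2 + r u_t^2 and terminal cost q (x_T - z_T)^2.
    Returns gains L, l such that the optimal control is u_t = -L[t] x_t - l[t]."""
    T = len(z) - 1
    P, s = q, -q * z[T]  # value function V_T(x) = P x^2 + 2 s x + const
    L, l = [0.0] * T, [0.0] * T
    for t in range(T - 1, -1, -1):
        G = r + b * b * P  # uses P_{t+1}
        L[t] = a * b * P / G
        l[t] = b * s / G
        # Update the value-function coefficients to time t.
        s = -q * z[t] + a * s * r / G
        P = q + a * a * P - (a * b * P) ** 2 / G
    return L, l


def mean_field_trajectory(a, b, L, l, m0):
    """Population mean under the feedback law; zero-mean noise drops out, so
    m_{t+1} = (a - b L_t) m_t - b l_t, i.e. linear dynamics in m."""
    m = [m0]
    for Lt, lt in zip(L, l):
        m.append((a - b * Lt) * m[-1] - b * lt)
    return m


def solve_mfg(a, b, q, r, m0, T, iters=200, tol=1e-10):
    """Fixed-point iteration: guess z, best-respond, recompute the mean, repeat."""
    z = [m0] * (T + 1)
    L, l = lq_tracking_gains(a, b, q, r, z)
    for _ in range(iters):
        L, l = lq_tracking_gains(a, b, q, r, z)
        m = mean_field_trajectory(a, b, L, l, m0)
        if max(abs(mi - zi) for mi, zi in zip(m, z)) < tol:
            break
        z = m
    return z, L, l
```

For example, with `a = 1` the problem is shift-invariant, so `solve_mfg(1.0, 1.0, 1.0, 1.0, 2.0, 5)` returns a mean-field trajectory that stays at 2.0.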


Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

arXiv.org Artificial Intelligence

We investigate Nash equilibrium learning in a competitive Markov Game (MG) environment, where multiple agents compete and multiple Nash equilibria can exist. In particular, for an oligopolistic dynamic pricing environment, exact Nash equilibria are difficult to obtain due to the curse of dimensionality. We develop a new model-free method to find approximate Nash equilibria. Gradient-free black-box optimization is then applied to estimate $\epsilon$, the maximum reward advantage an agent can gain by unilaterally deviating from a given joint policy, and also to estimate the $\epsilon$-minimizing policy for any given state. The mapping from policy to $\epsilon$ and the mapping from state to $\epsilon$-minimizing policy are represented by neural networks, the latter being the Nash Policy Net. During batch updates, we perform Nash Q-learning on the system by adjusting the action probabilities using the Nash Policy Net. We demonstrate that an approximate Nash equilibrium can be learned, particularly in the dynamic pricing domain, where exact solutions are often intractable.
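The quantity the abstract calls $\epsilon$ can be made concrete for a one-shot (stage) game with mixed strategies. For each player, compare the expected payoff of the joint policy with the best payoff that player could obtain by deviating alone, then take the largest gain across players. The sketch below computes this exactly by enumerating all action profiles, which is only feasible for tiny games. The paper instead estimates $\epsilon$ for Markov games with gradient-free black-box optimization. The function names here are hypothetical.

```python
import itertools

import numpy as np


def expected_payoff(payoffs, player, strategies):
    """payoffs[player] has shape (A_1, ..., A_n); strategies[i] is a mixed
    strategy (probability vector) over player i's actions."""
    total = 0.0
    for profile in itertools.product(*(range(len(s)) for s in strategies)):
        prob = np.prod([strategies[i][a] for i, a in enumerate(profile)])
        total += prob * payoffs[player][profile]
    return total


def nash_gap(payoffs, strategies):
    """epsilon = max over players of the best unilateral deviation gain.
    A best response can always be found among pure actions, so only those are tried."""
    eps = 0.0
    for i in range(len(strategies)):
        current = expected_payoff(payoffs, i, strategies)
        best = current
        for a in range(len(strategies[i])):
            pure = np.zeros(len(strategies[i]))
            pure[a] = 1.0
            deviated = list(strategies)
            deviated[i] = pure
            best = max(best, expected_payoff(payoffs, i, deviated))
        eps = max(eps, best - current)
    return eps
```

In matching pennies, the uniform joint policy has a gap of 0, so it is an exact Nash equilibrium. Both players choosing heads gives a gap of 2, because the second player can switch to tails and go from -1 to +1.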


Modelling the Recommender Alignment Problem

arXiv.org Artificial Intelligence

Recommender systems (RS) mediate human experience online. Most RS optimize metrics that are easy to measure but imperfectly aligned with the best interests of users, such as ad clicks and user engagement. This has resulted in a host of hard-to-measure side effects: political polarization, addiction, and fake news. RS design faces a recommender alignment problem: aligning recommendations with the goals of users, system designers, and society as a whole. But how do we test and compare potential solutions for aligning RS? Their massive scale makes them costly and risky to test in deployment. We synthesized a simple abstract modelling framework to guide future work. To illustrate it, we construct a toy experiment in which we ask: "How can we evaluate the consequences of using user retention as a reward function?" To answer this question, we learn recommender policies that optimize reward functions by controlling graph dynamics in a toy environment. Based on the effects that trained recommenders have on their environment, we conclude that engagement maximizers generally, but not always, lead to worse outcomes than aligned recommenders. After learning, we examine competition between RS as a potential solution to RS alignment. We find that it generally makes our toy society better off than it would be in the absence of recommendation or under engagement maximizers. In this work, we aimed for a broad scope, touching superficially on many different points to shed light on how an end-to-end study of reward functions for recommender systems might be done. Recommender alignment is a pressing and important problem, and attempted solutions are sure to have far-reaching impacts. Here, we take a first step toward developing methods for evaluating and comparing solutions with respect to their impacts on society.


Towards A Complete Multi-Agent Pathfinding Algorithm For Large Agents

arXiv.org Artificial Intelligence

Multi-agent pathfinding (MAPF) is a challenging problem that is hard to solve optimally even when simplifying assumptions are adopted, e.g., planar graphs (typically grids), discretized time, and uniform duration of move and wait actions. On the other hand, MAPF under such restrictive assumptions (also known as classical MAPF) is equivalent to the so-called pebble motion problem, for which polynomial-time (though non-optimal) algorithms exist. Recently, a body of work has emerged that investigates MAPF beyond the basic setting and, in particular, considers agents of arbitrary size and shape. Still, to the best of our knowledge, no complete algorithm exists for this MAPF variant. In this work, we attempt to narrow this gap by considering MAPF for large agents and suggesting how this problem can be reduced to pebble motion on (general) graphs. The crux of this reduction is the procedure that moves agents away from the edge needed to perform the current agent's move action. We consider different variants of how this procedure can be implemented and present a variant of the pebble motion algorithm that incorporates it. Unfortunately, the algorithm is still incomplete, but we show empirically that it solves many more MAPF instances (under a strict time limit) with large agents on arbitrary non-planar graphs (roadmaps) than the state-of-the-art MAPF solver, Continuous Conflict-Based Search (CCBS).
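The basic pebble-motion primitive of moving agents out of the way to free a location can be sketched for ordinary point-sized agents. The approach is to search breadth-first from the vertex for the nearest empty vertex, then shift every agent on that path one step toward it. The shifting starts from the empty end so that each move lands on a free vertex. This is the classical single-vertex version, not the paper's procedure for large agents, whose moves can block several vertices or edges at once. The function name, the `forbidden` parameter, and the graph encoding are assumptions.

```python
from collections import deque


def clear_vertex(graph, occupied, v, forbidden=frozenset()):
    """Free vertex v by pushing agents along a shortest path to the nearest empty vertex.

    graph: dict mapping each vertex to a list of neighbours (undirected).
    occupied: dict mapping occupied vertices to agent ids (modified in place).
    forbidden: vertices the path may not pass through (hypothetical parameter).
    Returns the list of (agent, src, dst) moves, or None if no empty vertex is reachable.
    """
    if v not in occupied:
        return []
    parent = {v: None}
    queue = deque([v])
    target = None
    while queue:
        u = queue.popleft()
        if u not in occupied:
            target = u  # nearest empty vertex in BFS order
            break
        for w in graph[u]:
            if w not in parent and w not in forbidden:
                parent[w] = u
                queue.append(w)
    if target is None:
        return None
    # Reconstruct the path v -> ... -> target; every vertex before target is occupied.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    moves = []
    # Shift agents starting next to the empty vertex so each destination is free.
    for i in range(len(path) - 1, 0, -1):
        src, dst = path[i - 1], path[i]
        agent = occupied.pop(src)
        occupied[dst] = agent
        moves.append((agent, src, dst))
    return moves
```

On the path graph 0-1-2-3 with agents a, b, c on vertices 0, 1, 2, clearing vertex 0 yields the moves `("c", 2, 3)`, `("b", 1, 2)`, `("a", 0, 1)`.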