Agents
Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning
Semnani, Samaneh Hosseini, Liu, Hugh, Everett, Michael, de Ruiter, Anton, How, Jonathan P.
This paper introduces a hybrid algorithm combining deep reinforcement learning (RL) and force-based motion planning (FMP) to solve the distributed motion planning problem in dense and dynamic environments. Individually, RL and FMP algorithms each have their own limitations: FMP cannot produce time-optimal paths, and existing RL solutions cannot produce collision-free paths in dense environments. We therefore first improved the performance of recent RL approaches by introducing a new reward function that not only eliminates the need for a preliminary supervised learning (SL) step but also decreases the chance of collision in crowded environments. This improved performance, but many failure cases remained. We therefore developed a hybrid approach that uses the simpler FMP approach in stuck, simple, and high-risk cases, and continues to use RL in normal cases in which FMP cannot produce an optimal path. We also extend the GA3C-CADRL algorithm to 3D environments. Simulation results show that the proposed algorithm outperforms both deep RL and FMP algorithms, producing up to 50% more successful scenarios than deep RL and up to 75% less extra time to reach the goal than FMP. Index Terms: Motion planning, distributed algorithms, collision avoidance, deep learning, reinforcement learning, trajectory optimization, hybrid control. Introduction: Multi-agent motion planning has recently attracted much interest in the research community and has many applications, including robot navigation among pedestrians, self-driving cars, and drone shows.
Smooth markets: A basic mechanism for organizing gradient-based learners
Balduzzi, David, Czarnecki, Wojciech M, Anthony, Thomas W, Gemp, Ian M, Hughes, Edward, Leibo, Joel Z, Piliouras, Georgios, Graepel, Thore
With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero-sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods.
A utility-based analysis of equilibria in multi-objective normal form games
Rădulescu, Roxana, Mannion, Patrick, Zhang, Yijie, Roijers, Diederik M., Nowé, Ann
Example application domains include urban and air traffic control (Mannion et al., 2016a; Yliniemi et al., 2015), autonomous vehicles (Rădulescu et al., 2018; Talpert et al., 2019) and energy systems (Walraven and Spaan, 2016; Mannion et al., 2016b; Reymond et al., 2018). Although many such problems feature multiple conflicting objectives to optimise, most MAS research focuses on agents maximising their return w.r.t. a single objective. By contrast, in multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. Agents in a MOMAS receive vector-valued payoffs for their actions, where each component of a payoff vector represents the performance on a different objective. Following the utility-based approach (Roijers et al., 2013), we assume that each agent has a utility function which maps vector-valued payoffs to scalar utility values. Compromises between competing objectives are then considered on the basis of the utility that these tradeoffs have for the users of a MOMAS. The utility-based approach naturally leads to two different optimisation criteria for agents in a MOMAS: expected scalarised returns (ESR) and scalarised expected returns (SER). To date, the differences between the SER and ESR approaches have received little attention in multi-agent settings, despite having received some attention in single-agent settings (see e.g.
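The difference between the two criteria named in this abstract can be seen in a small worked example. The sketch below is only illustrative: the lottery, the product utility, and the linear utility are hypothetical choices, not taken from the paper. ESR applies the utility function to each payoff vector and then takes the expectation, while SER takes the expectation of the payoff vector first and then applies the utility.

```python
# Hypothetical two-objective lottery: (probability, payoff vector) pairs.
lottery = [(0.5, (4.0, 0.0)), (0.5, (0.0, 4.0))]


def product_utility(payoff):
    """Illustrative nonlinear utility: rewards balance between objectives."""
    first, second = payoff
    return first * second


def linear_utility(payoff):
    """Illustrative linear utility: a weighted sum of the objectives."""
    first, second = payoff
    return 0.3 * first + 0.7 * second


def expected_scalarised_return(lottery, utility):
    # ESR: scalarise each outcome first, then take the expectation.
    return sum(prob * utility(payoff) for prob, payoff in lottery)


def scalarised_expected_return(lottery, utility):
    # SER: take the expected payoff vector first, then scalarise it.
    size = len(lottery[0][1])
    mean = tuple(sum(prob * payoff[k] for prob, payoff in lottery) for k in range(size))
    return utility(mean)


esr_nonlinear = expected_scalarised_return(lottery, product_utility)
ser_nonlinear = scalarised_expected_return(lottery, product_utility)
esr_linear = expected_scalarised_return(lottery, linear_utility)
ser_linear = scalarised_expected_return(lottery, linear_utility)
print(esr_nonlinear, ser_nonlinear, esr_linear, ser_linear)
```

With the nonlinear utility the two criteria disagree on the same lottery, whereas with the linear utility they coincide, which is why the choice of criterion only matters when utility functions are nonlinear.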
Plato Dialogue System: A Flexible Conversational AI Research Platform
Papangelis, Alexandros, Namazifar, Mahdi, Khatri, Chandra, Wang, Yi-Chia, Molino, Piero, Tur, Gokhan
As the field of Spoken Dialogue Systems and Conversational AI grows, so does the need for tools and environments that abstract away implementation details in order to expedite the development process, lower the barrier of entry to the field, and offer a common test-bed for new ideas. In this paper, we present Plato, a flexible Conversational AI platform written in Python that supports any kind of conversational agent architecture, from standard architectures to architectures with jointly-trained components, single- or multi-party interactions, and offline or online training of any conversational agent component. Plato has been designed to be easy to understand and debug and is agnostic to the underlying learning frameworks that train each component.
Visual Simplified Characters' Emotion Emulator Implementing OCC Model
Laureano-Cruces, Ana Lilia, Hernández-Domínguez, Laura, Mora-Torres, Martha, Torres-Moreno, Juan-Manuel, Cabrera-López, Jaime Enrique
In this paper, we present a visual emulator of the emotions seen in characters in stories. This system is based on a simplified view of the cognitive structure of emotions proposed by Ortony, Clore and Collins (OCC Model). The goal of this paper is to provide a visual platform that allows us to observe changes in the characters' different emotions, and the intricate interrelationships between: 1) each character's emotions, 2) their affective relationships and actions, 3) the events that take place in the development of a plot, and 4) the objects of desire that make up the emotional map of any story. This tool was tested on stories with a contrasting variety of emotional and affective environments (Othello, Twilight, and Harry Potter), and it behaved sensibly and in keeping with the atmosphere in which the characters were immersed.
Knowledge Integration of Collaborative Product Design Using Cloud Computing Infrastructure
Bohlouli, Mahdi, Holland, Alexander, Fathi, Madjid
The pivotal key for the success of manufacturing enterprises is sustainable and innovative product design and development. In collaborative design, stakeholders are heterogeneously distributed chain-like. Due to the growing volume of data and knowledge, effective management of the knowledge acquired in product design and development is one of the key challenges facing most manufacturing enterprises. Opportunities for improving the efficiency and performance of IT-based product design applications through centralization of resources such as knowledge and computation have increased in the last few years with the maturation of technologies such as SOA, virtualization, grid computing, and/or cloud computing. The main focus of this paper is the concept of ongoing research in providing a knowledge integration service for collaborative product design and development using cloud computing infrastructure. The potential of cloud computing to support knowledge integration functionalities as a service, by providing functionalities such as knowledge mapping, merging, searching, and transferring in the product design procedure, is described in this paper. The proposed knowledge integration services support users by giving real-time access to knowledge resources. The framework has the advantages of availability, efficiency, cost reduction, less time to result, and scalability. Changes made during the early design stage do not cause a significant increase in costs, while during the production stage a sharp increase in costs will occur, since many blueprints, design documents or components would require re-work and re-design [5]. Today's research is focused on optimising the development methodologies to enable shorter time, lower costs and higher quality of the systems [2]. The pivotal key for the success of manufacturing enterprises is sustainable and innovative product design and development.
In order to achieve this goal, it is required to have a real and deep knowledge of former and current procedures in the manufacturing enterprise [4] and future needs, as well as customer feedback and the various stages of production chain activities. The realization of an efficient knowledge transfer between different stakeholders of the product development process, such as linking customers and suppliers proactively throughout the entire value chain and collaborating across boundaries in distributed enterprises, is reinforcing this trend.
Optimal by Design: Model-Driven Synthesis of Adaptation Strategies for Autonomous Systems
Elrakaiby, Yehia, Spoletini, Paola, Nuseibeh, Bashar
Many software systems have become too large and complex to be managed efficiently by human administrators, particularly when they operate in uncertain and dynamic environments and require frequent changes. Requirements-driven adaptation techniques have been proposed to endow systems with the necessary means to autonomously decide ways to satisfy their requirements. However, many current approaches rely on general-purpose languages, models and/or frameworks to design, develop and analyze autonomous systems. Unfortunately, these tools are not tailored towards the characteristics of adaptation problems in autonomous systems. Optimal by Design (ObD) proposes a model (and a language) for the high-level description of the basic elements of self-adaptive systems, namely the system, capabilities, requirements and environment. Based on those elements, a Markov Decision Process (MDP) is constructed to compute the optimal strategy, or the most rewarding system behavior. Furthermore, this strategy defines a reflex controller that can ensure timely responses to changes. One novel feature of the framework is that it benefits both from goal-oriented techniques, developed for requirement elicitation, refinement and analysis, and from the synthesis capabilities and extensive research around MDPs, their extensions and tools. Our preliminary evaluation results demonstrate the practicality and advantages of the framework. Autonomous systems such as unmanned vehicles and robots play an increasingly relevant role in our societies. Many factors contribute to the complexity in the design and development of those systems. First, they typically operate in dynamic and uncontrollable environments [1]-[5]. Therefore, they must continuously adapt their configuration in response to changes, both in their operating environment and in themselves. Since the frequency of change cannot be controlled, decision-making must be almost instantaneous to ensure timely responses.
From a design and management perspective, it is desirable to minimize the effort needed to design the system and to enable its runtime updating and maintenance. A promising technique to address those challenges is requirements-driven adaptation, which endows systems with the necessary means to autonomously operate based on their requirements. Requirements are prescriptive statements of intent to be satisfied by cooperation of the agents forming the system [6]. They say what the system will do and not how it will do it [7].
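The abstract above describes constructing an MDP and computing the most rewarding behavior from it. As a rough illustration of that step only, the sketch below runs standard value iteration on a toy MDP. The states, actions, probabilities, rewards, and discount factor are all hypothetical and are not taken from the paper, and the framework's own modelling language and tooling are not shown.

```python
# Hypothetical toy MDP (all names and numbers are illustrative assumptions).
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "nominal": {
        "continue": [(0.9, "nominal", 1.0), (0.1, "degraded", 0.0)],
        "reconfigure": [(1.0, "nominal", 0.5)],
    },
    "degraded": {
        "continue": [(1.0, "degraded", -1.0)],
        "reconfigure": [(0.8, "nominal", 0.0), (0.2, "degraded", -0.5)],
    },
}


def q_value(transitions, values, state, action, gamma):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + gamma * values[nxt])
        for prob, nxt, reward in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Standard value iteration; returns state values and a greedy policy."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            best = max(q_value(transitions, values, state, a, gamma) for a in actions)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda a: q_value(transitions, values, state, a, gamma))
        for state, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(transitions)
print(policy)
```

Once computed, the policy is a plain state-to-action lookup table, so choosing an action at runtime is a single dictionary access. That is one way to read the abstract's remark about a reflex controller giving timely responses, though the paper's actual controller may differ.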
Adversarially Guided Self-Play for Adopting Social Conventions
Tucker, Mycal, Zhou, Yilun, Shah, Julie
Robotic agents must adopt existing social conventions in order to be effective teammates. These social conventions, such as driving on the right or left side of the road, are arbitrary choices among optimal policies, but all agents on a successful team must use the same convention. Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them. We build upon this work by introducing a technique called Adversarial Self-Play (ASP) that uses adversarial training to shape the space of possible learned policies and substantially improves learning efficiency. ASP only requires the addition of unpaired data: a dataset of outputs produced by the social convention without associated inputs. Theoretical analysis reveals how ASP shapes the policy space and the circumstances (when behaviors are clustered or exhibit some other structure) under which it offers the greatest benefits. Empirical results across three domains confirm ASP's advantages: it produces models that more closely match the desired social convention when given as few as two paired datapoints.
Broadening Label-based Argumentation Semantics with May-Must Scales
The semantics as to which set of arguments in a given argumentation graph may be acceptable (acceptability semantics) can be characterised in a few different ways. Among them, the labelling-based approach allows for concise and flexible determination of the acceptability statuses of arguments through the assignment of a label indicating acceptance, rejection, or undecided to each argument. In this work, we contemplate a way of broadening it by accommodating may- and must-conditions for an argument to be accepted or rejected, as determined by the number(s) of rejected and accepted attacking arguments. We show that the broadened label-based semantics can be used to express a milder indeterminacy than inconsistency for acceptability judgement when, for example, it may be the case that an argument is accepted and it may also be the case that it is rejected. We identify that finding which conditions a labelling satisfies for every argument can be an undecidable problem, which has an unfavourable implication for the semantics. We propose to address this problem by enforcing a labelling to maximally respect the conditions, while keeping the rest that would necessarily cause non-termination labelled undecided.
The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits
Chawla, Ronshee, Sankararaman, Abishek, Ganesh, Ayalvadi, Shakkottai, Sanjay
We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup consisting of $N$ agents, solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip-style communications. We develop two novel algorithms, where each agent only plays from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $\Omega(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaborations. Furthermore, we show that the communication constraints only have a second-order effect on the regret of our algorithm. We then analyze this second-order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound which gives that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.
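The communication pattern described in this abstract (agents play from a subset of arms and exchange only arm-IDs) can be sketched in a few lines. The following is a heavily simplified illustration, not the paper's algorithm: the initial partition of arms, the UCB1 index, the fixed gossip period, the rule of recommending the most-played arm, and the rule of dropping the least-played arm when over capacity are all assumptions made here for the sake of a runnable example.

```python
import math
import random


def ucb_index(total, pulls, t):
    """UCB1 index; arms that were never played get priority."""
    if pulls == 0:
        return float("inf")
    return total / pulls + math.sqrt(2.0 * math.log(t) / pulls)


def run_gossip_bandit(arm_means, num_agents, horizon, gossip_every, capacity, seed=0):
    """Simplified sketch; assumes num_agents >= 2 and Bernoulli rewards."""
    rng = random.Random(seed)
    num_arms = len(arm_means)
    # Assumption: agents start from a disjoint partition of the arms.
    active = [set(range(i, num_arms, num_agents)) for i in range(num_agents)]
    pulls = [[0] * num_arms for _ in range(num_agents)]
    totals = [[0.0] * num_arms for _ in range(num_agents)]

    for t in range(1, horizon + 1):
        for i in range(num_agents):
            # Each agent plays only from its own active set.
            arm = max(sorted(active[i]),
                      key=lambda a: ucb_index(totals[i][a], pulls[i][a], t))
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            pulls[i][arm] += 1
            totals[i][arm] += reward

        if t % gossip_every == 0:
            # Only arm-IDs are exchanged, never reward samples.
            recommended = [max(sorted(active[i]), key=lambda a: pulls[i][a])
                           for i in range(num_agents)]
            for i in range(num_agents):
                peer = rng.choice([j for j in range(num_agents) if j != i])
                new_arm = recommended[peer]
                active[i].add(new_arm)  # insert the recommended arm
                if len(active[i]) > capacity:
                    # Assumed elimination rule: drop the least-played other arm.
                    candidates = sorted(active[i] - {new_arm})
                    active[i].discard(min(candidates, key=lambda a: pulls[i][a]))
    return active, pulls


arm_means = [0.2, 0.3, 0.4, 0.5, 0.6, 0.9]  # hypothetical Bernoulli means
active, pulls = run_gossip_bandit(arm_means, num_agents=3, horizon=2000,
                                  gossip_every=100, capacity=3)
print(active)
```

Each agent here tracks statistics for at most `capacity` arms at a time, and a message is a single integer. The sketch makes no claim about regret: the guarantees stated in the abstract depend on the paper's actual insertion and elimination rules and communication schedule.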