Multi-Agent Task Allocation in Complementary Teams: A Hunter and Gatherer Approach
Dadvar, Mehdi, Moazami, Saeed, Myler, Harley R., Zargarzadeh, Hassan
Consider a dynamic task allocation problem in which the distribution of tasks over the environment is unknown to the agents. This paper treats each task as comprising two sequential subtasks, detection and completion, where each subtask can only be carried out by a certain type of agent. We address this problem using a novel nature-inspired approach called "hunter and gatherer". The proposed method employs two complementary teams of agents: one agile in detecting (hunters) and another dexterous in completing (gatherers) the tasks. To minimize the collective cost of task accomplishments in a distributed manner, a game-theoretic solution is introduced to couple agents from complementary teams. We utilize market-based negotiation models to develop incentive-based decision-making algorithms relying on innovative notions of "certainty and uncertainty profit margins". The simulation results demonstrate that employing two complementary teams of hunters and gatherers can effectively improve the number of tasks completed by agents compared to conventional methods, while the collective cost of accomplishments is minimized. In addition, the stability and efficacy of the proposed solutions are studied using Nash equilibrium analysis and statistical analysis, respectively. It is also numerically shown that the proposed solutions function fairly, i.e., for each type of agent, the overall workload is distributed equally.
Index Terms -- Distributed multiagent system, dynamic task allocation, game theory, negotiation.
Multirobot systems are expected to undertake imperative roles in a wide variety of fields such as urban search and rescue (USAR) [1, 2], agricultural field operations [3], security patrols [4, 5], environmental monitoring [6], and industrial procedures [7].
Studies have shown that multi-robot systems have an advantage over single-robot systems by offering more reliability, redundancy, and time efficiency when the nature of the tasks is inherently distributed [8]. Nonetheless, the problem of multi-robot task allocation (MRTA) poses many critical challenges that have called for investigation in the past two decades [9-11]. In this regard, the complexity of MRTA problems increases significantly in a dynamic environment, where the number and location of tasks are unknown to the agents [12, 13]. Thus, robots need to explore the environment to find tasks before accomplishing them.
deepsing: Generating Sentiment-aware Visual Stories using Cross-modal Music Translation
Passalis, Nikolaos, Doropoulos, Stavros
In this paper we propose a deep learning method for performing attribute-based music-to-image translation. The proposed method is applied for synthesizing visual stories according to the sentiment expressed by songs. The generated images aim to induce in viewers the same feelings as the original song does, reinforcing the primary aim of music, i.e., communicating feelings. The process of music-to-image translation poses unique challenges, mainly due to the unstable mapping between the different modalities involved in this process. In this paper, we employ a trainable cross-modal translation method to overcome this limitation, leading to what is, to the best of our knowledge, the first deep learning method for generating sentiment-aware visual stories. Various aspects of the proposed method are extensively evaluated and discussed using different songs.
A Billion Ways to Grasp: An Evaluation of Grasp Sampling Schemes on a Dense, Physics-based Grasp Data Set
Eppner, Clemens, Mousavian, Arsalan, Fox, Dieter
Robot grasping is often formulated as a learning problem. With the increasing speed and quality of physics simulations, generating large-scale grasping data sets that feed learning algorithms is becoming more and more popular. An often overlooked question is how to generate the grasps that make up these data sets. In this paper, we review, classify, and compare different grasp sampling strategies. Our evaluation is based on a fine-grained discretization of SE(3) and uses physics-based simulation to evaluate the quality and robustness of the corresponding parallel-jaw grasps. Specifically, we consider more than 1 billion grasps for each of the 21 objects from the YCB data set. This dense data set lets us evaluate existing sampling schemes w.r.t. their bias and efficiency. Our experiments show that some popular sampling schemes contain significant bias and do not cover all possible ways an object can be grasped.
Learning to Request Guidance in Emergent Communication
Kolb, Benjamin, Lang, Leon, Bartsch, Henning, Gansekoele, Arwin, Koopmanschap, Raymond, Romor, Leonardo, Speck, David, Mul, Mathijs, Bruni, Elia
Previous research into agent communication has shown that a pre-trained guide can speed up the learning process of an imitation learning agent. The guide achieves this by providing the agent with discrete messages in an emerged language about how to solve the task. We extend this one-directional communication with a one-bit communication channel from the learner back to the guide: the learner is able to ask the guide for help, and we limit the guidance by penalizing the learner for these requests. During training, the agent learns to control this gate based on its current observation. We find that the amount of requested guidance decreases over time and that guidance is requested in situations of high uncertainty. We investigate the agent's performance in cases of open and closed gates and discuss potential motives for the observed gating behavior.
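The penalized request gate described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the fixed per-request penalty, and the use of a null message when the gate is closed are hypothetical and are not specified in the abstract.

```python
def observed_message(guide_message, gate_open, null_message=0):
    """Return what the learner receives from the guide.

    Assumption: when the one-bit gate is closed, the learner sees a fixed
    null message instead of the guide's discrete message.
    """
    return guide_message if gate_open else null_message


def penalized_reward(task_reward, gate_open, request_penalty=0.25):
    """Subtract a fixed cost from the task reward whenever guidance is requested.

    Assumption: the penalty is a constant per request; the abstract only
    says that requests are penalized, not how.
    """
    return task_reward - (request_penalty if gate_open else 0.0)


# Usage: one step with the gate open and one with it closed.
open_step = (observed_message(7, True), penalized_reward(1.0, True))
closed_step = (observed_message(7, False), penalized_reward(1.0, False))
```

With a cost attached to every request, a learner that maximizes this reward has an incentive to open the gate only when the guide's message is worth more than the penalty, which is one way to read the reported drop in requests over time.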
SMiRL: Surprise Minimizing RL in Dynamic Environments
Berseth, Glen, Geng, Daniel, Devin, Coline, Finn, Chelsea, Jayaraman, Dinesh, Levine, Sergey
All living organisms struggle against the forces of nature to carve out niches where they can maintain homeostasis. We propose that such a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents. We formalize this idea into an unsupervised reinforcement learning method called surprise minimizing RL (SMiRL). SMiRL trains an agent with the objective of maximizing the probability of observed states under a model trained on previously seen states. The resulting agents can acquire proactive behaviors that seek out and maintain stable conditions, such as balancing and damage avoidance, that are closely tied to an environment's prevailing sources of entropy, such as wind, earthquakes, and other agents. We demonstrate that our surprise minimizing agents can successfully play Tetris and Doom, control a humanoid to avoid falls, and navigate to escape enemy agents, without any task-specific reward supervision. We further show that SMiRL can be used together with a standard task reward to accelerate reward-driven learning.
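The objective stated above, maximizing the probability of observed states under a model trained on previously seen states, can be sketched as a per-step reward. The sketch assumes a diagonal Gaussian as the state model and a variance floor for numerical stability; the class and function names are hypothetical, and the abstract does not say which density model the authors use.

```python
import math


class RunningGaussian:
    """Diagonal Gaussian fitted online to every state seen so far (assumed model)."""

    def __init__(self, dim, min_var=1e-2):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim      # running sum of squared deviations (Welford)
        self.min_var = min_var     # variance floor, an assumption for stability

    def update(self, state):
        self.n += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def log_prob(self, state):
        total = 0.0
        for i, x in enumerate(state):
            # Before any data is seen, fall back to unit variance.
            var = max(self.m2[i] / self.n, self.min_var) if self.n > 0 else 1.0
            total += -0.5 * (math.log(2 * math.pi * var) + (x - self.mean[i]) ** 2 / var)
        return total


def surprise_minimizing_reward(model, state):
    """Reward is the log-probability of the new state under the model of past
    states; the model is then updated with that state."""
    reward = model.log_prob(state)
    model.update(state)
    return reward


# Usage: after many visits to the same state, that state is rewarded more
# than an unfamiliar one.
model = RunningGaussian(dim=2)
for _ in range(10):
    surprise_minimizing_reward(model, [0.0, 0.0])
familiar = model.log_prob([0.0, 0.0])
novel = model.log_prob([5.0, 5.0])
```

Because the reward is high only for states the model already considers likely, an agent trained on it is pushed toward keeping its observations predictable, which matches the stabilizing behaviors the abstract describes.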
What Can Learned Intrinsic Rewards Capture?
Zheng, Zeyu, Oh, Junhyuk, Hessel, Matteo, Xu, Zhongwen, Kroiss, Manuel, van Hasselt, Hado, Silver, David, Singh, Satinder
Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be the loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. The objective of an agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we instead consider the proposition that the reward function itself may be a good locus of knowledge. This is consistent with a common use, in the literature, of hand-designed intrinsic rewards to improve the learning dynamics of an agent. We adopt the multi-lifetime setting of the Optimal Rewards Framework, and propose to meta-learn an intrinsic reward function from experience that allows agents to maximise their extrinsic rewards accumulated until the end of their lifetimes. Rewards as a locus of knowledge provide guidance on "what" the agent should strive to do rather than "how" the agent should behave; the latter is more directly captured in policies or value functions for example. Thus, our focus here is on demonstrating the following: (1) that it is feasible to meta-learn good reward functions, (2) that the learned reward functions can capture interesting kinds of "what" knowledge, and (3) that because of the indirectness of this form of knowledge the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment. Reinforcement learning agents can store knowledge in their policies, value functions, state representations, and models of the environment dynamics. These components can be the loci of knowledge in the sense that they are structures in which knowledge, either learned from experience by the agent's algorithm or given by the agent-designer, can be deposited and reused.
Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS
Asai, Masataro (MIT-IBM Watson AI Lab, Cambridge, USA; IBM Research)
Not submitted to the 30th International Conference on Automated Planning and Scheduling. Recent work on Neural-Symbolic systems that learn the discrete planning model from images has opened a promising direction for expanding the scope of Automated Planning and Scheduling to raw, noisy data. However, previous work only partially addressed this problem, utilizing the black-box neural model as the successor generator. In this work, we propose Double-Stage Action Model Acquisition (DSAMA), a system that obtains a descriptive PDDL action model with explicit preconditions and effects over the propositional variables unsupervised-learned from images. DSAMA trains a set of Random Forest rule-based classifiers and compiles them into logical formulae in PDDL. While we obtained a competitively accurate PDDL model compared to a black-box model, we observed that the resulting PDDL is too large and complex for state-of-the-art standard planners such as Fast Downward, primarily due to the PDDL-SAS translator bottleneck. From this negative result, we show that this translator bottleneck cannot be addressed just by using a different, existing rule-based learning method, and we point to potential future directions.
1 Introduction
Recently, the Latplan system (Asai and Fukunaga 2018) successfully connected a subsymbolic neural network (NN) system and a symbolic Classical Planning system to solve various visually presented puzzle domains. The system consists of four parts: 1) The State AutoEncoder (SAE) neural network learns a bidirectional mapping between images and propositional states with unsupervised training.
The proposed framework opened a promising direction for applying a variety of symbolic methods to the real world. For example, the search space generated by Latplan was shown to be compatible with a symbolic Goal Recognition system (Amado et al. 2018a; 2018b). Several variations replacing the state encoding modules have also been proposed: Causal InfoGAN (Kurutach et al. 2018) uses a GAN-based framework, First-Order SAE (Asai 2019) learns the First Order Logic symbols (instead of the propositional ones), and Zero-Suppressed SAE (Asai ...). Despite these efforts, Latplan is missing a critical feature of the traditional Classical Planning systems: the use of State-of-the-Art heuristic functions.
    (:action a0
     :parameters ()
     :precondition [D0]
     :effect (and (when [E00] (z0)) (when (not [E00]) (not (z0)))
                  (when [E01] (z1)) (when (not [E01]) (not (z1)))
                  ...))
Figure 1: An example DSAMA compilation result for the first action.
BERT has a Moral Compass: Improvements of ethical and moral values of machines
Schramowski, Patrick, Turan, Cigdem, Jentzsch, Sophie, Rothkopf, Constantin, Kersting, Kristian
Allowing machines to choose whether to kill humans would be devastating for world peace and security. But how do we equip machines with the ability to learn ethical or even moral choices? Jentzsch et al. (2019) showed that applying machine learning to human texts can extract deontological ethical reasoning about "right" and "wrong" conduct by calculating a moral bias score on a sentence level using sentence embeddings. The machine learned that it is objectionable to kill living beings, but it is fine to kill time; it is essential to eat, yet one might not eat dirt; it is important to spread information, yet one should not spread misinformation. However, the evaluated moral bias was restricted to simple actions (one verb) and a ranking of actions with surrounding context. Recently, BERT (and variants such as RoBERTa and SBERT) has set a new state-of-the-art performance for a wide range of NLP tasks. But does BERT also have a better moral compass? In this paper, we discuss and show that this is indeed the case. Thus, recent improvements of language representations also improve the representation of the underlying ethical and moral values of the machine. We argue that through an advanced semantic representation of text, BERT allows one to get better insights into the moral and ethical values implicitly represented in text. This enables the Moral Choice Machine (MCM) to extract more accurate imprints of moral choices and ethical values.
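A sentence-level moral bias score of the kind mentioned above can be illustrated with a minimal sketch. It assumes the score is the difference between the cosine similarity of a question embedding to an affirmative answer and to a negative answer; that formula, the function names, and the two-dimensional toy vectors are illustrative assumptions rather than details from the abstract. A real system would obtain the vectors from a sentence encoder such as SBERT.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def moral_bias(question_vec, yes_vec, no_vec):
    """Positive when the question sits closer to the affirmative answer,
    negative when it sits closer to the negative answer (assumed scoring rule)."""
    return cosine(question_vec, yes_vec) - cosine(question_vec, no_vec)


# Toy, hand-made embeddings (hypothetical values, not model outputs).
yes_vec = [1.0, 0.0]          # stands in for "Yes, it is."
no_vec = [0.0, 1.0]           # stands in for "No, it is not."
kill_time_vec = [0.9, 0.1]    # stands in for "Is it okay to kill time?"
kill_people_vec = [0.1, 0.9]  # stands in for "Is it okay to kill people?"

bias_kill_time = moral_bias(kill_time_vec, yes_vec, no_vec)
bias_kill_people = moral_bias(kill_people_vec, yes_vec, no_vec)
```

Under this assumed rule, the sign of the score separates actions the embedding space treats as acceptable from those it treats as objectionable, and the magnitude gives a ranking.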
Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning
Wang, Tianying, Zhang, Hao, Toh, Wei Qi, Zhu, Hongyuan, Tan, Cheston, Wu, Yan, Liu, Yong, Jing, Wei
Learning-based methods have been used to program robotic tasks in recent years. However, extensive training is usually required not only for the initial task learning but also for generalizing the learned model to the same task in different environments. In this paper, we propose a novel Deep Reinforcement Learning algorithm for efficient task generalization and environment adaptation in the robotic task learning problem. The proposed method is able to efficiently generalize the previously learned task by model fusion to solve the environment adaptation problem. The proposed Deep Model Fusion (DMF) method reuses and combines the previously trained model to improve the learning efficiency and results. In addition, we introduce a Multi-objective Guided Reward (MGR) shaping technique to further improve training efficiency. The proposed method was benchmarked against previous methods in various environments to validate its effectiveness.
Hacked flight records show how police are using drones to conduct residential surveillance
Flight records and related materials from police drone programs have been uncovered following a security breach at DroneSense, which provides services to a number of private corporations and government agencies. The records included flight paths, pilot names and email addresses, and operation names from more than 200 different drone flights, offering insight into how police use drones in day-to-day law enforcement. The records come from drone operations at the Atlanta Police Department, Nassau County Police Department, and others. The files also included information from other DroneSense clients, including Boise Fire Department, City of Coral Springs, and the US Army Corps of Engineers. According to a report in Vice, the records show a number of different police drone operations, including the Atlanta police using a drone to surveil an apartment complex and nearby parking lot.