Country
Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward
Sheikh, Hassam Ullah, Bölöni, Ladislau
Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement algorithms that are designed to either maximize the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to maximize the global and local rewards. We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves a significantly better and more stable performance than the direct adaptation of the MADDPG algorithm.
Modeling Contrary-to-Duty with CP-nets
Calegari, Roberta, Loreggia, Andrea, Lorini, Emiliano, Rossi, Francesca, Sartor, Giovanni
Modelling deontic notions through preferences [12] has the advantage of linking deontic notions to the manifold research on preferences, in multiple disciplines, such as philosophy, mathematics, economics and politics. In recent years, preferences have also been addressed within AI [15,8,18] and applications can be found in multi-agent systems [19] and recommender systems [17]. We shall model deontic notions through ceteris-paribus preferences, namely, conditional preferences for a state of affairs over another state of affairs, all the rest being equal. In particular, we shall focus on the ceteris-paribus preference for a proposition over its complement. The idea of ceteris-paribus preferences was originally introduced by the philosopher and logician Georg von Wright [22].
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
Long, Qian, Zhou, Zihan, Gupta, Abhibav, Fang, Fei, Wu, Yi, Wang, Xiaolong
In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially. The project page is https://sites.google.com/view/epciclr2020.
What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization
Belth, Caleb, Zheng, Xinyi, Vreeken, Jilles, Koutra, Danai
Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type, and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle---a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification, (identifying the location of up to 93% of missing entities---over 10% more than baselines), while also being efficient for large knowledge graphs.
Using Deep Reinforcement Learning Methods for Autonomous Vessels in 2D Environments
Etemad, Mohammad, Zare, Nader, Sarvmaili, Mahtab, Soares, Amilcar, Machado, Bruno Brandoli, Matwin, Stan
Unmanned Surface Vehicles technology (USVs) is an exciting topic that essentially deploys an algorithm to safely and efficiently performs a mission. Although reinforcement learning is a well-known approach to modeling such a task, instability and divergence may occur when combining off-policy and function approximation. In this work, we used deep reinforcement learning combining Q-learning with a neural representation to avoid instability. Our methodology uses deep q-learning and combines it with a rolling wave planning approach on agile methodology. Our method contains two critical parts in order to perform missions in an unknown environment. The first is a path planner that is responsible for generating a potential effective path to a destination without considering the details of the root. The latter is a decision-making module that is responsible for short-term decisions on avoiding obstacles during the near future steps of USV exploitation within the context of the value function. Simulations were performed using two algorithms: a basic vanilla vessel navigator (VVN) as a baseline and an improved one for the vessel navigator with a planner and local view (VNPLV). Experimental results show that the proposed method enhanced the performance of VVN by 55.31 on average for long-distance missions. Our model successfully demonstrated obstacle avoidance by means of deep reinforcement learning using planning adaptive paths in unknown environments.
Information-Theoretic Free Energy as Emotion Potential: Emotional Valence as a Function of Complexity and Novelty
This study extends the mathematical model of emotion dimensions that we previously proposed (Yanagisawa, et al. 2019, Front Comput Neurosci) to consider perceived complexity as well as novelty, as a source of arousal potential. Berlyne's hedonic function of arousal potential (or the inverse U-shaped curve, the so-called Wundt curve) is assumed. We modeled the arousal potential as information contents to be processed in the brain after sensory stimuli are perceived (or recognized), which we termed sensory surprisal. We mathematically demonstrated that sensory surprisal represents free energy, and it is equivalent to a summation of information gain (or information from novelty) and perceived complexity (or information from complexity), which are the collative variables forming the arousal potential. We demonstrated empirical evidence with visual stimuli (profile shapes of butterfly) supporting the hypothesis that the summation of perceived novelty and complexity shapes the inverse U-shaped beauty function. We discussed the potential of free energy as a mathematical principle explaining emotion initiators.
Generating new concepts with hybrid neuro-symbolic models
Feinman, Reuben, Lake, Brenden M.
Human conceptual knowledge supports the ability to generate novel yet highly structured concepts, and the form of this conceptual knowledge is of great interest to cognitive scientists. One tradition has emphasized structured knowledge, viewing concepts as embedded in intuitive theories or organized in complex symbolic knowledge structures. A second tradition has emphasized statistical knowledge, viewing conceptual knowledge as an emerging from the rich correlational structure captured by training neural networks and other statistical models. In this paper, we explore a synthesis of these two traditions through a novel neuro-symbolic model for generating new concepts. Using simple visual concepts as a testbed, we bring together neural networks and symbolic probabilistic programs to learn a generative model of novel handwritten characters. Two alternative models are explored with more generic neural network architectures. We compare each of these three models for their likelihoods on held-out character classes and for the quality of their productions, finding that our hybrid model learns the most convincing representation and generalizes further from the training observations.
Building a COVID-19 Vulnerability Index
DeCaprio, Dave, Gartner, Joseph, Burgess, Thadeus, Kothari, Sarthak, Sayed, Shaayan, McCall, Carol J.
COVID-19 is an acute respiratory disease that has been classified as a pandemic by the World Health Organization. Information regarding this particular disease is limited, however, it is known to have high mortality rates, particularly among individuals with preexisting medical conditions. Creating models to identify individuals who are at the greatest risk for severe complications due to COVID-19 will be useful to help for outreach campaigns in mitigating the diseases worst effects. While information specific to COVID-19 is limited, a model using complications due to other upper respiratory infections can be used as a proxy to help identify those individuals who are at the greatest risk. We present the results for three models predicting such complications, with each model having varying levels of predictive effectiveness at the expense of ease of implementation.
Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling
Che, Tong, Zhang, Ruixiang, Sohl-Dickstein, Jascha, Larochelle, Hugo, Paull, Liam, Cao, Yuan, Bengio, Yoshua
We show that the sum of the implicit generator log-density $\log p_g$ of a GAN with the logit score of the discriminator defines an energy function which yields the true data density when the generator is imperfect but the discriminator is optimal, thus making it possible to improve on the typical generator (with implicit density $p_g$). To make that practical, we show that sampling from this modified density can be achieved by sampling in latent space according to an energy-based model induced by the sum of the latent prior log-density and the discriminator output score. This can be achieved by running a Langevin MCMC in latent space and then applying the generator function, which we call Discriminator Driven Latent Sampling~(DDLS). We show that DDLS is highly efficient compared to previous methods which work in the high-dimensional pixel space and can be applied to improve on previously trained GANs of many types. We evaluate DDLS on both synthetic and real-world datasets qualitatively and quantitatively. On CIFAR-10, DDLS substantially improves the Inception Score of an off-the-shelf pre-trained SN-GAN~\citep{sngan} from $8.22$ to $9.09$ which is even comparable to the class-conditional BigGAN~\citep{biggan} model. This achieves a new state-of-the-art in unconditional image synthesis setting without introducing extra parameters or additional training.
Convex Hull Monte-Carlo Tree Search
Painter, Michael, Lacerda, Bruno, Hawes, Nick
This work investigates Monte-Carlo planning for agents in stochastic environments, with multiple objectives. We propose the Convex Hull Monte-Carlo Tree-Search (CHMCTS) framework, which builds upon Trial Based Heuristic Tree Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective planning in large environments. Moreover, we consider how to pose the problem of approximating multiobjective planning solutions as a contextual multi-armed bandits problem, giving a principled motivation for how to select actions from the view of contextual regret. This leads us to the use of Contextual Zooming for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using the Generalised Deep Sea Treasure environment, demonstrating that Zooming CHMCTS can achieve a sublinear contextual regret and scales better than CHVI on a given computational budget.