Nardelli, Nantas
Can Reinforcement Learning support policy makers? A preliminary study with Integrated Assessment Models
Wolf, Theodore, Nardelli, Nantas, Shawe-Taylor, John, Perez-Ortiz, Maria
Governments around the world aspire to ground decision-making in evidence. Many of the foundations of policy making - e.g. sensing patterns that relate to societal needs, developing evidence-based programs, forecasting potential outcomes of policy changes, and monitoring the effectiveness of policy programs - have the potential to benefit from the use of large-scale datasets or simulations together with intelligent algorithms. If designed and deployed in a way that is well grounded in scientific evidence, these could enable a more comprehensive, faster, and more rigorous approach to policy making. Integrated Assessment Models (IAMs) are a broad umbrella of scientific models that attempt to link the main features of society and the economy with the biosphere in one modelling framework. At present, these systems are probed by policy makers and advisory groups in a hypothesis-driven manner. In this paper, we empirically demonstrate that modern Reinforcement Learning can be used to probe IAMs and explore the space of solutions in a more principled manner. While the implications of our results are modest, since the environment is simplistic, we believe this is a stepping stone towards more ambitious use cases, which could allow for effective exploration of policies and understanding of their consequences and limitations.
WordCraft: An Environment for Benchmarking Commonsense Agents
Jiang, Minqi, Luketina, Jelena, Nardelli, Nantas, Minervini, Pasquale, Torr, Philip H. S., Whiteson, Shimon, Rocktäschel, Tim
The ability to quickly solve a wide range of real-world tasks requires a commonsense understanding of the world. Yet, how to best extract such knowledge from natural language corpora and integrate it with reinforcement learning (RL) agents remains an open challenge. This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment. To better enable research on agents making use of commonsense knowledge, we propose WordCraft, an RL environment based on Little Alchemy 2. This lightweight environment is fast to run and built upon entities and relations inspired by real-world semantics. We evaluate several representation learning methods on this new benchmark and propose a new method for integrating knowledge graphs with an RL agent.
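A minimal interaction loop is sketched below under the assumption that the released package registers its tasks with OpenAI Gym; the environment id "wordcraft-multistep-goal-v0" and the use of a random action are illustrative guesses rather than the package's documented API.

```python
# Hypothetical Gym-style loop for a WordCraft task (names are assumptions).
import gym
import wordcraft  # assumed to register WordCraft tasks with Gym on import

env = gym.make("wordcraft-multistep-goal-v0")  # assumed environment id
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # randomly pick entities to combine
    obs, reward, done, info = env.step(action)
env.close()
```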
The NetHack Learning Environment
Küttler, Heinrich, Nardelli, Nantas, Miller, Alexander H., Raileanu, Roberta, Selvatici, Marco, Grefenstette, Edward, Rocktäschel, Tim
Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. We compare NLE and its task suite to existing alternatives, and discuss why it is an ideal medium for testing the robustness and systematic generalization of RL agents. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. NLE is open source at https://github.com/facebookresearch/nle.
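A minimal sketch of getting started with NLE through its Gym entry point, assuming the nle package from the repository above is installed and a Gym version with the classic four-tuple step API; the "NetHackScore-v0" task is one of the score-based tasks described in the paper.

```python
# Random-agent loop over one NetHack episode via the NLE Gym interface.
import gym
import nle  # registers the NetHack tasks with Gym

env = gym.make("NetHackScore-v0")
obs = env.reset()  # observation is a dict of arrays (glyphs, stats, message, ...)
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```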
Simulation-Based Inference for Global Health Decisions
de Witt, Christian Schroeder, Gram-Hansen, Bradley, Nardelli, Nantas, Gambardella, Andrew, Zinkov, Rob, Dokania, Puneet, Siddharth, N., Espinosa-Gonzalez, Ana Belen, Darzi, Ara, Torr, Philip, Baydin, Atılım Güneş
The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim and OpenMalaria, into probabilistic programs, enabling efficient, interpretable Bayesian inference within those simulators.
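To illustrate the kind of likelihood-free calibration the paper discusses, the sketch below runs rejection ABC against a toy SIR simulator. The simulator, prior ranges, and tolerance are illustrative stand-ins, not the interfaces the authors developed for COVID-sim or OpenMalaria.

```python
# Toy simulation-based inference: rejection ABC on a small SIR model.
import numpy as np

def simulate_epidemic(beta, gamma, days=60, n=1_000_000, i0=10):
    """Deterministic SIR model returning daily new infections."""
    s, i, r = n - i0, i0, 0
    new_cases = []
    for _ in range(days):
        infections = beta * s * i / n
        recoveries = gamma * i
        s, i, r = s - infections, i + infections - recoveries, r + recoveries
        new_cases.append(infections)
    return np.array(new_cases)

def abc_rejection(observed, n_samples=2_000, tol=0.1, seed=0):
    """Keep prior draws whose simulated output is close to the observed series."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_samples):
        beta, gamma = rng.uniform(0.1, 0.5), rng.uniform(0.05, 0.2)
        simulated = simulate_epidemic(beta, gamma)
        distance = np.linalg.norm(simulated - observed) / np.linalg.norm(observed)
        if distance < tol:
            accepted.append((beta, gamma))
    return np.array(accepted)  # samples from an approximate posterior

observed = simulate_epidemic(0.3, 0.1)  # stand-in for real surveillance data
posterior_samples = abc_rejection(observed)
```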
MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions
Sivakumar, Viswanath, Rocktäschel, Tim, Miller, Alexander H., Küttler, Heinrich, Nardelli, Nantas, Rabbat, Mike, Pineau, Joelle, Riedel, Sebastian
Effective network congestion control strategies are key to keeping the Internet (or any large computer network) operational. Network congestion control has been dominated by hand-crafted heuristics for decades. Recently, Reinforcement Learning (RL) has emerged as an alternative to automatically optimize such control strategies. Research so far has primarily considered RL interfaces which block the sender while an agent considers its next action. This is largely an artifact of building on top of frameworks designed for RL in games (e.g. OpenAI Gym). However, this does not translate to real-world networking environments, where a network sender waiting on a policy without sending data is costly for throughput. We instead propose to formulate congestion control with an asynchronous RL agent that handles delayed actions. We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages the state of the art in asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics, and evaluate our method on emulated networks from the Pantheon benchmark platform. The source code is publicly available at https://github.com/facebookresearch/mvfst-rl.
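The non-blocking interaction pattern described above can be sketched as follows: the sender keeps transmitting with its most recent congestion-control action while the policy computes the next one in the background. This is a toy illustration of the idea only, not the MVFST-RL implementation or its API.

```python
# Toy asynchronous sender/policy loop: the sender never blocks on the policy.
import queue
import threading
import time

action_queue = queue.Queue()  # actions produced by the policy

def policy_worker(state_queue):
    """Consumes network statistics and (slowly) produces cwnd adjustments."""
    while True:
        state = state_queue.get()
        if state is None:
            return
        time.sleep(0.05)  # stands in for policy inference latency
        action_queue.put(1 if state["rtt_ms"] < 100 else -1)

def sender_loop(steps=100):
    state_queue = queue.Queue()
    threading.Thread(target=policy_worker, args=(state_queue,), daemon=True).start()
    cwnd, last_action = 10, 0
    for step in range(steps):
        # Apply the freshest action if one is ready; otherwise keep sending
        # with the previous one instead of waiting for the policy.
        try:
            last_action = action_queue.get_nowait()
        except queue.Empty:
            pass
        cwnd = max(1, cwnd + last_action)
        state_queue.put({"rtt_ms": 80 + step % 50})
        time.sleep(0.01)  # stands in for sending a packet burst
    state_queue.put(None)

sender_loop()
```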
TorchBeast: A PyTorch Platform for Distributed RL
Küttler, Heinrich, Nardelli, Nantas, Lavril, Thibaut, Selvatici, Marco, Sivakumar, Viswanath, Rocktäschel, Tim, Grefenstette, Edward
TorchBeast is a platform for reinforcement learning (RL) research in PyTorch. It implements a version of the popular IMPALA algorithm for fast, asynchronous, parallel training of RL agents. Additionally, TorchBeast has simplicity as an explicit design goal: we provide both a pure-Python implementation ("MonoBeast") and a multi-machine high-performance version ("PolyBeast"). In the latter, parts of the implementation are written in C++, but all parts pertaining to machine learning are kept in simple Python using PyTorch, with the environments provided using the OpenAI Gym interface. This enables researchers to conduct scalable RL research using TorchBeast without any programming knowledge beyond Python and PyTorch. In this paper, we describe the TorchBeast design principles and implementation and demonstrate that it performs on par with IMPALA on Atari. TorchBeast is released as an open-source package under the Apache 2.0 license and is available at https://github.com/facebookresearch/torchbeast.
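The actor/learner decomposition that IMPALA-style systems such as TorchBeast implement can be sketched in a few lines: many actors generate rollouts in parallel and a single learner consumes them in batches. The snippet below is a deliberate simplification for illustration, not TorchBeast's actual implementation, which adds V-trace off-policy correction and, in PolyBeast, a C++ batching server.

```python
# Toy actor/learner split: parallel actors feed rollouts to one learner.
import multiprocessing as mp

def actor(actor_id, rollout_queue, steps=5):
    for t in range(steps):
        rollout = {"actor": actor_id, "step": t, "obs": [0.0] * 4, "reward": 1.0}
        rollout_queue.put(rollout)

def learner(rollout_queue, total):
    for _ in range(total):
        batch = rollout_queue.get()
        # A real learner would compute V-trace targets and take a gradient step.
        print("learning from actor", batch["actor"], "step", batch["step"])

if __name__ == "__main__":
    q = mp.Queue()
    actors = [mp.Process(target=actor, args=(i, q)) for i in range(4)]
    for p in actors:
        p.start()
    learner(q, total=4 * 5)
    for p in actors:
        p.join()
```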
A Survey of Reinforcement Learning Informed by Natural Language
Luketina, Jelena, Nardelli, Nantas, Farquhar, Gregory, Foerster, Jakob, Andreas, Jacob, Grefenstette, Edward, Whiteson, Shimon, Rocktäschel, Tim
To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including work on instruction following, text games, and learning from textual domain knowledge. Finally, we call for the development of new environments as well as further investigation into the potential uses of recent Natural Language Processing (NLP) techniques for such tasks.
Multitask Soft Option Learning
Igl, Maximilian, Gambardella, Andrew, Nardelli, Nantas, Siddharth, N., Böhmer, Wendelin, Whiteson, Shimon
We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This allows fine-tuning of options for new tasks without forgetting their learned policies, leading to faster training without reducing the expressiveness of the hierarchical policy. Additionally, MSOL avoids several instabilities during training in a multitask setting and provides a natural way to not only learn intra-option policies, but also their terminations. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines in challenging multi-task environments.
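Schematically, the per-task objective implied by the description above pairs an expected-return term with a KL term that pulls each task-specific posterior over options and actions toward a shared prior. The exact factorisation over master policies, intra-option policies, and terminations in MSOL differs; the equation below is an illustrative sketch only.

```latex
% Schematic soft-option objective for task i: posterior q_i, shared prior p,
% option variable z, temperature beta (notation illustrative, not MSOL's exact form).
\mathcal{L}_i \;=\;
\mathbb{E}_{q_i}\!\left[\sum_{t} r(s_t, a_t)\right]
\;-\; \beta \,\mathbb{E}_{q_i}\!\left[\sum_{t}
\mathrm{KL}\!\big(q_i(a_t, z_t \mid s_t)\,\|\,p(a_t, z_t \mid s_t)\big)\right]
```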
The StarCraft Multi-Agent Challenge
Samvelyan, Mikayel, Rashid, Tabish, de Witt, Christian Schroeder, Farquhar, Gregory, Nardelli, Nantas, Rudner, Tim G. J., Hung, Chia-Man, Torr, Philip H. S., Foerster, Jakob, Whiteson, Shimon
In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluation. We also open-source a deep multi-agent RL framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.
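A minimal random-agent loop, following the multi-agent interface documented in the SMAC repository; it assumes the smac package and StarCraft II are installed, and the "8m" map is one of the released challenge maps.

```python
# One episode on a SMAC map with each agent sampling a random available action.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="8m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)  # mask of legal actions
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, info = env.step(actions)  # shared team reward
env.close()
```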
Value Propagation Networks
Nardelli, Nantas, Synnaeve, Gabriel, Lin, Zeming, Kohli, Pushmeet, Torr, Philip H. S., Usunier, Nicolas
We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration, which can be successfully trained using reinforcement learning to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. Furthermore, we show that the module enables learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario with more complex dynamics and pixels as input.
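The planning primitive VProp builds on can be sketched as value iteration over a 2-D grid, where each cell's value is repeatedly updated from its best neighbour. The rewards, discount, and four-neighbour max below are illustrative; the paper's module learns these quantities and keeps every operation differentiable so the planner can be trained end to end with RL.

```python
# Toy value iteration on a grid: each cell backs up the best neighbouring value.
import numpy as np

def grid_value_iteration(reward, discount=0.95, iters=50):
    """reward: 2-D float array with a goal reward and obstacle penalties."""
    v = np.zeros_like(reward)
    padded = np.pad(v, 1, constant_values=-np.inf)  # -inf outside the grid
    for _ in range(iters):
        padded[1:-1, 1:-1] = v
        # Best achievable value among the four neighbouring cells.
        neighbours = np.stack([
            padded[:-2, 1:-1], padded[2:, 1:-1],
            padded[1:-1, :-2], padded[1:-1, 2:],
        ])
        v = reward + discount * neighbours.max(axis=0)
    return v

rewards = np.full((8, 8), -0.04)  # small step cost everywhere
rewards[7, 7] = 1.0               # goal cell
rewards[3, 2:6] = -1.0            # wall of penalty cells
values = grid_value_iteration(rewards)
```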