 Reinforcement Learning


Best Deep Reinforcement Learning Research of 2019

#artificialintelligence

Reinforcement learning has seen great advancements in the past five years. The successful introduction of deep learning in place of more traditional methods allowed reinforcement learning to scale to very complex domains, achieving super-human performance in environments like the game of Go and numerous video games. Despite great successes in multiple domains, these new methods suffer from their own issues that often make them inapplicable to real-world problems. Extreme lack of data efficiency is, together with huge variance and difficulty in enforcing safety constraints, one of the three most prominent issues in the field. Usually, millions of data points sampled from the environment are necessary for these algorithms to converge to acceptable policies.


GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

arXiv.org Machine Learning

We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems with GenDICE (Zhang et al., 2020), the current state of the art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in the parameterization of the optimization variable is introduced, so primal-dual algorithms are not guaranteed to find the desired solution. However, such nonlinearity is essential to ensure the consistency of GenDICE even with a tabular representation. This is a fundamental contradiction, resulting from GenDICE's original formulation of the optimization problem. In GradientDICE, we optimize a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE's use of divergence. Consequently, nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.
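To make the estimated quantity concrete, here is a minimal sketch of the density ratio in a tiny tabular, undiscounted setting. It is not the GradientDICE algorithm: it assumes the transition matrix under the target policy is known and simply computes the stationary distribution by power iteration, whereas GradientDICE estimates the ratio from samples. The matrix `P_pi`, the sampling distribution `d_mu`, and the function name are hypothetical.

```python
def stationary_distribution(P, iters=1000):
    """Power iteration for the stationary distribution of a row-stochastic matrix P.

    Assumes the chain is irreducible and aperiodic, so the limit is unique.
    """
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of d <- d P (d is treated as a row vector).
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d


# Hypothetical two-state chain induced by the target policy.
P_pi = [[0.9, 0.1],
        [0.5, 0.5]]
# Hypothetical sampling distribution over states (it need not be stationary).
d_mu = [0.5, 0.5]

d_pi = stationary_distribution(P_pi)
# Density ratio tau(s) = d_pi(s) / d_mu(s), the quantity the abstract refers to.
tau = [p / q for p, q in zip(d_pi, d_mu)]
```

Under these assumptions, the expectation of `tau` under `d_mu` is 1, which is the kind of normalization constraint a ratio estimator has to respect.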


On the Convergence of Artificial Intelligence and Distributed Ledger Technology: A Scoping Review and Future Research Agenda

arXiv.org Artificial Intelligence

Developments in Artificial Intelligence (AI) and Distributed Ledger Technology (DLT) currently lead to lively debates in academia and practice. AI processes data to perform tasks that were previously thought possible only for humans. DLT creates consensus over data among a group of participants in uncertain environments. In recent articles, the two technologies are shown to complement each other. Examples include the design of secure distributed ledgers or the creation of federated learning systems distributed across multiple nodes. This can lead to technological convergence, which in the past has paved the way for major IT product innovations. Previous work highlights several potential benefits of the convergence of AI and DLT but provides only a limited theoretical framework to describe upcoming real-world integration cases of both technologies. We aim to contribute by conducting a systematic literature review of previous work and by providing rigorously derived future research opportunities. Our analysis identifies how AI and DLT exchange data, and how to use these integration principles to build new systems. Based on that, we present open questions for future research. This work helps researchers active in AI or DLT to overcome current limitations in their field, and engineers to develop systems along with the convergence of these technologies.


Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

arXiv.org Artificial Intelligence

We consider the problem of off-policy evaluation for reinforcement learning, where the goal is to estimate the expected reward of a target policy $\pi$ using offline data collected by running a logging policy $\mu$. Standard importance-sampling based approaches for this problem suffer from a variance that scales exponentially with the time horizon $H$, which has motivated a surge of recent interest in alternatives that break the "Curse of Horizon" (Liu et al. 2018, Xie et al. 2019). In particular, it was shown that a marginalized importance sampling (MIS) approach can be used to achieve an estimation error of order $O(H^3/n)$ in mean square error (MSE) under an episodic Markov Decision Process model with finite states and potentially infinite actions. The MSE bound, however, is still a factor of $H$ away from a Cramér-Rao lower bound of order $\Omega(H^2/n)$. In this paper, we prove that with a simple modification to the MIS estimator, we can asymptotically attain the Cramér-Rao lower bound, provided that the action space is finite. We also provide a general method for constructing MIS estimators with high-probability error bounds.
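The exponential dependence on $H$ can be seen in a minimal sketch of the standard trajectory-wise importance sampling estimator, which is the baseline the abstract criticizes and not the MIS estimator it proposes. The policies, the logged episodes, and the function name below are hypothetical illustrations.

```python
from math import prod


def importance_sampling_estimate(trajectories, pi, mu):
    """Trajectory-wise importance sampling estimate of the target policy's return.

    trajectories: list of episodes, each a list of (state, action, reward) tuples
                  collected under the logging policy mu.
    pi, mu: dicts mapping state -> {action: probability}.
    """
    total = 0.0
    for episode in trajectories:
        # The cumulative weight is a product of H per-step probability ratios,
        # which is why its range (and variance) can grow exponentially with H.
        weight = prod(pi[s][a] / mu[s][a] for s, a, _ in episode)
        episode_return = sum(r for _, _, r in episode)
        total += weight * episode_return
    return total / len(trajectories)


# Hypothetical single-state problem with horizon H = 2.
mu = {"s": {"left": 0.5, "right": 0.5}}  # logging policy: uniform
pi = {"s": {"left": 0.0, "right": 1.0}}  # target policy: always "right"
logged = [
    [("s", "right", 1.0), ("s", "right", 1.0)],
    [("s", "left", 0.0), ("s", "right", 1.0)],
    [("s", "right", 1.0), ("s", "left", 0.0)],
    [("s", "left", 0.0), ("s", "left", 0.0)],
]
estimate = importance_sampling_estimate(logged, pi, mu)
```

In this toy data only one of the four episodes gets a nonzero weight, and that weight is $2^H = 4$. With longer horizons, fewer episodes carry ever larger weights. Marginalized approaches aim to avoid this by weighting per-step state distributions rather than whole trajectories.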


Facebook AI Researchers Achieve a 107x Speedup for Training Virtual Agents – NVIDIA Developer News Center

#artificialintelligence

Navigating a new indoor space without any prior knowledge or even a map is a challenging task for a human, let alone a robot. To help develop intelligent machines that interact more effectively with complex 3D environments, Facebook researchers developed a GPU-accelerated deep reinforcement learning model that achieves near 100 percent success in navigating a variety of virtual environments without a pre-provided map. To achieve this breakthrough, the team focused their work on developing an efficient approach to scaling RL models, which require a significant number of training samples, using multi-node distribution. "A single parameter server and thousands of (typically CPU) workers may be fundamentally incompatible with the needs of modern computer vision and robotics communities," the researchers explained in their post, Near-perfect point-goal navigation from 2.5 billion frames of experience. "Unlike Gym or Atari, 3D simulators require GPU acceleration…. The desired agents operate from high-dimensional inputs (pixels) and use deep networks, such as ResNet50, which strain the parameter server. Thus, existing distributed RL architectures do not scale and there is a need to develop a new distributed architecture."


Python For Network Engineers Bootcamp

#artificialintelligence

Real-Life Hands-On Python Automation: Netmiko, Paramiko, Napalm, Nornir, GNS3, Telnet, SSH, Cisco, Arista, Linux etc.

Network Automation or Network Programming using Python and have the desire

What you'll learn

You will MASTER all the Python 3 key concepts starting from scratch. No prior Python or programming knowledge is required
Learn network programmability with Python
See real-world examples of automation scripts with Python for Cisco IOS, Arista EOS or Linux
Learn how to use and improve Paramiko and Netmiko for automation of common administration tasks with Python
Learn how to configure networking devices with Python
You will learn in-depth general Python Programming
Use NAPALM Python library in a Multivendor Environment
Understand how to use Telnet and SSH with Python for network automation
Learn how to automate the configuration of networking devices with Python 3 in a Multivendor Environment

Description

***Fully updated for 2020*** This Network Automation with Python course also covers every major General Python Programming topic and is a perfect match for both beginners and experienced developers! Welcome to this Python hands-on course for learning Network Automation and Programmability with Python in a Cisco or Multivendor Environment. Boost your Python Network Programming Skills by learning one of the hottest topics in the Networking Industry in 2019 and become one of the best Network Engineers! This course is based on Python 3 and doesn't require prior Python Programming knowledge.


Artificial Intelligence Aided Next-Generation Networks Relying on UAVs

arXiv.org Artificial Intelligence

Artificial intelligence (AI) assisted unmanned aerial vehicle (UAV) aided next-generation networking is proposed for dynamic environments. In the AI-enabled UAV-aided wireless networks (UAWN), multiple UAVs are employed as aerial base stations, which are capable of rapidly adapting to the dynamic environment by collecting information about the users' position and tele-traffic demands, learning from the environment and acting upon the feedback received from the users. Moreover, AI enables the interaction amongst a swarm of UAVs for cooperative optimization of the system. As a benefit of the AI framework, several challenges of conventional UAWN may be circumvented, leading to enhanced network performance, improved reliability and agile adaptivity. As a further benefit, dynamic trajectory design and resource allocation are demonstrated. Finally, potential research challenges and opportunities are discussed.


Distal Explanations for Explainable Reinforcement Learning Agents

arXiv.org Artificial Intelligence

Causal explanations present an intuitive way to understand the course of events through causal chains, and are widely accepted in cognitive science as the prominent model humans use for explanation. Importantly, causal models can generate opportunity chains, which take the form of 'A enables B and B causes C'. We ground the notion of opportunity chains in human-agent experimental data, where we present participants with explanations from different models and ask them to provide their own explanations for agent behaviour. Results indicate that humans do in fact use the concept of opportunity chains frequently for describing artificial agent behaviour. Recently, action influence models have been proposed to provide causal explanations for model-free reinforcement learning (RL). While these models can generate counterfactuals (things that did not happen but could have under different conditions), they lack the ability to generate explanations of opportunity chains. We introduce a distal explanation model that can analyse counterfactuals and opportunity chains using decision trees and causal models. We employ a recurrent neural network to learn opportunity chains and make use of decision trees to improve the accuracy of task prediction and the generated counterfactuals. We computationally evaluate the model in 6 RL benchmarks using different RL algorithms, and show that our model performs better in task prediction. We report on a study with 90 participants who receive explanations of RL agents' behaviour in solving three scenarios: 1) Adversarial; 2) Search and rescue; and 3) Human-Agent collaborative scenarios. We investigate the participants' understanding of the agent through task prediction and their subjective satisfaction with the explanations, and show that our distal explanation model results in improved outcomes over the three scenarios compared with two baseline explanation models.


Towards Learning Multi-agent Negotiations via Self-Play

arXiv.org Artificial Intelligence

Making sophisticated, robust, and safe sequential decisions is at the heart of intelligent systems. This is especially critical for planning in complex multi-agent environments, where agents need to anticipate other agents' intentions and possible future actions. Traditional methods formulate the problem as a Markov Decision Process, but the solutions often rely on various assumptions and become brittle when presented with corner cases. In contrast, deep reinforcement learning (Deep RL) has been very effective at finding policies by simultaneously exploring, interacting, and learning from environments. Leveraging the powerful Deep RL paradigm, we demonstrate that an iterative procedure of self-play can create progressively more diverse environments, leading to the learning of sophisticated and robust multi-agent policies. We demonstrate this in a challenging multi-agent simulation of merging traffic, where agents must interact and negotiate with others in order to successfully merge on or off the road. While the environment starts off simple, we increase its complexity by iteratively adding an increasingly diverse set of agents to the agent "zoo" as training progresses. Qualitatively, we find that through self-play, our policies automatically learn interesting behaviors such as defensive driving, overtaking, yielding, and the use of signal lights to communicate intentions to other agents. In addition, quantitatively, we show a dramatic improvement of the success rate of merging maneuvers from 63% to over 98%.