AITopics

Industry: Leisure & Entertainment > Games (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhang, Shangtong, Liu, Bo, Whiteson, Shimon

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

arXiv.org Machine Learning, Jan-29-2020

We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems with GenDICE (Zhang et al., 2020), the current state-of-the-art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced, so primal-dual algorithms are not guaranteed to find the desired solution. However, such nonlinearity is essential to ensure the consistency of GenDICE even with a tabular representation. This is a fundamental contradiction, resulting from GenDICE's original formulation of the optimization problem. In GradientDICE, we optimize a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE's use of divergence. Consequently, nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.
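To make the estimated quantity concrete, the following minimal sketch computes a density ratio exactly for a toy two-state problem. It is not GradientDICE: it assumes the undiscounted case and a known transition matrix under the target policy, neither of which the abstract states, and it uses plain power iteration rather than a primal-dual method. The names `P_pi`, `d_mu`, and `stationary_distribution` are hypothetical.

```python
def stationary_distribution(P, iters=1000):
    """Power iteration: repeatedly apply d <- d P until d stops changing.

    P is a row-stochastic transition matrix given as a list of rows.
    For an irreducible, aperiodic chain the limit is the unique
    stationary distribution (a consequence of Perron-Frobenius).
    """
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d


# Hypothetical transition matrix induced by the target policy.
P_pi = [[0.9, 0.1],
        [0.5, 0.5]]

# Hypothetical sampling distribution over states.
d_mu = [0.5, 0.5]

d_pi = stationary_distribution(P_pi)

# Density ratio tau(s) = d_pi(s) / d_mu(s), the quantity being estimated.
tau = [dp / dm for dp, dm in zip(d_pi, d_mu)]

for s, (dp, t) in enumerate(zip(d_pi, tau)):
    print(f"state {s}: d_pi = {dp:.4f}, tau = {t:.4f}")
```

In this toy case the ratio reweights samples drawn from `d_mu` so that they average as if drawn from `d_pi`; the weighted total `sum(tau[s] * d_mu[s])` equals 1.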

gendice, gradientdice, nonlinearity, (13 more...)

2001.11113

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Pandl, Konstantin D., Thiebes, Scott, Schmidt-Kraepelin, Manuel, Sunyaev, Ali

On the Convergence of Artificial Intelligence and Distributed Ledger Technology: A Scoping Review and Future Research Agenda

arXiv.org Artificial Intelligence, Jan-29-2020

Developments in Artificial Intelligence (AI) and Distributed Ledger Technology (DLT) are currently the subject of lively debate in academia and practice. AI processes data to perform tasks that were previously thought possible only for humans. DLT acts in uncertain environments to create consensus over data among a group of participants. According to recent articles, the two technologies complement each other. Examples include the design of secure distributed ledgers or the creation of allied learning systems distributed across multiple nodes. This can lead to technological convergence, which, in the past, has paved the way for major IT product innovations. Previous work highlights several potential benefits of the convergence of AI and DLT but provides only a limited theoretical framework to describe upcoming real-world integration cases of both technologies. We aim to contribute by conducting a systematic literature review of the previous work and by providing rigorously derived future research opportunities. Our analysis identifies how AI and DLT exchange data, and how to use these integration principles to build new systems. Based on that, we present open questions for future research. This work helps researchers active in AI or DLT to overcome current limitations in their field, and engineers to develop systems along with the convergence of these technologies.

blockchain, dlt, smart contract, (15 more...)

2001.11017

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (1.00)
Information Technology > Services > e-Commerce Services (0.84)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Yin, Ming, Wang, Yu-Xiang

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

arXiv.org Artificial Intelligence, Jan-29-2020

We consider the problem of off-policy evaluation for reinforcement learning, where the goal is to estimate the expected reward of a target policy $\pi$ using offline data collected by running a logging policy $\mu$. Standard importance-sampling-based approaches for this problem suffer from a variance that scales exponentially with the time horizon $H$, which has motivated a surge of recent interest in alternatives that break the "Curse of Horizon" (Liu et al. 2018, Xie et al. 2019). In particular, it was shown that a marginalized importance sampling (MIS) approach can be used to achieve an estimation error of order $O(H^3/n)$ in mean square error (MSE) under an episodic Markov Decision Process model with finite states and potentially infinite actions. The MSE bound, however, is still a factor of $H$ away from a Cramer-Rao lower bound of order $\Omega(H^2/n)$. In this paper, we prove that with a simple modification to the MIS estimator, we can asymptotically attain the Cramer-Rao lower bound, provided that the action space is finite. We also provide a general method for constructing MIS estimators with high-probability error bounds.
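The exponential dependence on $H$ can be illustrated with a simplified calculation. The sketch below assumes, purely for illustration, that both policies ignore the state, so the per-step ratios $\rho = \pi(a)/\mu(a)$ are i.i.d. with mean 1 under the logging policy; the variance of the cumulative weight over $H$ steps is then $(\mathbb{E}_\mu[\rho^2])^H - 1$. The function name and the example probabilities are hypothetical, and this is not the MIS estimator from the paper.

```python
def is_weight_variance(pi, mu, horizon):
    """Variance of the product of `horizon` i.i.d. importance ratios.

    pi, mu: action probabilities of the target and logging policies
    (state-independent by assumption). Each ratio has mean 1 under mu,
    so Var[prod] = E[rho^2] ** horizon - 1.
    """
    # Second moment of one ratio under mu: sum_a mu(a) * (pi(a)/mu(a))^2.
    second_moment = sum(p * p / m for p, m in zip(pi, mu))
    return second_moment ** horizon - 1.0


# Hypothetical two-action policies.
pi = [0.9, 0.1]  # target policy
mu = [0.5, 0.5]  # logging policy

for H in (1, 5, 10, 20):
    print(f"H = {H:2d}: variance of cumulative weight = "
          f"{is_weight_variance(pi, mu, H):.2f}")
```

Whenever the two policies differ, the single-step second moment exceeds 1, so the variance grows geometrically in the horizon.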

estimator, tabular-mis estimator, variance, (12 more...)

2001.10742

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligence, Jan-28-2020, 21:03:00 GMT

Facebook AI Researchers Achieve a 107x Speedup for Training Virtual Agents – NVIDIA Developer News Center

Navigating a new indoor space without any prior knowledge or even a map is a challenging task for a human, let alone a robot. To help develop intelligent machines that interact more effectively with complex 3D environments, Facebook researchers developed a GPU-accelerated deep reinforcement learning model that achieves near 100 percent success in navigating a variety of virtual environments without a pre-provided map. To achieve this breakthrough, the team focused their work on developing an efficient approach to scaling RL models, which require a significant number of training samples, using multi-node distribution. "A single parameter server and thousands of (typically CPU) workers may be fundamentally incompatible with the needs of modern computer vision and robotics communities," the researchers explained in their post, Near-perfect point-goal navigation from 2.5 billion frames of experience. "Unlike Gym or Atari, 3D simulators require GPU acceleration…. The desired agents operate from high-dimensional inputs (pixels) and use deep networks, such as ResNet50, which strain the parameter server. Thus, existing distributed RL architectures do not scale and there is a need to develop a new distributed architecture."

near-perfect point-goal navigation, nvidia developer news center, point-goal navigation, (7 more...)

Country: Africa > Ethiopia (0.06)

Industry:

Information Technology > Hardware (0.43)
Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

#artificialintelligence, Jan-28-2020, 00:06:26 GMT

Python For Network Engineers Bootcamp

Link: Python For Network Engineers Bootcamp (Get udemy course code)

Real-Life Hands-On Python Automation: Netmiko, Paramiko, Napalm, Nornir, GNS3, Telnet, SSH, Cisco, Arista, Linux etc.

Network Automation or Network Programming using Python and have the desire

What you'll learn:

You will MASTER all the Python 3 key concepts starting from Scratch. No prior Python or programming knowledge is required
Learn network programmability with Python
See real-world examples of automation scripts with Python for Cisco IOS, Arista EOS or Linux
Learn how to use and improve Paramiko and Netmiko for automation of common administration tasks with Python
Learn how to configure networking devices with Python
You will learn in-depth general Python Programming
Use NAPALM Python library in a Multivendor Environment
Understand how to use Telnet and SSH with Python for network automation
Learn how to automate the configuration of networking devices with Python 3 in a Multivendor Environment

Description:

***Fully updated for 2020*** This Network Automation with Python course also covers every major General Python Programming topic and is a perfect match for both beginners and experienced developers! Welcome to this Python hands-on course for learning Network Automation and Programmability with Python in a Cisco or Multivendor Environment. Boost your Python Network Programming Skills by learning one of the hottest topics in the Networking Industry in 2019 and become one of the best Network Engineers! This course is based on Python 3 and doesn't require prior Python Programming knowledge.

networking device, python, reinforcement, (11 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education (0.82)
Leisure & Entertainment > Games (0.55)
Information Technology (0.52)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

arXiv.org Artificial Intelligence, Jan-28-2020

Artificial Intelligence Aided Next-Generation Networks Relying on UAVs

Liu, Xiao, Chen, Mingzhe, Liu, Yuanwei, Chen, Yue, Cui, Shuguang, Hanzo, Lajos

Artificial intelligence (AI) assisted unmanned aerial vehicle (UAV) aided next-generation networking is proposed for dynamic environments. In the AI-enabled UAV-aided wireless networks (UAWN), multiple UAVs are employed as aerial base stations, which are capable of rapidly adapting to the dynamic environment by collecting information about the users' position and tele-traffic demands, learning from the environment and acting upon the feedback received from the users. Moreover, AI enables the interaction amongst a swarm of UAVs for cooperative optimization of the system. As a benefit of the AI framework, several challenges of conventional UAWN may be circumvented, leading to enhanced network performance, improved reliability and agile adaptivity. As a further benefit, dynamic trajectory design and resource allocation are demonstrated. Finally, potential research challenges and opportunities are discussed.

information, resource allocation, uawn, (14 more...)

2001.11958

Genre: Research Report (0.50)

Industry:

Telecommunications (1.00)
Energy (0.93)
Information Technology > Services (0.47)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Madumal, Prashan, Miller, Tim, Sonenberg, Liz, Vetere, Frank

Distal Explanations for Explainable Reinforcement Learning Agents

arXiv.org Artificial Intelligence, Jan-28-2020

Causal explanations present an intuitive way to understand the course of events through causal chains, and are widely accepted in cognitive science as the prominent model humans use for explanation. Importantly, causal models can generate opportunity chains, which take the form of 'A enables B and B causes C'. We ground the notion of opportunity chains in human-agent experimental data, where we present participants with explanations from different models and ask them to provide their own explanations for agent behaviour. Results indicate that humans do in fact use the concept of opportunity chains frequently for describing artificial agent behaviour. Recently, action influence models have been proposed to provide causal explanations for model-free reinforcement learning (RL). While these models can generate counterfactuals (things that did not happen but could have under different conditions), they lack the ability to generate explanations of opportunity chains. We introduce a distal explanation model that can analyse counterfactuals and opportunity chains using decision trees and causal models. We employ a recurrent neural network to learn opportunity chains and make use of decision trees to improve the accuracy of task prediction and the generated counterfactuals. We computationally evaluate the model in 6 RL benchmarks using different RL algorithms, and show that our model performs better in task prediction. We report on a study with 90 participants who receive explanations of RL agents' behaviour in solving three scenarios: 1) Adversarial; 2) Search and rescue; and 3) Human-Agent collaborative scenarios. We investigate the participants' understanding of the agent through task prediction and their subjective satisfaction with the explanations, and show that our distal explanation model results in improved outcomes over the three scenarios compared with two baseline explanation models.

agent, explanation, participant, (14 more...)

2001.10284

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial Intelligence, Jan-28-2020

Towards Learning Multi-agent Negotiations via Self-Play

Tang, Yichuan Charlie

Making sophisticated, robust, and safe sequential decisions is at the heart of intelligent systems. This is especially critical for planning in complex multi-agent environments, where agents need to anticipate other agents' intentions and possible future actions. Traditional methods formulate the problem as a Markov Decision Process, but the solutions often rely on various assumptions and become brittle when presented with corner cases. In contrast, deep reinforcement learning (Deep RL) has been very effective at finding policies by simultaneously exploring, interacting, and learning from environments. Leveraging the powerful Deep RL paradigm, we demonstrate that an iterative procedure of self-play can create progressively more diverse environments, leading to the learning of sophisticated and robust multi-agent policies. We demonstrate this in a challenging multi-agent simulation of merging traffic, where agents must interact and negotiate with others in order to successfully merge on or off the road. While the environment starts off simple, we increase its complexity by iteratively adding an increasingly diverse set of agents to the agent "zoo" as training progresses. Qualitatively, we find that through self-play, our policies automatically learn interesting behaviors such as defensive driving, overtaking, yielding, and the use of signal lights to communicate intentions to other agents. In addition, quantitatively, we show a dramatic improvement of the success rate of merging maneuvers from 63% to over 98%.

agent, idm agent, vehicle, (15 more...)