AITopics

2002.04523

Country:

North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Kallus, Nathan, Zhou, Angela

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

arXiv.org Machine LearningFeb-11-2020

Off-policy evaluation of sequential decision policies from observational data is necessary in applications of batch reinforcement learning such as education and healthcare. In such settings, however, observed actions are often confounded with transitions by unobserved variables, rendering exact evaluation of new policies impossible, i.e., unidentifiable. We develop a robust approach that estimates sharp bounds on the (unidentifiable) value of a given policy in an infinite-horizon problem given data from another policy with unobserved confounding subject to a sensitivity model. We phrase the problem precisely as computing the support function of the set of all stationary state-occupancy ratios that agree with both the data and the sensitivity model. We show how to express this set using a new partially identified estimating equation and prove convergence to the sharp bounds, as we collect more confounded data. We prove that membership in the set can be checked by solving a linear program, while the support function is given by a difficult nonconvex optimization problem. We leverage an analytical solution for the finite-state-space case to develop approximations based on nonconvex projected gradient descent. We demonstrate the resulting bounds empirically.

confounder, constraint, evaluation, (13 more...)

2002.04518

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Meresht, Vahid Balazadeh, De, Abir, Singla, Adish, Gomez-Rodriguez, Manuel

Learning to Switch Between Machines and Humans

arXiv.org Machine LearningFeb-11-2020

Reinforcement learning algorithms have been mostly developed and evaluated under the assumption that they will operate in a fully autonomous manner---they will take all actions. However, in safety critical applications, full autonomy faces a variety of technical, societal and legal challenges, which have precluded the use of reinforcement learning policies in real-world systems. In this work, our goal is to develop algorithms that, by learning to switch control between machines and humans, allow existing reinforcement learning policies to operate under different automation levels. More specifically, we first formally define the learning to switch problem using finite horizon Markov decision processes. Then, we show that, if the human policy is known, we can find the optimal switching policy directly by solving a set of recursive equations using backwards induction. However, in practice, the human policy is often unknown. To overcome this, we develop an algorithm that uses upper confidence bounds on the human policy to find a sequence of switching policies whose total regret with respect to the optimal switching policy is sublinear. Simulation experiments on two important tasks in autonomous driving---lane keeping and obstacle avoidance---demonstrate the effectiveness of the proposed algorithms and illustrate our theoretical findings.

algorithm, human policy, optimal, (16 more...)

2002.04258

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial IntelligenceFeb-11-2020

Hyper-Meta Reinforcement Learning with Sparse Reward

Hua, Yun, Wang, Xiangfeng, Jin, Bo, Li, Wenhao, Yan, Junchi, He, Xiaofeng, Zha, Hongyuan

Despite their success, existing meta reinforcement learning methods still have difficulty in learning a meta policy effectively for RL problems with sparse reward. To this end, we develop a novel meta reinforcement learning framework, Hyper-Meta RL (HMRL), for sparse reward RL problems. It consists of meta state embedding, meta reward shaping and meta policy learning modules: The cross-environment meta state embedding module constructs a common meta state space to adapt to different environments; The meta state based environment-specific meta reward shaping effectively extends the original sparse reward trajectory by cross-environmental knowledge complementarity; As a consequence, the meta policy then achieves better generalization and efficiency with the shaped meta reward. Experiments with sparse reward show the superiority of HMRL on both transferability and policy learning efficiency.

meta reward, meta state, sparse reward, (14 more...)

arXiv.org Artificial Intelligence

2002.04238

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceFeb-10-2020, 11:29:04 GMT

Ai training-course-content

artificial intelligence fundamental application, artificial intelligence introduction difference, q-learning intuition plan, (12 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.90)

Industry: Education (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

#artificialintelligenceFeb-10-2020, 07:12:12 GMT

Deep Reinforcement Learning With TensorFlow 2.1 Roman Ring

In this tutorial, I will give an overview of the TensorFlow 2.x features through the lens of deep reinforcement learning (DRL) by implementing an advantage actor-critic (A2C) agent, solving the classic CartPole-v0 environment. While the goal is to showcase TensorFlow 2.x, I will do my best to make DRL approachable as well, including a birds-eye overview of the field. In fact, since the main focus of the 2.x release is making life easier for the developers, it's a great time to get into DRL with TensorFlow. For example, the source code for this blog post is under 150 lines, including comments! Code is available on GitHub here and as a notebook on Google Colab here.

algorithm, eager mode, tensorflow 2, (13 more...)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceFeb-10-2020, 06:59:27 GMT

Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI

Learning and motivation are driven by internal and external rewards. Many of our day-to-day behaviours are guided by predicting, or anticipating, whether a given action will result in a positive (that is, rewarding) outcome. The study of how organisms learn from experience to correctly anticipate rewards has been a productive research field for well over a century, since Ivan Pavlov's seminal psychological work. In his most famous experiment, dogs were trained to expect food some time after a buzzer sounded. These dogs began salivating as soon as they heard the sound, before the food had arrived, indicating they'd learned to predict the reward.

dopamine and temporal difference, fruitful relationship, neuroscience and ai, (3 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

#artificialintelligenceFeb-10-2020, 03:38:19 GMT

Mastering Machine Learning Algorithms: Expert techniques for implementing popular machine learning algorithms, fine-tuning your models, and understanding how they work, 2nd Edition: Giuseppe Bonaccorso: 9781838820299: Amazon.com: Books

Giuseppe Bonaccorso is Head of Data Science in a large multinational company. He received his M.Sc.Eng. in Electronics in 2005 from University of Catania, Italy, and continued his studies at University of Rome Tor Vergata, and University of Essex, UK. His main interests include machine/deep learning, reinforcement learning, big data, and bio-inspired adaptive systems. He is author of several publications including Machine Learning Algorithms and Hands-On Unsupervised Learning with Python, published by Packt.

giuseppe bonaccorso, machine learning algorithm, mastering machine learning algorithm, (5 more...)

Country:

Europe > United Kingdom > England > Essex (0.37)
Europe > Italy (0.37)

Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Hao, Botao, Lazic, Nevena, Abbasi-Yadkori, Yasin, Joulani, Pooria, Szepesvari, Csaba

Provably Efficient Adaptive Approximate Policy Iteration

arXiv.org Machine LearningFeb-10-2020

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains, including games and robotics. However, the theoretical understanding of such algorithms is limited, and existing results are largely focused on episodic or discounted Markov decision processes (MDPs). In this work, we present adaptive approximate policy iteration (AAPI), a learning scheme which enjoys a O(T^{2/3}) regret bound for undiscounted, continuing learning in uniformly ergodic MDPs. This is an improvement over the best existing bound of O(T^{3/4}) for the average-reward case with function approximation. Our algorithm and analysis rely on adversarial online learning techniques, where value functions are treated as losses. The main technical novelty is the use of a data-dependent adaptive learning rate coupled with a so-called optimistic prediction of upcoming losses. In addition to theoretical guarantees, we demonstrate the advantages of our approach empirically on several environments.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2002.03069

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Artificial IntelligenceFeb-10-2020

Proficiency Aware Multi-Agent Actor-Critic for Mixed Aerial and Ground Robot Teaming

Yu, Qifei, Shen, Zhexin, Pang, Yijiang, Liu, Rui

Mixed Cooperation and competition are the actual scenarios of deploying multi-robot systems, such as the multi-UAV/UGV teaming for tracking criminal vehicles and protecting important individuals. Types and the total number of robot are all important factors that influence mixed cooperation quality. In various real-world environments, such as open space, forest, and urban building clusters, robot deployments have been influenced largely, as different robots have different configurations to support different environments. For example, UGVs are good at moving on the urban roads and reach the forest area while UAVs are good at flying in open space and around the high building clusters. However, it is challenging to design the collective behaviors for robot cooperation according to the dynamic changes in robot capabilities, working status, and environmental constraints. To solve this question, we proposed a novel proficiency-aware mixed environment multi-agent deep reinforcement learning (Mix-DRL). In Mix-DRL, robot capability and environment factors are formalized into the model to update the policy to model the nonlinear relations between heterogeneous team deployment strategies and the real-world environmental conditions. Mix-DRL can largely exploit robot capability while staying aware of the environment limitations. With the validation of a heterogeneous team with 2 UAVs and 2 UGVs in tasks, such as social security for criminal vehicle tracking, the Mix-DRL's effectiveness has been evaluated with $14.20\%$ of cooperation improvement. Given the general setting of Mix-DRL, it can be used to guide the general cooperation of UAVs and UGVs for multi-target tracking.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2002.0391

Country:

North America > United States > Ohio > Portage County > Kent (0.04)
North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.49)
Government (0.48)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.34)