AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
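The trial-and-error idea in the quotation can be illustrated with a small multi-armed bandit. The sketch below is a generic epsilon-greedy learner, not code from Sutton and Barto; the arm means, step count, and the function name `epsilon_greedy_bandit` are illustrative assumptions.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Discover the best action purely by trying actions and observing rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each action was tried
    estimates = [0.0] * n_arms   # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore: try a random action
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # The learner only sees a noisy numerical reward, never the "right" action.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical arm means; the learner is never shown these values directly.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.5])
best_arm = max(range(3), key=lambda a: estimates[a])
```

The learner is never told that the third arm is best; it infers this from the rewards it collects while mostly exploiting its current estimates.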

Hands-On Amazon Redshift for Data Warehousing [Video]

#artificialintelligence, Oct-13-2019, 10:54:10 GMT

Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas such as big data, data science, machine learning, and cloud computing. Over the past few years, they have worked with some of the world's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the world's most popular soft drinks companies, helping each of them to make better sense of its data and process it in more intelligent ways. Jim DiLorenzo is a freelance programmer and reinforcement learning enthusiast. He graduated from Columbia University and is working on his Master's in Computer Science.

hand-on amazon redshift

#artificialintelligence

Technology:

Information Technology > Virtualization (0.85)
Information Technology > Cloud Computing (0.75)
Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Ways AI projects are changing your life right now, in 2018

#artificialintelligence, Oct-13-2019, 00:33:51 GMT

Imagine: in 2001 Steven Spielberg released his science fiction movie called "Artificial Intelligence". Artificial intelligence programming is one of the hottest topics in the tech world today, and many influencers, from the late, great Stephen Hawking to the increasingly popular Elon Musk, both embrace the achievements of AI projects and warn us about the possible implications. So how does this new technology influence the world around us? Should you be worried that some AI robot will steal your job any time soon? Both academic and industrial researchers have put a lot of effort into creating adaptable smart machines for all sorts of industrial processes. Many startups have caught the trend and are beginning to develop reinforcement learning algorithms for industrial robotics.

ai project, algorithm, reinforcement, (7 more...)

#artificialintelligence

Country: North America > United States > New York > New York County > New York City (0.05)

Genre: Personal (0.31)

Industry:

Health & Medicine (1.00)
Media > Film (0.56)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

On the Utility of Learning about Humans for Human-AI Coordination

Carroll, Micah, Shah, Rohin, Ho, Mark K., Griffiths, Thomas L., Seshia, Sanjit A., Abbeel, Pieter, Dragan, Anca

arXiv.org Artificial Intelligence, Oct-13-2019

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves. Agents that assume their partner to be optimal or similar to them can converge to coordination protocols that fail to understand and be understood by humans. To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play. We evaluate the performance of agents trained via self-play and population-based training. These agents perform very well when paired with themselves, but when paired with our human model, they are significantly worse than agents designed to play with the human model. An experiment with a planning algorithm yields the same conclusion, though only when the human-aware planner is given the exact human model that it is playing with. A user study with real humans shows this pattern as well, though less strongly. Qualitatively, we find that the gains come from having the agent adapt to the human's gameplay. Given this result, we suggest several approaches for designing agents that learn about humans in order to better coordinate with them. Code is available at https://github.com/HumanCompatibleAI/overcooked_ai.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1910.05789

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Policy Poisoning in Batch Reinforcement Learning and Control

Ma, Yuzhe, Zhang, Xuezhou, Sun, Wen, Zhu, Xiaojin

arXiv.org Machine Learning, Oct-13-2019

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy. The victim is a reinforcement learner / controller which first estimates the dynamics and the rewards from a batch data set, and then solves for the optimal policy with respect to the estimates. The attacker can modify the data set slightly before learning happens, and wants to force the learner into learning a target policy chosen by the attacker. We present a unified framework for solving batch policy poisoning attacks, and instantiate the attack on two standard victims: a tabular certainty equivalence learner in reinforcement learning and a linear quadratic regulator in control. We show that both instantiations result in a convex optimization problem on which global optimality is guaranteed, and provide analysis on attack feasibility and attack cost. Experiments show the effectiveness of policy poisoning attacks.
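The victim described above, which estimates a model from batch data and then plans on the estimate, can be sketched for the tabular case as follows. This is a minimal illustration and not the authors' code: the two-state data set, the self-loop default for unseen state-action pairs, and the hand-picked reward change are all assumptions, and the paper's attack chooses its modifications by convex optimization rather than by hand.

```python
from collections import defaultdict


def certainty_equivalence_policy(batch, n_states, n_actions, gamma=0.9, iters=200):
    """Estimate dynamics and rewards from (s, a, r, s') tuples, then run value iteration."""
    counts = defaultdict(int)
    next_counts = defaultdict(int)
    reward_sum = defaultdict(float)
    for s, a, r, s_next in batch:
        counts[(s, a)] += 1
        next_counts[(s, a, s_next)] += 1
        reward_sum[(s, a)] += r

    def q_value(s, a, values):
        n = counts[(s, a)]
        if n == 0:
            # Assumption: an unseen pair yields zero reward and stays in place.
            return gamma * values[s]
        r_hat = reward_sum[(s, a)] / n
        expected_next = sum(
            next_counts[(s, a, s2)] / n * values[s2] for s2 in range(n_states)
        )
        return r_hat + gamma * expected_next

    values = [0.0] * n_states
    for _ in range(iters):
        values = [
            max(q_value(s, a, values) for a in range(n_actions))
            for s in range(n_states)
        ]
    # Greedy policy with respect to the estimated model.
    return [
        max(range(n_actions), key=lambda a: q_value(s, a, values))
        for s in range(n_states)
    ]


# Hypothetical batch data: (state, action, reward, next_state).
clean_batch = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 2.0, 1), (1, 1, 0.0, 0)]
# Hand-picked change to one recorded reward, for illustration only.
poisoned_batch = [(0, 0, 2.0, 0)] + clean_batch[1:]

clean_policy = certainty_equivalence_policy(clean_batch, n_states=2, n_actions=2)
poisoned_policy = certainty_equivalence_policy(poisoned_batch, n_states=2, n_actions=2)
```

In this toy data set, changing a single recorded reward is enough to flip the action the learner prefers in state 0, which is the kind of sensitivity the attack exploits.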

machine learning, optimization, reinforcement learning, (19 more...)

arXiv.org Machine Learning

1910.05821

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Actor Critic with Differentially Private Critic

Lebensold, Jonathan, Hamilton, William, Balle, Borja, Precup, Doina

arXiv.org Machine Learning, Oct-13-2019

Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by leveraging information (e.g., via pre-training) on other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our approach leverages a differentially private policy evaluation algorithm to initialize an actor-critic model and improve the effectiveness of learning in downstream tasks. We empirically show this technique increases sample efficiency in resource-constrained control problems while preserving the privacy of trajectories collected in an upstream task.

algorithm, privacy, value function, (14 more...)

arXiv.org Machine Learning

1910.05876

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.05)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Regularizing Model-Based Planning with Energy-Based Models

Boney, Rinu, Kannala, Juho, Ilin, Alexander

arXiv.org Machine Learning, Oct-12-2019

Model-based reinforcement learning could enable sample-efficient learning by quickly acquiring rich knowledge about the world and using it to improve behaviour without additional data. Learned dynamics models can be directly used for planning actions but this has been challenging because of inaccuracies in the learned models. In this paper, we focus on planning with learned dynamics models and propose to regularize it using energy estimates of state transitions in the environment. We visually demonstrate the effectiveness of the proposed method and show that off-policy training of an energy estimator can be effectively used to regularize planning with pre-trained dynamics models. Further, we demonstrate that the proposed method enables sample-efficient learning to achieve competitive performance in challenging continuous control tasks such as Half-cheetah and Ant in just a few minutes of experience.
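The general idea of penalizing a planner for trusting unfamiliar transitions can be sketched as below. This is a toy illustration under stated assumptions, not the method from the paper: `learned_dynamics`, `energy`, the discrete action set, and the weight `alpha` are hypothetical stand-ins, and the planner simply enumerates every action sequence.

```python
from itertools import product

ACTIONS = (-2, -1, 0, 1, 2)


def learned_dynamics(state, action):
    """Stand-in for a learned model; assumed here to be s' = s + a."""
    return state + action


def energy(state, action, next_state):
    """Hypothetical energy: zero for familiar transitions (|a| <= 1), positive otherwise."""
    return float(max(0, abs(action) - 1) ** 2)


def plan(start, horizon, alpha):
    """Pick the action sequence maximizing predicted reward minus alpha times energy."""
    best_score, best_actions = float("-inf"), None
    for actions in product(ACTIONS, repeat=horizon):
        state, total = start, 0.0
        for a in actions:
            nxt = learned_dynamics(state, a)
            # Reward is the predicted next state; the energy term is the regularizer.
            total += nxt - alpha * energy(state, a, nxt)
            state = nxt
        if total > best_score:
            best_score, best_actions = total, actions
    return best_actions, best_score


unregularized, _ = plan(0, horizon=3, alpha=0.0)
regularized, _ = plan(0, horizon=3, alpha=5.0)
```

Without the penalty the planner picks the largest actions, where a learned model would be least reliable; with the penalty it stays within the transitions the energy function treats as familiar.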

artificial intelligence, dynamic model, neural network, (17 more...)

arXiv.org Machine Learning

1910.05527

Country: Asia > Japan (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Influence-Based Multi-Agent Exploration

Wang, Tonghan, Wang, Jianhao, Wu, Yi, Zhang, Chongjie

arXiv.org Machine Learning, Oct-12-2019

Intrinsically motivated reinforcement learning aims to address the exploration challenge for sparse-reward tasks. However, the study of exploration methods in transition-dependent multi-agent settings is largely absent from the literature. We aim to take a step towards solving this problem. We present two exploration methods: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), by exploiting the role of interaction in coordinated behaviors of agents. EITI uses mutual information to capture influence transition dynamics. EDTI uses a novel intrinsic reward, called Value of Interaction (VoI), to characterize and quantify the influence of one agent's behavior on expected returns of other agents. By optimizing the EITI or EDTI objective as a regularizer, agents are encouraged to coordinate their exploration and learn policies to optimize team performance. We show how to optimize these regularizers so that they can be easily integrated with policy gradient reinforcement learning. The resulting update rule draws a connection between coordinated exploration and intrinsic reward distribution. Finally, we empirically demonstrate the significant strength of our method in a variety of multi-agent scenarios. Many advances in deep reinforcement learning rely on a dense shaped reward function, such as distance to the goal (Mirowski et al., 2016; Wu et al., 2018), scores in games (Mnih et al., 2015), or expert-designed rewards (Wu & Tian, 2016; OpenAI, 2018), but tend to struggle in many real-world scenarios with sparse rewards.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

arXiv.org Machine Learning

1910.05512

Country:

Asia > Middle East > Jordan (0.04)
North America > United States (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Autonomous Navigation via Deep Reinforcement Learning for Resource Constraint Edge Nodes using Transfer Learning

Anwar, Aqeel, Raychowdhury, Arijit

arXiv.org Machine Learning, Oct-12-2019

Smart and agile drones are fast becoming ubiquitous at the edge of the cloud. The usage of these drones is constrained by their limited power and compute capability. In this paper, we present a Transfer Learning (TL) based approach to reduce the on-board computation required to train a deep neural network for autonomous navigation via Deep Reinforcement Learning for a target algorithmic performance. A library of 3D realistic meta-environments is manually designed using Unreal Gaming Engine and the network is trained end-to-end. These trained meta-weights are then used as initializers to the network in a test environment and fine-tuned for the last few fully connected layers. Variation in drone dynamics and environmental characteristics is carried out to show robustness of the approach. Using the NVIDIA GPU profiler, it was shown that the energy consumption and training latency are reduced by 3.7x and 1.8x respectively without significant degradation in the performance in terms of average distance traveled before crash. The approach is also tested on a real environment using a DJI Tello drone and similar results were reported. The video of the drone with the proposed approach will be uploaded to YouTube. Over the past decade, unmanned aerial vehicles (UAVs) have been emerging as a new form of IoT devices being used in varied applications such as reconnaissance, surveying, rescuing and mapping. Irrespective of the application, navigating autonomously is one of the key desirable features of UAVs both indoors and outdoors.

action space, learning, train type, (14 more...)

arXiv.org Machine Learning

1910.05547

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.40)

Industry:

Aerospace & Defense > Aircraft (0.48)
Education > Educational Setting > Online (0.46)
Leisure & Entertainment > Games (0.34)
Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

Sivakumar, Viswanath, Rocktäschel, Tim, Miller, Alexander H., Küttler, Heinrich, Nardelli, Nantas, Rabbat, Mike, Pineau, Joelle, Riedel, Sebastian

arXiv.org Machine Learning, Oct-12-2019

Effective network congestion control strategies are key to keeping the Internet (or any large computer network) operational. Network congestion control has been dominated by hand-crafted heuristics for decades. Recently, Reinforcement Learning (RL) has emerged as an alternative to automatically optimize such control strategies. Research so far has primarily considered RL interfaces which block the sender while an agent considers its next action. This is largely an artifact of building on top of frameworks designed for RL in games (e.g. OpenAI Gym). However, this does not translate to real-world networking environments, where a network sender waiting on a policy without sending data is costly for throughput. We instead propose to formulate congestion control with an asynchronous RL agent that handles delayed actions. We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages the state of the art in asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics, and evaluate our method on emulated networks from the Pantheon benchmark platform. The source code is publicly available at https://github.com/facebookresearch/mvfst-rl.

action space, agent, congestion control, (13 more...)

arXiv.org Machine Learning

1910.04054

Country:

Asia > Nepal (0.05)
Asia > India (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Information Technology (0.69)
Telecommunications > Networks (0.55)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Extracting Incentives from Black-Box Decisions

Shavit, Yonadav, Moses, William S.

arXiv.org Artificial Intelligence, Oct-12-2019

An algorithmic decision-maker incentivizes people to act in certain ways to receive better decisions. These incentives can dramatically influence subjects' behaviors and lives, and it is important that both decision-makers and decision-recipients have clarity on which actions are incentivized by the chosen model. While for linear functions, the changes a subject is incentivized to make may be clear, we prove that for many non-linear functions (e.g. neural networks, random forests), classical methods for interpreting the behavior of models (e.g. input gradients) provide poor advice to individuals on which actions they should take. In this work, we propose a mathematical framework for understanding algorithmic incentives as the challenge of solving a Markov Decision Process, where the state includes the set of input features, and the reward is a function of the model's output. We can then leverage the many toolkits for solving MDPs (e.g. tree-based planning, reinforcement learning) to identify the optimal actions each individual is incentivized to take to improve their decision under a given model. We demonstrate the utility of our method by estimating the maximally-incentivized actions in two real-world settings: a recidivism risk predictor we train using ProPublica's COMPAS dataset, and an online credit scoring tool published by the Fair Isaac Corporation (FICO).
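The framing above, where the state is the feature vector and the reward depends on the model's output, can be sketched with a tiny exhaustive planner. Everything in this example is a hypothetical stand-in rather than the authors' framework: the `score` function, the two features, the action set, and the use of the final score as the only reward are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical actions, each changing the (income, debt) features by one unit.
ACTIONS = {"raise_income": (1, 0), "pay_debt": (0, -1), "wait": (0, 0)}


def score(features):
    """Hypothetical non-linear decision model: a bonus applies only once debt reaches zero."""
    income, debt = features
    return 0.1 * income - 0.05 * debt + (1.0 if debt == 0 else 0.0)


def step(features, action):
    """MDP transition: apply one action to the feature vector (debt cannot go negative)."""
    d_income, d_debt = ACTIONS[action]
    return (features[0] + d_income, max(0, features[1] + d_debt))


def best_plan(start, horizon):
    """Enumerate all action sequences and keep the one with the highest final score."""
    best_value, best_actions = score(start), ()
    for actions in product(ACTIONS, repeat=horizon):
        state = start
        for a in actions:
            state = step(state, a)
        if score(state) > best_value:
            best_value, best_actions = score(state), actions
    return best_value, best_actions


start = (5, 3)  # hypothetical individual: income 5, debt 3
value, actions = best_plan(start, horizon=3)
# One-step local advice, analogous to following the input gradient.
local_advice = max(ACTIONS, key=lambda a: score(step(start, a)))
```

In this toy model the locally best single step is to raise income, while the three-step plan that pays off the debt reaches a much higher score, which mirrors the abstract's point that local interpretations can give poor advice for non-linear models.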

advice policy, decision function, incentive, (13 more...)

arXiv.org Artificial Intelligence

1910.05664

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Industry:

Law (1.00)
Banking & Finance > Credit (0.69)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Transportation > Air (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
