AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Büchi Automata

Oura, Ryohei, Sakakibara, Ami, Ushio, Toshimitsu

arXiv.org Artificial Intelligence, Jan-14-2020

This letter proposes a novel reinforcement learning method for the synthesis of a control policy satisfying a control specification described by a linear temporal logic formula. We assume that the controlled system is modeled by a Markov decision process (MDP). We transform the specification to a limit-deterministic Büchi automaton (LDBA) with several accepting sets that accepts all infinite sequences satisfying the formula. The LDBA is augmented so that it explicitly records the previous visits to accepting sets. We take a product of the augmented LDBA and the MDP, based on which we define a reward function. The agent gets rewards whenever state transitions are in an accepting set that has not been visited for a certain number of steps. Consequently, sparsity of rewards is relaxed and optimal circulations among the accepting sets are learned. We show that the proposed method can learn an optimal policy when the discount factor is sufficiently close to one.
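The reward rule described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' construction: the class name `AcceptingSetMemory`, the unit reward, and the rule of resetting the memory once every accepting set has been seen are hypothetical choices made here, and the product with the MDP is left out entirely.

```python
class AcceptingSetMemory:
    """Hypothetical memory of which accepting sets were visited recently."""

    def __init__(self, num_accepting_sets):
        self.num_sets = num_accepting_sets
        self.visited = set()  # indices of accepting sets seen since the last reset

    def step(self, sets_hit):
        """Return the reward for one transition.

        sets_hit: indices of the accepting sets that contain this transition.
        """
        new_sets = set(sets_hit) - self.visited
        reward = 1.0 if new_sets else 0.0  # assumed unit reward for a fresh set
        self.visited |= new_sets
        if len(self.visited) == self.num_sets:
            self.visited.clear()  # assumed reset once every set has been seen
        return reward


memory = AcceptingSetMemory(num_accepting_sets=2)
rewards = [memory.step(hit) for hit in ([0], [0], [], [1], [0])]
```

In this toy run, revisiting set 0 before set 1 earns nothing, which is one way to read how such a memory could push the agent to circulate among all accepting sets.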

init, specification, transition, (14 more...)

2001.04669

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Reinforcement Learning for the Enterprise - DZone AI

#artificialintelligence, Jan-13-2020, 23:29:13 GMT

This article is featured in the new DZone Guide to Artificial Intelligence. Humanity has a unique ability to adapt to dynamic environments and learn from its surroundings and failures. Machines lack this ability, and artificial intelligence seeks to correct that deficiency. However, traditional supervised machine learning techniques require a large amount of suitable historical data to learn patterns and then act on them.

agent, algorithm, reinforcement, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning and Its Implications for Enterprise Artificial Intelligence

#artificialintelligence, Jan-13-2020, 23:29:06 GMT

Deep RL uses deep learning in conjunction with RL to cope with cases where the search space is very large or the environment is very complicated, with multi-dimensional states, actions, and rewards. One common form is deep Q-learning, in which a deep learning network is used as a function approximator for the Q function, predicting the expected reward for an input rather than trying to explore and store rewards and actions for every state. In simulation environments, feeding the pixels of an environment through a neural network also gives the reinforcement algorithm a richer representation of its environment. For the most part, RL is being used to teach AI systems how to play games, as games provide a safe and bounded environment for learning. For example, AlphaGo uses RL in combination with other techniques, and similar techniques have been used to have AI learn Atari games or become champions at poker.
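To make the function-approximator idea concrete, here is a minimal sketch that swaps the deep network for a linear model so that it needs no external libraries. The function names, the feature vectors, and the values of `alpha` and `gamma` are assumptions chosen for illustration; a real deep Q-learning setup would replace `q_value` with a neural network and add pieces such as replay buffers that are not shown.

```python
def q_value(weights, features):
    """Approximate Q(s, a) as a dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def q_update(weights, features, reward, next_features_per_action,
             alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step for a linear approximator.

    next_features_per_action: one feature vector per action available in the
    next state; pass an empty list if the next state is terminal.
    """
    best_next = max(
        (q_value(weights, f) for f in next_features_per_action), default=0.0
    )
    td_error = reward + gamma * best_next - q_value(weights, features)
    # For a linear model the gradient of Q with respect to the weights is
    # the feature vector itself.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]
weights = q_update(weights, features=[1.0, 0.0], reward=1.0,
                   next_features_per_action=[])
```

The point of the sketch is that the learner stores a handful of weights instead of one table entry per state and action, which is what makes very large state spaces tractable.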

algorithm, openai, reinforcement learning, (14 more...)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Data Answers Enterprise Reinforcement Learning Challenges

#artificialintelligence, Jan-13-2020, 23:28:55 GMT

The applicability of RL in the enterprise is vast and largely untapped. To date, most Deep Reinforcement Learning successes have focused on its application to games and robotics. In such cases, emulators and simulators are readily available and present the perfect environment in which to run trials without risk. By contrast, many of the problems that companies wish to solve do not come with a risk-free testing environment: It can be difficult and sometimes impossible to allow an AI agent to freely and rapidly explore the impact of its potential actions through trial and error. But the availability of a simulator is not essential to effectively applying RL techniques in enterprise settings.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Reinforcement Learning visualised with a predator prey ball game

#artificialintelligence, Jan-13-2020, 14:34:36 GMT

This is a follow-up to a previous article, where we looked at a simple Reinforcement Learning (RL) game in which a green ball learnt to reach a small circle at the centre of a canvas within 200 steps. We wrote a Q-learning algorithm and visualised it using a Tkinter-based GUI. We will now give the green ball a slightly more complicated challenge. The aim is still to learn to reach the centre within 200 steps, but now there is a second ball, a red one, which the green ball must avoid. The red ball starts near the circle and moves randomly.
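As a rough sketch of the kind of tabular Q-learning loop this summary refers to, the code below covers only the simpler first game, with no red ball and no GUI. It is not the article's code: the 5x5 grid standing in for the canvas, the start cell, the step penalty of -1, and all hyperparameters are assumptions made for illustration.

```python
import random

SIZE = 5                      # assumed 5x5 grid standing in for the canvas
GOAL = (2, 2)                 # centre cell
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]
MAX_STEPS = 200               # step limit taken from the article summary


def move(state, action):
    """Apply a move, keeping the ball inside the grid."""
    x = min(max(state[0] + MOVES[action][0], 0), SIZE - 1)
    y = min(max(state[1] + MOVES[action][1], 0), SIZE - 1)
    return (x, y)


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning from a fixed start cell."""
    rng = random.Random(seed)
    q = {((x, y), a): 0.0
         for x in range(SIZE) for y in range(SIZE) for a in range(len(MOVES))}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(MAX_STEPS):
            if rng.random() < epsilon:
                action = rng.randrange(len(MOVES))      # explore
            else:
                action = max(range(len(MOVES)), key=lambda a: q[(state, a)])
            nxt = move(state, action)
            done = nxt == GOAL
            reward = -1.0                               # assumed step penalty
            target = reward if done else reward + gamma * max(
                q[(nxt, a)] for a in range(len(MOVES)))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_steps(q):
    """Follow the learned greedy policy; return steps taken, or None on failure."""
    state = (0, 0)
    for step in range(1, MAX_STEPS + 1):
        action = max(range(len(MOVES)), key=lambda a: q[(state, a)])
        state = move(state, action)
        if state == GOAL:
            return step
    return None


q_table = train()
steps_to_centre = greedy_steps(q_table)
```

Adding the red ball would mean extending the state with its position and penalising collisions, which enlarges the table considerably.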

green ball, red ball, reinforcement learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans

Kwon, Minae, Biyik, Erdem, Talati, Aditi, Bhasin, Karan, Losey, Dylan P., Sadigh, Dorsa

arXiv.org Artificial Intelligence, Jan-13-2020

In order to collaborate safely and efficiently, robots need to anticipate how their human partners will behave. Some of today's robots model humans as if they were also robots, and assume users are always optimal. Other robots account for human limitations, and relax this assumption so that the human is noisily rational. Both of these models make sense when the human receives deterministic rewards: i.e., gaining either $100 or $130 with certainty. But in real-world scenarios, rewards are rarely deterministic. Instead, we must make choices subject to risk and uncertainty, and in these settings, humans exhibit a cognitive bias towards suboptimal behavior. For example, when deciding between gaining $100 with certainty or $130 only 80% of the time, people tend to make the risk-averse choice, even though it leads to a lower expected gain! In this paper, we adopt a well-known Risk-Aware human model from behavioral economics called Cumulative Prospect Theory and enable robots to leverage this model during human-robot interaction (HRI). In our user studies, we offer supporting evidence that the Risk-Aware model more accurately predicts suboptimal human behavior. We find that this increased modeling accuracy results in safer and more efficient human-robot collaboration. Overall, we extend existing rational human models so that collaborative robots can anticipate and plan around suboptimal human behavior during HRI.
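The $100 versus $130 example can be checked with a few lines of arithmetic. The sketch below first computes the plain expected gains, then scores both options with one common parameterisation of Cumulative Prospect Theory for gains. The functional forms and the values `alpha=0.88` and `gamma=0.61` are the widely quoted Tversky and Kahneman (1992) estimates and are an assumption here; the paper may fit different forms or parameters.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)


def cpt_value(prob, payoff, alpha=0.88, gamma=0.61):
    """Subjective value of a single-gain gamble under the assumed CPT forms."""
    # Probability weighting: large probabilities below 1 are underweighted.
    weight = prob ** gamma / (prob ** gamma + (1 - prob) ** gamma) ** (1 / gamma)
    # Concave value function for gains.
    return weight * payoff ** alpha


sure_ev = expected_value([(1.0, 100)])
risky_ev = expected_value([(0.8, 130), (0.2, 0)])
sure_cpt = cpt_value(1.0, 100)
risky_cpt = cpt_value(0.8, 130)
```

The risky option has the higher expected gain (about $104 against $100), yet under these assumed parameters the certain $100 receives the higher subjective value, which matches the risk-averse choice described in the abstract.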

machine learning, reinforcement learning, simulation of human behavior, (18 more...)

doi: 10.1145/3319502.3374832

2001.04377

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (0.90)

Industry:

Transportation > Ground > Road (0.50)
Automobiles & Trucks (0.32)
Information Technology > Robotics & Automation (0.32)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings

Shi, C., Zhang, S., Lu, W., Song, R.

arXiv.org Machine Learning, Jan-13-2020

Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision-making problems. The quality of a policy is measured by its value function starting from some initial state. The focus of this paper is to construct confidence intervals (CIs) for a policy's value in infinite horizon settings where the number of decision points diverges to infinity. We propose to model the state-action value function (Q-function) associated with a policy using a series/sieve method to derive its confidence interval. When the target policy depends on the observed data as well, we propose a SequentiAl Value Evaluation (SAVE) method to recursively update the estimated policy and its value estimator. As long as either the number of trajectories or the number of decision points diverges to infinity, we show that the proposed CI achieves nominal coverage even in cases where the optimal policy is not unique. Simulation studies are conducted to back up our theoretical findings. We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patients' health status.
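For readers new to the idea of a confidence interval on a policy's value, the sketch below shows a much simpler baseline than the sieve-based method in the abstract: a normal-approximation interval around the average discounted return of independent, finite trajectories. It is not the paper's estimator and does not handle the infinite-horizon or data-dependent-policy cases; the function names, the discount factor, and the toy reward sequences are all made up for illustration.

```python
import math
import statistics


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def value_confidence_interval(trajectories, gamma=0.9, z=1.96):
    """Normal-approximation CI for the mean discounted return.

    Assumes independent trajectories that all start from the same
    initial-state distribution and were generated by the target policy.
    """
    returns = [discounted_return(traj, gamma) for traj in trajectories]
    mean = statistics.mean(returns)
    half_width = z * statistics.stdev(returns) / math.sqrt(len(returns))
    return mean - half_width, mean + half_width


# Made-up reward sequences, for illustration only.
toy_trajectories = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
lower, upper = value_confidence_interval(toy_trajectories)
```

With only four trajectories the normal approximation is crude; the interval is meant to show the mechanics, not to be trusted numerically.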

inference, optimal policy, similar argument, (16 more...)

2001.04515

Country:

North America > United States > New York (0.04)
North America > United States > North Carolina (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning

van der Meer, Michiel, Pirotta, Matteo, Bruni, Elia

arXiv.org Artificial Intelligence, Jan-13-2020

In this work, we present an alternative approach to making an agent compositional through the use of a diagnostic classifier. Because of the need for explainable agents in automated decision processes, we attempt to interpret the latent space from an RL agent to identify its current objective in a complex language instruction. Results show that the classification process causes changes in the hidden states which makes them more easily interpretable, but also causes a shift in zero-shot performance to novel instructions. Lastly, we limit the supervisory signal on the classification, and observe a similar but less notable effect.

agent, classifier, instruction, (13 more...)

2001.04418

Country:

North America > United States > Massachusetts (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.68)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Policy Poisoning in Batch Reinforcement Learning and Control

Ma, Yuzhe, Zhang, Xuezhou, Sun, Wen, Zhu, Jerry

Neural Information Processing Systems, Jan-12-2020, 05:06:01 GMT

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy. The victim is a reinforcement learner / controller which first estimates the dynamics and the rewards from a batch data set, and then solves for the optimal policy with respect to the estimates. The attacker can modify the data set slightly before learning happens, and wants to force the learner into learning a target policy chosen by the attacker. We present a unified framework for solving batch policy poisoning attacks, and instantiate the attack on two standard victims: a tabular certainty-equivalence learner in reinforcement learning and a linear quadratic regulator in control. We show that both instantiations result in a convex optimization problem on which global optimality is guaranteed, and provide analysis on attack feasibility and attack cost.
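The threat model can be illustrated with a deliberately tiny example: a one-state problem in which the victim averages the observed reward of each action and picks the best one. The attack below is a feasible but not cost-minimal heuristic (it simply shifts the target action's rewards), whereas the paper formulates the attack as a convex optimization problem. All names, the margin, and the batch data are hypothetical.

```python
from collections import defaultdict


def estimate_mean_rewards(batch):
    """Certainty-equivalence estimate: average observed reward per action."""
    totals, counts = defaultdict(float), defaultdict(int)
    for action, reward in batch:
        totals[action] += reward
        counts[action] += 1
    return {a: totals[a] / counts[a] for a in totals}


def greedy_action(batch):
    """The victim: pick the action with the highest estimated reward."""
    means = estimate_mean_rewards(batch)
    return max(means, key=means.get)


def poison(batch, target, margin=0.1):
    """Raise the target action's rewards until it wins by `margin`.

    Assumes the batch contains the target and at least one other action.
    """
    means = estimate_mean_rewards(batch)
    best_other = max(m for a, m in means.items() if a != target)
    shift = max(0.0, best_other + margin - means[target])
    return [(a, r + shift if a == target else r) for a, r in batch]


# Made-up batch of (action, reward) pairs.
clean = [("left", 1.0), ("left", 0.8), ("right", 0.2), ("right", 0.4)]
poisoned = poison(clean, target="right")
```

On the clean batch the victim prefers "left"; after the shift it prefers "right", even though the attacker never touched the learning algorithm itself.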

batch reinforcement learning and control, policy poisoning, policy poisoning attack, (3 more...)

Industry: Information Technology > Security & Privacy (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
