AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

$\epsilon$-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

arXiv.org Machine Learning, Jul-2-2020

Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which, despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of $\varepsilon$. We provide a novel Bayesian perspective of $\varepsilon$ as a measure of the uniformity of the Q-value function. Based on this new perspective, we introduce a closed-form Bayesian model update using Bayesian model combination (BMC), which allows us to adapt $\varepsilon$ using experiences from the environment in constant time with monotone convergence guarantees. We demonstrate that our proposed algorithm, $\varepsilon$-BMC, efficiently balances exploration and exploitation on different problems, performing comparably to or outperforming the best tuned fixed annealing schedules and an alternative data-dependent $\varepsilon$ adaptation scheme proposed in the literature.
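
As a point of reference for the policy this abstract discusses, here is a minimal sketch of plain epsilon-greedy action selection with a fixed epsilon. It is the textbook rule only, not the paper's Bayesian adaptation of epsilon; the function name and arguments are illustrative assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-value estimates.

    With probability epsilon, explore by choosing uniformly at random;
    otherwise exploit by choosing the action with the highest Q-value.
    (Hypothetical helper for illustration; epsilon is fixed here.)
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Ties are broken in favour of the lowest action index.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon` to 0 gives a purely greedy policy and setting it to 1 gives uniform random exploration; choosing a value between those extremes is the specification problem the abstract refers to.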

bayesian inference, upstream oil & gas, vdbe, (19 more...)

arXiv.org Machine Learning

2007.00869

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Verifiably Safe Exploration for End-to-End Reinforcement Learning

Hunt, Nathan, Fulton, Nathan, Magliacane, Sara, Hoang, Nghia, Das, Subhro, Solar-Lezama, Armando

arXiv.org Artificial Intelligence, Jul-2-2020

Deep reinforcement learning algorithms (Sutton & Barto, 1998) are effective at learning, often from raw sensor inputs, control policies that optimize for a quantitative reward signal. Learning these policies can require experiencing millions of unsafe actions. Even if a safe policy is finally learned - which will happen only if the reward signal reflects all relevant safety priorities - providing a purely statistical guarantee that the optimal policy is safe requires an unrealistic amount of training data (Kalra & Paddock, 2016). The difficulty of establishing the safety of these algorithms makes it hard to justify the use of reinforcement learning in safety-critical domains where industry standards demand strong evidence of safety prior to deployment (ISO-26262, 2011). Formal verification provides a rigorous way of establishing safety for traditional control systems (Clarke et al., 2018). The problem of providing formal guarantees in RL is called formally constrained reinforcement learning (FCRL).

machine learning, reinforcement learning, verifiably safe exploration, (11 more...)

arXiv.org Artificial Intelligence

2007.01223

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints

Andriotis, C. P., Papakonstantinou, K. G.

arXiv.org Artificial Intelligence, Jul-2-2020

Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing decision-trees with the number of decision-steps; (iii) presence of state uncertainties, induced by inherent environment stochasticity and variability of inspection/monitoring measurements; (iv) presence of constraints, pertaining to stochastic long-term limitations, due to resource scarcity and other infeasible/undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDPs) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i), through deep function parametrizations and decentralized control assumptions. Challenge (iv) is herein handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions, in cases where decisions must be made in the most resource- and risk-aware manner.
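
The Lagrangian relaxation mentioned for challenge (iv) can be illustrated generically. The sketch below shows the usual pattern of folding a constraint into the objective with a multiplier and then adjusting that multiplier by projected dual ascent. It is an assumption-laden illustration, not the paper's algorithm: the function names, the scalar cost signals, and the step-size rule are all hypothetical.

```python
def lagrangian_cost(cost, constraint_cost, lam):
    """Penalized per-step cost: the original cost plus the constraint
    cost weighted by the Lagrange multiplier lam (assumed lam >= 0)."""
    return cost + lam * constraint_cost


def update_multiplier(lam, avg_constraint_cost, budget, step_size):
    """Projected dual-ascent step on the multiplier.

    The multiplier grows when the average constraint cost exceeds the
    budget, shrinks when it is below, and is clipped at zero.
    """
    return max(0.0, lam + step_size * (avg_constraint_cost - budget))
```

In this pattern the agent minimizes the penalized cost for a fixed multiplier, while the multiplier is raised whenever the constraint is violated on average, making further violations more expensive.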

constraint, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2007.0138

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(4 more...)

Genre: Research Report (0.63)

Industry: Energy > Renewable > Wind (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Reinforcement Learning Generalization with Surprise Minimization

Chen, Jerry Zikun

arXiv.org Artificial Intelligence, Jul-2-2020

Generalization remains a challenging problem for deep reinforcement learning algorithms, which are often trained and tested on the same set of deterministic game environments. When test environments are unseen and perturbed but the nature of the task remains the same, generalization gaps can arise. In this work, we propose and evaluate a surprise-minimizing agent on a generalization benchmark to show that an additional reward learned from a simple density model can provide robustness in procedurally generated game environments that provide a constant source of entropy and stochasticity.

artificial intelligence, machine learning, reinforcement learning generalization, (14 more...)

arXiv.org Artificial Intelligence

2004.12399

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.54)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Batch Inverse Reinforcement Learning Using Counterfactuals for Understanding Decision Making

Bica, Ioana, Jarrett, Daniel, Hüyük, Alihan, van der Schaar, Mihaela

arXiv.org Artificial Intelligence, Jul-2-2020

A key challenge in modeling real-world decision-making is the fact that active experimentation is often impossible (e.g. in healthcare). The goal of batch inverse reinforcement learning is to recover and understand policies on the basis of demonstrated behaviour, i.e. trajectories of observations and actions made by an expert maximizing some unknown reward function. We propose incorporating counterfactual reasoning into modeling decision behaviours in this setting. At each decision point, counterfactuals answer the question: Given the current history of observations, what would happen if we took a particular action? First, this offers a principled approach to learning inherently interpretable reward functions, which enables understanding the cost-benefit tradeoffs associated with an expert's actions. Second, by estimating the effects of different actions, counterfactuals readily tackle the off-policy nature of policy evaluation in the batch setting. Not only does this alleviate the cold-start problem typical of conventional solutions, but it also accommodates settings where the expert policies depend on histories of observations rather than just current states. Through experiments in both real and simulated medical environments, we illustrate the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of expert behaviour.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2007.13531

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch

Viano, Luca, Huang, Yu-Ting, Kamalaruban, Parameswaran, Cevher, Volkan

arXiv.org Machine Learning, Jul-2-2020

We study the inverse reinforcement learning (IRL) problem under the transition dynamics mismatch between the expert and the learner. In particular, we consider the Maximum Causal Entropy (MCE) IRL learner model and provide an upper bound on the learner's performance degradation based on the $\ell_1$-distance between the two transition dynamics of the expert and the learner. Then, by leveraging insights from the Robust RL literature, we propose a robust MCE IRL algorithm, which is a principled approach to help with this mismatch issue. Finally, we empirically demonstrate the stable performance of our algorithm compared to the standard MCE IRL algorithm under transition mismatches in finite MDP problems.
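
To make the $\ell_1$-distance between two transition dynamics concrete for a finite MDP, the sketch below compares two tabular transition models. The abstract does not say how the distance is aggregated over state-action pairs; taking the maximum over all pairs is an assumption made here for illustration, as are the function name and the nested-list layout.

```python
def l1_dynamics_distance(p_expert, p_learner):
    """Largest per-(state, action) l1 distance between two tabular models.

    p[s][a][s2] is the probability of moving from state s to state s2
    under action a. Aggregating with a max over (s, a) is an assumption.
    """
    worst = 0.0
    for rows_e, rows_l in zip(p_expert, p_learner):
        for dist_e, dist_l in zip(rows_e, rows_l):
            # l1 distance between the two next-state distributions.
            gap = sum(abs(pe - pl) for pe, pl in zip(dist_e, dist_l))
            worst = max(worst, gap)
    return worst
```

Two identical models give a distance of 0, and the value cannot exceed 2, since each next-state distribution sums to 1.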

machine learning, reinforcement learning, total return, (12 more...)

arXiv.org Machine Learning

2007.01174

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Human-centered collaborative robots with deep reinforcement learning

Ghadirzadeh, Ali, Chen, Xi, Yin, Wenjie, Yi, Zhengrong, Björkman, Mårten, Kragic, Danica

arXiv.org Artificial Intelligence, Jul-2-2020

Human-centered collaborative systems require proactive robot behavior with precise timing, which in turn mandates awareness of human actions, the state of the environment, and the task being executed [1-4]. Proactive robot behavior is achieved by (1) recognizing the current state of the human collaborator and the environment based on real-time observations, (2) predicting human actions given the observations and the model of the task, and (3) generating robot actions in line with the prediction. Human action recognition may, however, be highly uncertain if the human collaborator is not executing a strictly defined task plan. This is true regardless of whether perception is based on motion-capture devices or image-based pose estimation. For a robot to act in a proactive manner, while at the same time avoiding actions when the risk of making a mistake is too high, it is essential for the action-decision system to take this uncertainty into consideration. We therefore propose to train the perception system and the robot policy in an end-to-end fashion using reinforcement learning (RL). This is different from earlier studies in which human action recognition and prediction are typically decoupled from robot action policy training [3-7]. Our main objective is to improve the fluency in coordination between the human and robot partners by allowing the policy to explicitly weigh the benefits of timely actions against the risk of making a mistake when uncertainties are too high.

machine learning, motion data, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2007.01009

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Fiber: Distributed Computing for AI Made Simple

#artificialintelligence, Jul-1-2020, 16:25:44 GMT

Jeff Clune is the former Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming, a Senior Research Manager and founding member of Uber AI Labs, and currently a Research Team Leader at OpenAI. Jeff focuses on robotics and training neural networks via deep learning and deep reinforcement learning. He has also researched open questions in evolutionary biology using computational models of evolution, including studying the evolutionary origins of modularity, hierarchy, and evolvability. Prior to becoming a professor, he was a Research Scientist at Cornell University, received a PhD in computer science and an MA in philosophy from Michigan State University, and received a BA in philosophy from the University of Michigan. More about Jeff's research can be found at JeffClune.com

deep learning, reinforcement learning, university, (5 more...)

#artificialintelligence

Country:

North America > United States > Michigan (0.60)
North America > United States > Wyoming (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.32)

Artificial Intelligence: Reinforcement Learning in Python

#artificialintelligence, Jul-1-2020, 11:15:43 GMT

Rated 4.5 (7,241 ratings). When people talk about artificial intelligence, they usually don't mean supervised and unsupervised machine learning. These tasks are pretty trivial compared to what we think of AIs doing – playing chess and Go, driving cars, and beating video games at a superhuman level. Reinforcement learning has recently become popular for doing all of that and more. Much like deep learning, a lot of the theory was discovered in the 70s and 80s, but it hasn't been until recently that we've been able to observe firsthand the amazing results that are possible.

artificial intelligence, machine learning, reinforcement learning, (2 more...)

#artificialintelligence

Industry:

Education (0.80)
Leisure & Entertainment > Games > Computer Games (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Policy Improvement from Multiple Experts

Cheng, Ching-An, Kolobov, Andrey, Agarwal, Alekh

arXiv.org Artificial Intelligence, Jul-1-2020

Despite its promise, reinforcement learning's real-world adoption has been hampered by its need for costly exploration to learn a good policy. Imitation learning (IL) mitigates this shortcoming by using an expert policy during training as a bootstrap to accelerate the learning process. However, in many practical situations, the learner has access to multiple suboptimal experts, which may provide conflicting advice in a state. The existing IL literature provides a limited treatment of such scenarios. Whereas in the single-expert case the return of the expert's policy provides an obvious benchmark for the learner to compete against, neither such a benchmark nor principled ways of outperforming it are known for the multi-expert setting. In this paper, we propose the state-wise maximum of the expert policies' values as a natural baseline to resolve conflicting advice from multiple experts. Using a reduction of policy optimization to online learning, we introduce a novel IL algorithm, MAMBA, which can provably learn a policy competitive with this benchmark. In particular, MAMBA optimizes policies by using a gradient estimator in the style of generalized advantage estimation (GAE). Our theoretical analysis shows that this design makes MAMBA robust and enables it to outperform the expert policies by a larger margin than the IL state of the art, even in the single-expert case. In an evaluation against standard policy gradient with GAE and AggreVaTeD, we showcase MAMBA's ability to leverage demonstrations both from a single and from multiple weak experts, and significantly speed up policy optimization.
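
The baseline proposed in this abstract, the state-wise maximum of the expert policies' values, can be sketched for a small tabular case. This shows only the baseline computation, not the MAMBA algorithm; the function name and the list-of-lists layout are assumptions for illustration.

```python
def statewise_max_baseline(expert_values):
    """Return, for each state, the highest value across all experts.

    expert_values[k][s] is expert k's value estimate for state s
    (hypothetical tabular layout; every expert covers the same states).
    """
    return [max(values_at_state) for values_at_state in zip(*expert_values)]
```

For example, with two experts whose values over three states are `[1.0, 0.2, 0.5]` and `[0.3, 0.9, 0.5]`, the baseline follows the first expert in state 0 and the second in state 1, so it can exceed either expert's values taken alone.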

machine learning, mamba-0, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2007.00795

Country:

North America > United States > Washington > King County > Redmond (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
