AITopics | Sanner, Scott

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Gimelfarb, Michael, Barreto, André, Sanner, Scott, Lee, Chi-Guhn

arXiv.org Artificial IntelligenceMay-28-2021

Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making. The former can be addressed by transfer learning and the latter by optimizing some utility function of the return. However, the problem of transferring skills in a risk-aware manner is not well-understood. In this paper, we address the problem of risk-aware policy transfer between tasks in a common domain that differ only in their reward functions, in which risk is measured by the variance of reward streams. Our approach begins by extending the idea of generalized policy improvement to maximize entropic utilities, thus extending policy improvement via dynamic programming to sets of policies and levels of risk-aversion. Next, we extend the idea of successor features (SF), a value function representation that decouples the environment dynamics from the rewards, to capture the variance of returns. Our resulting risk-aware successor features (RaSF) integrate seamlessly within the RL framework, inherit the superior task generalization ability of SFs, and incorporate risk-awareness into the decision-making. Experiments on a discrete navigation domain and control of a simulated robotic arm demonstrate the ability of RaSFs to outperform alternative methods including SFs, when taking the risk of the learned policies into account.

artificial intelligence, neural network, successor feature, (16 more...)

arXiv.org Artificial Intelligence

2105.14127

Country: North America > Canada > Ontario > Toronto (0.28)

Genre: Research Report (0.63)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Adversarial Shapley Value Experience Replay for Task-Free Continual Learning

Mai, Zheda, Shim, Dongsub, Jeong, Jihwan, Sanner, Scott, Kim, Hyunwoo, Jang, Jongseong

arXiv.org Machine LearningAug-31-2020

Continual learning is a branch of deep learning that seeks to strike a balance between learning stability and plasticity. In this paper, we specifically focus on the task-free setting where data are streamed online without task metadata and clear task boundaries. A simple and highly effective algorithm class for this setting is known as Experience Replay (ER) that selectively stores data samples from previous experience and leverages them to interleave memory-based and online batch learning updates. Recent advances in ER have proposed novel methods for scoring which samples to store in memory and which memory samples to interleave with online data during learning updates. In this paper, we contribute a novel Adversarial Shapley value ER (ASER) method that scores memory data samples according to their ability to preserve latent decision boundaries for previously observed classes (to maintain learning stability and avoid forgetting) while interfering with latent decision boundaries of current classes being learned (to encourage plasticity and optimal learning of new class boundaries). Overall, we observe that ASER provides competitive or improved performance on a variety of datasets compared to state-of-the-art ER-based continual learning methods.

dataset, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

2009.00093

Country: North America > Canada > Ontario > Toronto (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Scalable Planning with Deep Neural Network Learned Transition Models

Wu, Ga (University of Toronto) | Say, Buser | Sanner, Scott

Journal of Artificial Intelligence ResearchJul-20-2020

In many complex planning problems with factored, continuous state and action spaces such as Reservoir Control, Heating Ventilation and Air Conditioning (HVAC), and Navigation domains, it is difficult to obtain a model of the complex nonlinear dynamics that govern state evolution. However, the ubiquity of modern sensors allows us to collect large quantities of data from each of these complex systems and build accurate, nonlinear deep neural network models of their state transitions. But there remains one major problem for the task of control - how can we plan with deep network learned transition models without resorting to Monte Carlo Tree Search and other black-box transition model techniques that ignore model structure and do not easily extend to continuous domains? In this paper, we introduce two types of planning methods that can leverage deep neural network learned transition models: Hybrid Deep MILP Planner (HD-MILP-Plan) and Tensorflow Planner (TF-Plan). In HD-MILP-Plan, we make the critical observation that the Rectified Linear Unit (ReLU) transfer function for deep networks not only allows faster convergence of model learning, but also permits a direct compilation of the deep network transition model to a Mixed-Integer Linear Program (MILP) encoding. Further, we identify deep network specific optimizations for HD-MILP-Plan that improve performance over a base encoding and show that we can plan optimally with respect to the learned deep networks. In TF-Plan, we take advantage of the efficiency of auto-differentiation tools and GPU-based computation where we encode a subclass of purely continuous planning problems as Recurrent Neural Networks and directly optimize the actions through backpropagation. We compare both planners and show that TF-Plan is able to approximate the optimal plans found by HD-MILP-Plan in less computation time. Hence this article offers two novel planners for continuous state and action domains with learned deep neural net transition models: one optimal method (HD-MILP-Plan) and a scalable alternative for large-scale problems (TF-Plan).

artificial intelligence, hvac, machine learning, (20 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11829

AI Access Foundation

11829

Journal of Artificial Intelligence Research

Country: North America > Canada > Ontario (0.28)

Genre: Research Report (0.67)

Industry:

Information Technology (0.67)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Batch-level Experience Replay with Review for Continual Learning

Mai, Zheda, Kim, Hyunwoo, Jeong, Jihwan, Sanner, Scott

arXiv.org Artificial IntelligenceJul-11-2020

Current CL methods can be taxonomized into three major categories: regularization-based, parameter isolation, Continual learning is a branch of deep learning that and memory-based methods [14]. Some regularizationbased seeks to strike a balance between learning stability and methods encode the knowledge from past tasks into plasticity. The CVPR 2020 CLVision Continual Learning a prior and utilize the prior to either regularize the update for Computer Vision challenge is dedicated to evaluating of parameters that were important to past tasks [8, 18, 13] and advancing the current state-of-the-art continual while others leverage knowledge distillation from the model learning methods using the CORe50 dataset with trained on previous tasks to the model being trained on three different continual learning scenarios.

batch, deep learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2007.05683

Country: North America (0.47)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

{\epsilon}-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

arXiv.org Machine LearningJul-2-2020

Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of $\varepsilon$. In this paper, we provide a novel Bayesian perspective of $\varepsilon$ as a measure of the uniformity of the Q-value function. We introduce a closed-form Bayesian model update based on Bayesian model combination (BMC), based on this new perspective, which allows us to adapt $\varepsilon$ using experiences from the environment in constant time with monotone convergence guarantees. We demonstrate that our proposed algorithm, $\varepsilon$-\texttt{BMC}, efficiently balances exploration and exploitation on different problems, performing comparably or outperforming the best tuned fixed annealing schedules and an alternative data-dependent $\varepsilon$ adaptation scheme proposed in the literature.

bayesian inference, upstream oil & gas, vdbe, (19 more...)

arXiv.org Machine Learning

2007.00869

Country: North America > Canada (0.28)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Bayesian Experience Reuse for Learning from Multiple Demonstrators

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

arXiv.org Machine LearningJun-10-2020

Learning from demonstrations (LfD) improves the exploration efficiency of a learning agent by incorporating demonstrations from experts. However, demonstration data can often come from multiple experts with conflicting goals, making it difficult to incorporate safely and effectively in online settings. We address this problem in the static and dynamic optimization settings by modelling the uncertainty in source and target task functions using normal-inverse-gamma priors, whose corresponding posteriors are, respectively, learned from demonstrations and target data using Bayesian neural networks with shared features. We use this learned belief to derive a quadratic programming problem whose solution yields a probability distribution over the expert models. Finally, we propose Bayesian Experience Reuse (BERS) to sample demonstrations in accordance with this distribution and reuse them directly in new tasks. We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.

demonstration, neural network, optimization problem, (17 more...)

arXiv.org Machine Learning

2006.05725

Country: North America > Canada > Ontario > Toronto (0.28)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
(2 more...)

Add feedback

Reward Potentials for Planning with Learned Neural Network Transition Models

Say, Buser, Sanner, Scott, Thiébaux, Sylvie

arXiv.org Artificial IntelligenceApr-19-2019

Optimal planning with respect to learned neural network (NN) models in continuous action and state spaces using mixed-integer linear programming (MILP) is a challenging task for branch-and-bound solvers due to the poor linear relaxation of the underlying MILP model. For a given set of features, potential heuristics provide an efficient framework for computing bounds on cost (reward) functions. In this paper, we introduce a finite-time algorithm for computing an optimal potential heuristic for learned NN models. We then strengthen the linear relaxation of the underlying MILP model by introducing constraints to bound the reward function based on the precomputed reward potentials. Experimentally, we show that our algorithm efficiently computes reward potentials for learned NN models, and the overhead of computing reward potentials is justified by the overall strengthening of the underlying MILP model for the task of planning over long-term horizons.

artificial intelligence, neural network, reward potential, (15 more...)

arXiv.org Artificial Intelligence

1904.09366

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Scalable Nonlinear Planning with Deep Neural Network Learned Transition Models

Wu, Ga, Say, Buser, Sanner, Scott

arXiv.org Artificial IntelligenceApr-5-2019

In many real-world planning problems with factored, mixed discrete and continuous state and action spaces such as Reservoir Control, Heating Ventilation and Air Conditioning (HVAC), and Navigation domains, it is difficult to obtain a model of the complex nonlinear dynamics that govern state evolution. However, the ubiquity of modern sensors allows us to collect large quantities of data from each of these complex systems and build accurate, nonlinear deep neural network models of their state transitions. But there remains one major problem for the task of control - how can we plan with deep network learned transition models without resorting to Monte Carlo Tree Search and other black-box transition model techniques that ignore model structure and do not easily extend to mixed discrete and continuous domains? In this paper, we introduce two types of nonlinear planning methods that can leverage deep neural network learned transition models: Hybrid Deep MILP Planner (HD-MILP-Plan) and Tensorflow Planner (TF-Plan). In HD-MILP-Plan, we make the critical observation that the Rectified Linear Unit (ReLU) transfer function for deep networks not only allows faster convergence of model learning, but also permits a direct compilation of the deep network transition model to a Mixed-Integer Linear Program (MILP) encoding. Further, we identify deep network specific optimizations for HD-MILP-Plan that improve performance over a base encoding and show that we can plan optimally with respect to the learned deep networks. In TF-Plan, we take advantage of the efficiency of auto-differentiation tools and GPU-based computation where we encode a subclass of purely continuous planning problems as Recurrent Neural Networks and directly optimize the actions through backpropagation. We compare both planners and show that TF-Plan is able to approximate the optimal plans found by HD-MILP-Plan in less computation time. Hence this article offers two novel planners for learned deep neural net transition models: one optimal method for mixed discrete and continuous state and actions (HD-MILP-Plan) and a scalable alternative for large-scale purely continuous state and action problems (TF-Plan).

constraint, deep learning, upstream oil & gas, (18 more...)

arXiv.org Artificial Intelligence

1904.02873

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.81)

Industry:

Information Technology (0.67)
Energy > Oil & Gas > Upstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

Neural Information Processing SystemsDec-31-2018

Potential based reward shaping is a powerful technique for accelerating convergence of reinforcement learning algorithms. Typically, such information includes an estimate of the optimal value function and is often provided by a human expert or other sources of domain knowledge. However, this information is often biased or inaccurate and can mislead many reinforcement learning algorithms. In this paper, we apply Bayesian Model Combination with multiple experts in a way that learns to trust a good combination of experts as training progresses. This approach is both computationally efficient and general, and is shown numerically to improve convergence across discrete and continuous domains and different reinforcement learning algorithms.

Add feedback

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

Neural Information Processing SystemsDec-31-2018

Potential based reward shaping is a powerful technique for accelerating convergence of reinforcement learning algorithms. Typically, such information includes an estimate of the optimal value function and is often provided by a human expert or other sources of domain knowledge. However, this information is often biased or inaccurate and can mislead many reinforcement learning algorithms. In this paper, we apply Bayesian Model Combination with multiple experts in a way that learns to trust a good combination of experts as training progresses. This approach is both computationally efficient and general, and is shown numerically to improve convergence across discrete and continuous domains and different reinforcement learning algorithms.

Add feedback

Filters

Collaborating Authors

Sanner, Scott

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Adversarial Shapley Value Experience Replay for Task-Free Continual Learning

Scalable Planning with Deep Neural Network Learned Transition Models

Batch-level Experience Replay with Review for Continual Learning

{\epsilon}-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Bayesian Experience Reuse for Learning from Multiple Demonstrators

Reward Potentials for Planning with Learned Neural Network Transition Models

Scalable Nonlinear Planning with Deep Neural Network Learned Transition Models

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach