AITopics | Boots, Byron

Composing Ensembles of Policies with Deep Reinforcement Learning

Qureshi, Ahmed H., Johnson, Jacob J., Qin, Yuzhe, Boots, Byron, Yip, Michael C.

arXiv.org Artificial Intelligence, May-25-2019

Composition of elementary skills into complex behaviors to solve challenging problems is one of the key elements toward building intelligent machines. To date, there has been plenty of work on learning new policies or skills but almost no focus on composing them to perform complex decision-making. In this paper, we propose a policy ensemble composition framework that takes the robot's primitive policies and learns to compose them concurrently or sequentially through reinforcement learning. We evaluate our method in problems where traditional approaches either fail or exhibit high sample complexity to find a solution. We show that our method not only solves the problems that require both task and motion planning but also exhibits high data efficiency, which is currently one of the main limitations of reinforcement learning.

artificial intelligence, deep learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

1905.10681

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning Quantum Graphical Models using Constrained Gradient Descent on the Stiefel Manifold

Adhikary, Sandesh, Srinivasan, Siddarth, Boots, Byron

arXiv.org Machine Learning, Mar-8-2019

Quantum graphical models (QGMs) extend the classical framework for reasoning about uncertainty by incorporating the quantum mechanical view of probability. Prior work on QGMs has focused on hidden quantum Markov models (HQMMs), which can be formulated using quantum analogues of the sum rule and Bayes rule used in classical graphical models. Despite the focus on developing the QGM framework, there has been little progress in learning these models from data. The existing state-of-the-art approach randomly initializes parameters and iteratively finds unitary transformations that increase the likelihood of the data. While this algorithm demonstrated theoretical strengths of HQMMs over HMMs, it is slow and can only handle a small number of hidden states. In this paper, we tackle the learning problem by solving a constrained optimization problem on the Stiefel manifold using a well-known retraction-based algorithm. We demonstrate that this approach is not only faster and yields better solutions on several datasets, but also scales to larger models that were prohibitively slow to train via the earlier method.
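To make the idea of retraction-based optimization on the Stiefel manifold concrete, here is a minimal sketch in plain Python. It is not the algorithm from the paper: the objective f(X) = trace(X^T A X), the matrix A, the step size, and the Gram-Schmidt (QR-style) retraction are all assumptions chosen for illustration, and the helper names are hypothetical. Each iteration projects the Euclidean gradient onto the tangent space, takes a step, and retracts back so that the columns of X stay orthonormal.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, v):
    return [dot(row, v) for row in A]

def objective(A, X):
    # Toy objective (an assumption): f(X) = trace(X^T A X), X stored as a list of columns.
    return sum(dot(x, mat_vec(A, x)) for x in X)

def qr_retract(cols):
    # Gram-Schmidt orthonormalization: maps the columns back onto the manifold X^T X = I.
    ortho = []
    for c in cols:
        v = list(c)
        for q in ortho:
            coeff = dot(q, v)
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        ortho.append([vi / norm for vi in v])
    return ortho

def riemannian_step(A, X, step):
    # One gradient-ascent step on the toy objective, followed by retraction.
    p = len(X)
    # Euclidean gradient 2 A X, computed column by column (A is symmetric here).
    G = [[2.0 * g for g in mat_vec(A, x)] for x in X]
    # Tangent-space projection: G - X * sym(X^T G).
    S = [[0.5 * (dot(X[i], G[j]) + dot(X[j], G[i])) for j in range(p)]
         for i in range(p)]
    moved = []
    for j in range(p):
        tangent = list(G[j])
        for i in range(p):
            tangent = [t - S[i][j] * xi for t, xi in zip(tangent, X[i])]
        moved.append([xj + step * t for xj, t in zip(X[j], tangent)])
    return qr_retract(moved)

# Symmetric test matrix; the maximum of f over 3x2 orthonormal X is 3 + 2 = 5.
A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
X = qr_retract([[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]])
start_value = objective(A, X)
for _ in range(200):
    X = riemannian_step(A, X, step=0.1)
final_value = objective(A, X)
```

The constraint is enforced by the retraction rather than by a penalty term, so every iterate is feasible. Other retractions (for example, ones based on the Cayley transform) could be substituted for `qr_retract` without changing the rest of the loop.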

artificial intelligence, bayesian inference, kraus operator, (18 more...)

arXiv.org Machine Learning

1903.0373

Country: North America > United States (0.46)

Genre: Research Report (0.70)

Industry: Education (0.34)

Online Learning with Continuous Variations: Dynamic Regret and Reductions

Cheng, Ching-An, Lee, Jonathan, Goldberg, Ken, Boots, Byron

arXiv.org Machine Learning, Feb-19-2019

We study the dynamic regret of a new class of online learning problems, in which the gradient of the loss function changes continuously across rounds with respect to the learner's decisions. This setup is motivated by the use of online learning as a tool to analyze the performance of iterative algorithms. Our goal is to identify interpretable dynamic regret rates that explicitly consider the loss variations as consequences of the learner's decisions as opposed to external constraints. We show that achieving sublinear dynamic regret in general is equivalent to solving certain variational inequalities, equilibrium problems, and fixed-point problems. Leveraging this identification, we present necessary and sufficient conditions for the existence of efficient algorithms that achieve sublinear dynamic regret. Furthermore, we show a reduction from dynamic regret to both static regret and convergence rate to equilibria in the aforementioned problems, which allows us to analyze the dynamic regret of many existing learning algorithms in a few steps.
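As a rough illustration of dynamic regret when the loss depends on the learner's own decisions, the sketch below runs online gradient descent on a one-dimensional toy problem. The loss family l_t(z) = (z - c_t)^2 with c_t = 0.5 * x_t + 1, the step size, and the function name are assumptions made for this example and do not come from the paper. Dynamic regret compares each round's loss against that round's own minimizer; in this toy problem it stays bounded because the decisions converge to the fixed point x = 2, where x = 0.5 * x + 1.

```python
def run_online_gradient_descent(x0, step, rounds):
    """Play `rounds` rounds; return the decisions made and the dynamic regret."""
    x = x0
    decisions = []
    dynamic_regret = 0.0
    for _ in range(rounds):
        decisions.append(x)
        # Hypothetical decision-dependent loss l_t(z) = (z - c)^2:
        # the target c moves continuously with the current decision x.
        c = 0.5 * x + 1.0
        loss_at_decision = (x - c) ** 2
        best_loss_this_round = 0.0  # min over z of (z - c)^2, attained at z = c
        dynamic_regret += loss_at_decision - best_loss_this_round
        grad = 2.0 * (x - c)  # gradient of l_t at the played decision
        x = x - step * grad
    return decisions, dynamic_regret

decisions, regret = run_online_gradient_descent(x0=0.0, step=0.5, rounds=50)
```

With these settings the distance to the fixed point halves every round, so the per-round regret forms a geometric series and the total stays below 4/3 no matter how many rounds are played.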

computer based training, dynamic regret, educational technology, (21 more...)

arXiv.org Machine Learning

1902.07286

Country: North America > United States > California (0.14)

Genre: Research Report (0.63)

Industry: Education > Educational Setting > Online (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.84)

Differentiable MPC for End-to-end Planning and Control

Amos, Brandon, Jimenez, Ivan, Sacks, Jacob, Boots, Byron, Kolter, J. Zico

Neural Information Processing Systems, Dec-31-2018

Differentiating through model predictive control (MPC) provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.

arxiv preprint arxiv, deep learning, downstream oil & gas, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.14)

Industry: Energy > Oil & Gas > Downstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Learning and Inference in Hilbert Space with Quantum Graphical Models

Srinivasan, Siddarth, Downey, Carlton, Boots, Byron

Neural Information Processing Systems, Dec-31-2018

Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. Hilbert space embeddings (HSEs) also generalize Bayesian inference in Hilbert spaces. We investigate the link between QGMs and HSEs and show that the sum rule and Bayes rule for QGMs are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features.
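For readers unfamiliar with Nadaraya-Watson kernel regression, which the abstract relates to Bayes rule for QGMs, a minimal classical version can be sketched as follows. The Gaussian kernel, the bandwidth, and the toy data are assumptions for illustration; the sketch shows only the standard estimator, not the quantum or Hilbert space embedding construction from the paper.

```python
import math

def gaussian_kernel(a, b, bandwidth):
    # Similarity between two scalar inputs; larger when a and b are close.
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def nadaraya_watson(query, xs, ys, bandwidth=1.0):
    # Kernel-weighted average of the training targets:
    # f(query) = sum_i K(query, x_i) * y_i / sum_i K(query, x_i)
    weights = [gaussian_kernel(query, x, bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

# Toy data on the line y = x; by symmetry the estimate at x = 1 is 1.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
prediction = nadaraya_watson(1.0, xs, ys)
```

The estimator is nonparametric: it stores the training pairs and forms a normalized, kernel-weighted average at query time, which is the sense in which it resembles a conditioning operation.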

artificial intelligence, hilbert space, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Orthogonally Decoupled Variational Gaussian Processes

Salimbeni, Hugh, Cheng, Ching-An, Boots, Byron, Deisenroth, Marc

Neural Information Processing Systems, Dec-31-2018

Gaussian processes (GPs) provide a powerful non-parametric framework for reasoning over functions. Despite their appealing theory, their superlinear computational and memory complexities have presented a long-standing challenge. State-of-the-art sparse variational inference methods trade modeling accuracy against complexity. However, the complexities of these methods still scale superlinearly in the number of basis functions, implying that sparse GP methods are able to learn from large datasets only when a small model is used. Recently, a decoupled approach was proposed that removes the unnecessary coupling between the complexities of modeling the mean and the covariance functions of a GP. It achieves a linear complexity in the number of mean parameters, so an expressive posterior mean function can be modeled. While promising, this approach suffers from optimization difficulties due to ill-conditioning and non-convexity. In this work, we propose an alternative decoupled parametrization. It adopts an orthogonal basis in the mean function to model the residues that cannot be learned by the standard coupled approach. Therefore, our method extends, rather than replaces, the coupled approach to achieve strictly better performance. This construction admits a straightforward natural gradient update rule, so the structure of the information manifold that is lost during decoupling can be leveraged to speed up learning. Empirically, our algorithm demonstrates significantly faster convergence in multiple experiments.

artificial intelligence, decoupled basis, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Dual Policy Iteration

Sun, Wen, Gordon, Geoffrey J., Boots, Byron, Bagnell, J.

Neural Information Processing Systems, Dec-31-2018

Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [1], AlphaGo-Zero from [2]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search) that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved with guidance from the reactive policy. In this work we study this Dual Policy Iteration (DPI) strategy in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework which reduces the update of non-reactive policies to model-based optimal control using learned local models, and provides a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes.

artificial intelligence, machine learning, reactive policy, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.69)

Industry: Leisure & Entertainment > Games > Go (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
