AITopics | Meger, David

Cost Adaptation for Robust Decentralized Swarm Behaviour

Henderson, Peter, Vertescher, Matthew, Meger, David, Coates, Mark

arXiv.org Artificial IntelligenceSep-29-2018

Decentralized receding horizon control (D-RHC) provides a mechanism for coordination in multi-agent settings without a centralized command center. However, combining a set of different goals, costs, and constraints to form an efficient optimization objective for D-RHC can be difficult. To allay this problem, we use a meta-learning process -- cost adaptation -- which generates the optimization objective for D-RHC to solve based on a set of human-generated priors (cost and constraint functions) and an auxiliary heuristic. We use this adaptive D-RHC method for control of mesh-networked swarm agents. This formulation allows a wide range of tasks to be encoded and can account for network delays, heterogeneous capabilities, and increasingly large swarms through the adaptation mechanism. We leverage the Unity3D game engine to build a simulator capable of introducing artificial networking failures and delays in the swarm. Using the simulator we validate our method on an example coordinated exploration task. We demonstrate that cost adaptation allows for more efficient and safer task completion under varying environment conditions and increasingly large swarm sizes. We release our simulator and code to the community for future work.

agent, artificial intelligence, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

1709.07114

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games (0.34)
Information Technology (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning

Higuera, Juan Camilo Gamboa, Meger, David, Dudek, Gregory

arXiv.org Artificial IntelligenceAug-1-2018

We present an algorithm for rapidly learning controllers for robotics systems. The algorithm follows the model-based reinforcement learning paradigm, and improves upon existing algorithms; namely Probabilistic learning in Control (PILCO) and a sample-based version of PILCO with neural network dynamics (Deep-PILCO). We propose training a neural network dynamics model using variational dropout with truncated Log-Normal noise. This allows us to obtain a dynamics model with calibrated uncertainty, which can be used to simulate controller executions via rollouts. We also describe set of techniques, inspired by viewing PILCO as a recurrent neural network model, that are crucial to improve the convergence of the method. We test our method on a variety of benchmark tasks, demonstrating data-efficiency that is competitive with PILCO, while being able to optimize complex neural network controllers. Finally, we assess the performance of the algorithm for learning motor controllers for a six legged autonomous underwater vehicle. This demonstrates the potential of the algorithm for scaling up the dimensionality and dataset sizes, in more complex control tasks.

controller, deep learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

1803.02291

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Deep Reinforcement Learning That Matters

AAAI ConferencesFeb-8-2018

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.

algorithm, artificial intelligence, reinforcement learning, (15 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Europe (0.28)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning

AAAI ConferencesFeb-8-2018

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.

artificial intelligence, optiongan, reinforcement learning, (14 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > Canada > Quebec > Montreal (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deep Reinforcement Learning that Matters

Henderson, Peter, Islam, Riashat, Bachman, Philip, Pineau, Joelle, Precup, Doina, Meger, David

arXiv.org Machine LearningNov-24-2017

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.

algorithm, artificial intelligence, computer game, (20 more...)

arXiv.org Machine Learning

1709.0656

Country: