Goto

Collaborating Authors

 Reinforcement Learning


Class Teaching for Inverse Reinforcement Learners

arXiv.org Artificial Intelligence

In this paper we propose the first machine teaching algorithm for multiple inverse reinforcement learners. Specifically, our contributions are: (i) we formally introduce the problem of teaching a sequential task to a heterogeneous group of learners; (ii) we identify conditions under which it is possible to conduct such teaching using the same demonstration for all learners; and (iii) we propose and evaluate a simple algorithm that computes a demonstration(s) ensuring that all agents in a heterogeneous class learn a task description that is compatible with the target task. Our analysis shows that, contrary to other teaching problems, teaching a heterogeneous class with a single demonstration may not be possible as the differences between agents increase. We also showcase the advantages of our proposed machine teaching approach against several possible alternatives.


Deep Reinforcement Learning and Its Applications - Inteliment Technologies

#artificialintelligence

The term Deep Reinforcement Learning is a new cool phrase in the world of Artificial Intelligence and Machine Learning. So, what does this phrase mean, and what is its impact? Deep Reinforcement Learning uses the combined principles of deep learning and reinforcement learning. Deep Learning, as we know, Deep learning is a part of machine learning methods and is based on artificial neural networks. Reinforcement Learning, on the other hand, is an area of machine learning which tells how software agents should take actions to maximize the probability of choosing the best possible path or behavior for a particular situation.



Hierarchical model-based policy optimization: from actions to action sequences and back

arXiv.org Artificial Intelligence

We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a manner which is sensitive to the long-range correlational structure of the induced stationary state-action densities. We demonstrate that the natural path gradient can be computed exactly given an environment dynamics model and depends on expressions akin to higher-order successor representations. In simulation, we show that the priorization of local policy updates in the resulting policy flow indeed reflects the intuitive state-space hierarchy in several toy problems.


Multi-Agent Deep Reinforcement Learning with Adaptive Policies

arXiv.org Artificial Intelligence

We propose a novel approach to address one aspect of the non-stationarity problem in multi-agent reinforcement learning (RL), where the other agents may alter their policies due to environment changes during execution. This violates the Markov assumption that governs most single-agent RL methods and is one of the key challenges in multi-agent RL. To tackle this, we propose to train multiple policies for each agent and postpone the selection of the best policy at execution time. Specifically, we model the environment non-stationarity with a finite set of scenarios and train policies fitting each scenario. In addition to multiple policies, each agent also learns a policy predictor to determine which policy is the best with its local information. By doing so, each agent is able to adapt its policy when the environment changes and consequentially the other agents alter their policies during execution. We empirically evaluated our method on a variety of common benchmark problems proposed for multi-agent deep RL in the literature. Our experimental results show that the agents trained by our algorithm have better adaptiveness in changing environments and outperform the state-of-the-art methods in all the tested environments.


Playing Games in the Dark: An approach for cross-modality transfer in reinforcement learning

arXiv.org Artificial Intelligence

In this work we explore the use of latent representations obtained from multiple input sensory modalities (such as images or sounds) in allowing an agent to learn and exploit policies over different subsets of input modalities. We propose a three-stage architecture that allows a reinforcement learning agent trained over a given sensory modality, to execute its task on a different sensory modality --for example, learning a visual policy over image inputs, and the n execute such policy when only sound inputs are available. We show that the generalized policies achieve better out-of-the-b ox performance when compared to different baselines. Moreover, we sho w this holds in different OpenAI gym and video game environments, even when using different multimodal generative models and reinforcement learning algorithms.


Augmented Random Search for Quadcopter Control: An alternative to Reinforcement Learning

arXiv.org Artificial Intelligence

Model-based reinforcement learning strategies are believed to exhibit more significant sample complexity than model-free strategies to control dynamical systems,such as quadcopters.This belief that Model-based strategies that involve the use of well-trained neural networks for making such high-level decisions always give better performance can be dispelled by making use of Model-free policy search methods.This paper proposes the use of a model-free random searching strategy,called Augmented Random Search(ARS),which is a better and faster approach of linear policy training for continuous control tasks like controlling a Quadcopters flight.The method achieves state-of-the-art accuracy by eliminating the use of too much data for the training of neural networks that are present in the previous approaches to the task of Quadcopter control.The paper also highlights the performance results of the searching strategy used for this task in a strategically designed task environment with the help of simulations.Reward collection performance over 1000 episodes and agents behavior in flight for augmented random search is compared with that of the behavior for reinforcement learning state-of-the-art algorithm,called Deep Deterministic policy gradient(DDPG).Our simulations and results manifest that a high variability in performance is observed in commonly used strategies for sample efficiency of such tasks but the built policy network of ARS-Quad can react relatively accurately to step response providing a better performing alternative to reinforcement learning strategies.


Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

arXiv.org Machine Learning

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, We propose a Policy Optimization method with Model-Based Uncertainty (POMBU)---a novel model-based approach---that can effectively improve the asymptotic performance using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of policy to inaccurate models. Experiments show POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.


Top 10 Machine Learning Algorithms for Beginners Machine Learning Tutorial [Data Science]

#artificialintelligence

This Machine Learning Algorithms Tutorial video by Learnaholic India will help you learn Machine Learning Tutorial, what is Machine Learning, [Data Science] various Machine Learning problems and the algorithms, key Machine Learning algorithms with simple examples. The key Machine Learning algorithms discussed in detail are Linear Regression, Logistic Regression, Decision Tree, Random Forest and KNN algorithm. Machine Learning Tutorial [Data Science] Top 10 Machine Learning Algorithms for Beginners In this Machine Learning Algorithms Tutorial video you will understand: 1) Types of Machine Learning Algorithms (00:25) 2) Supervised Learning Algorithms (00:30) 3) Unsupervised Learning Algorithms (1:59) 4) Reinforcement Learning Algorithms (3:38) 5) Top 10 Machine Learning Algorithms for Beginners (4:33) This Machine Learning Algorithms Tutorial shall teach you what machine learning is, and the various ways in which you can use machine learning to solve a problem! Towards the end, you will learn how to prepare a data-set for model creation and validation and how you can create a model using any machine learning algorithm! Hit the subscribe button above.


Control-Tutored Reinforcement Learning: an application to the Herding Problem

arXiv.org Artificial Intelligence

EXTENDED ABSTRACT Model-free reinforcement learning (or simply reinforcement learning, RL, in what follows) is increasingly used in applications to solve a wide variety of control problems (Kober et al., 2013; Garcıa and Fern andez, 2015; Cheng et al., 2019). The lack of requiring a formal model of the plant renders it appealing for a heuristic, low-cost control design approach that can be easily implemented and adapted to different situations. As a tradeoff, learning processes often require a long training phase where the controller agent learns by trial-and-error how the plant responds to different control actions, and what actions to take to steer its behavior in a desired manner. This problem is particularly relevant when using tabular methods, such as Q-learning, in those situations where reinforcement learning is applied to control dynamical systems defined in continuous spaces (Lillicrap et al., 2019). It is therefore desirable to enhance the learning process by encoding some qualitative knowledge of the system dynamics via appropriate models.