Goto

Collaborating Authors

 Reinforcement Learning


tanglang96/MDENAS

#artificialintelligence

Here we propose a method to extremely accelerate NAS, without reinforcement learning or gradient, just by sampling architectures from a distribution and comparing these architectures, estimating their relative performance rather than absolute performance, iteratively updating parameters of the distribution while training. Search codes will be released by Sherwood later!


Stochastic Inverse Reinforcement Learning

arXiv.org Machine Learning

Inverse reinforcement learning (IRL) is an ill-posed inverse problem since expert demonstrations may infer many solutions of reward functions which is hard to recover by local search methods such as a gradient method. In this paper, we generalize the original IRL problem to recover a probability distribution for reward functions. We call such a generalized problem stochastic inverse reinforcement learning (SIRL) which is first formulated as an expectation optimization problem. We adopt the Monte Carlo expectation-maximization (MCEM) method, a global search method, to estimate the parameter of the probability distribution as the first solution to SIRL. With our approach, it is possible to observe the deep intrinsic property in IRL from a global viewpoint, and the technique achieves a considerable robust recovery performance on the classic learning environment, objectworld.


Hierarchical Reinforcement Learning for Quadruped Locomotion

arXiv.org Artificial Intelligence

Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To solve these problems, we introduce a hierarchical framework to automatically decompose complex locomotion tasks. A high-level policy issues commands in a latent space and also selects for how long the low-level policy will execute the latent command. Concurrently, the low-level policy uses the latent command and only the robot's on-board sensors to control the robot's actuators. Our approach allows the high-level policy to run at a lower frequency than the low-level one. We test our framework on a path-following task for a dynamic quadruped robot and we show that steering behaviors automatically emerge in the latent command space as low-level skills are needed for this task. We then show efficient adaptation of the trained policy to a different task by transfer of the trained low-level policy. Finally, we validate the policies on a real quadruped robot. To the best of our knowledge, this is the first application of end-to-end hierarchical learning to a real robotic locomotion task.


The False Promise of Off-Policy Reinforcement Learning Algorithms

#artificialintelligence

We have all witnessed the rapid development of reinforcement learning methods in the last couple of years. Most notably the biggest attention has been given to off-policy methods and the reason is quite obvious, they scale really well in comparison to other methods. Off-policy algorithms can (in principle) learn from data without interacting with the environment. This is a nice property, this means that we can collect our data by any means that we see fit and infer the optimal policy completely offline, in other words, we use a different behavioral policy that the one we are optimizing. Unfortunately, this doesn't work out of the box like most people think, as I will describe in this article.


Issues concerning realizability of Blackwell optimal policies in reinforcement learning

arXiv.org Machine Learning

N-discount optimality was introduced as a hierarchical form of policy- and value-function optimality, with Blackwell optimality lying at the top level of the hierarchy Veinott (1969); Blackwell (1962). We formalize notions of myopic discount factors, value functions and policies in terms of Blackwell optimality in MDPs, and we provide a novel concept of regret, called Blackwell regret, which measures the regret compared to a Blackwell optimal policy. Our main analysis focuses on long horizon MDPs with sparse rewards. We show that selecting the discount factor under which zero Blackwell regret can be achieved becomes arbitrarily hard. Moreover, even with oracle knowledge of such a discount factor that can realize a Blackwell regret-free value function, an $\epsilon$-Blackwell optimal value function may not even be gain optimal. Difficulties associated with this class of problems is discussed, and the notion of a policy gap is defined as the difference in expected return between a given policy and any other policy that differs at that state; we prove certain properties related to this gap. Finally, we provide experimental results that further support our theoretical results.


Stochastic Variance Reduction for Deep Q-learning

arXiv.org Machine Learning

Recent advances in deep reinforcement learning have achieved human-level performance on a variety of real-world applications. However, the current algorithms still suffer from poor gradient estimation with excessive variance, resulting in unstable training and poor sample efficiency. In our paper, we proposed an innovative optimization strategy by utilizing stochastic variance reduced gradient (SVRG) techniques. With extensive experiments on Atari domain, our method outperforms the deep q-learning baselines on 18 out of 20 games.


Top 5 Books on AI and ML to Grab Today

#artificialintelligence

It has been popularly noted that artificial intelligence would be like the ultimate version of Google. With recent advancements in research and technology, Artificial Intelligence (AI) and Machine Learning (ML) are slowly becoming a part of our routine. The pace at which technology is growing is unfathomable. As these smart technologies engulf our life, staying updated with them is the need of the day. So, here's Packt's selection of finest books in artificial intelligence and machine learning that will help you have an edge in these fields: Reinforcement Learning is the trending and one of the most promising branches of artificial intelligence.


Reinforcement Learning for Learning of Dynamical Systems in Uncertain Environment: a Tutorial

arXiv.org Artificial Intelligence

In this paper, a review of model-free reinforcement learning for learning of dynamical systems in uncertain environments has discussed. For this purpose, the Markov Decision Process (MDP) will be reviewed. Furthermore, some learning algorithms such as Temporal Difference (TD) learning, Q-Learning, and Approximate Q-learning as model-free algorithms which constitute the main part of this article have been investigated, and benefits and drawbacks of each algorithm will be discussed. The discussed concepts in each section are explaining with details and examples.


Perceptual Values from Observation

arXiv.org Machine Learning

Imitation by observation is an approach for learning from expert demonstrations that lack action information, such as videos. Recent approaches to this problem can be placed into two broad categories: training dynamics models that aim to predict the actions taken between states, and learning rewards or features for computing them for Reinforcement Learning (RL). In this paper, we introduce a novel approach that learns values, rather than rewards, directly from observations. We show that by using values, we can significantly speed up RL by removing the need to bootstrap action-values, as compared to sparse-reward specifications.


Spatio-Temporal Adversarial Learning for Detecting Unseen Falls

arXiv.org Machine Learning

Fall detection is an important problem from both the health and machine learning perspective. A fall can lead to severe injuries, long term impairments or even death in some cases. In terms of machine learning, it presents a severely class imbalance problem with very few or no training data for falls owing to the fact that falls occur rarely. In this paper, we take an alternate philosophy to detect falls in the absence of their training data, by training the classifier on only the normal activities (that are available in abundance) and identifying a fall as an anomaly. To realize such a classifier, we use an adversarial learning framework, which comprises of a spatio-temporal autoencoder for reconstructing input video frames and a spatio-temporal convolution network to discriminate them against original video frames. 3D convolutions are used to learn spatial and temporal features from the input video frames. The adversarial learning of the spatio-temporal autoencoder will enable reconstructing the normal activities of daily living efficiently; thus, rendering detecting unseen falls plausible within this framework. We tested the performance of the proposed framework on camera sensing modalities that may preserve an individual's privacy (fully or partially), such as thermal and depth camera. Our results on three publicly available datasets show that the proposed spatio-temporal adversarial framework performed better than other frame based (or spatial) adversarial learning methods.