Reinforcement Learning
Transforming from Autonomous to Smart: Reinforcement Learning Basics
In the blog "From Autonomous to Smart: Importance of Artificial Intelligence," we laid out the artificial intelligence (AI) challenges in creating "smart" edge devices. We also talked about how Moore's Law isn't going to bail us out of these challenges: the volume of Internet of Things (IoT) data and the complexity of the problems that we are trying to address at the edge (think "smart" cars) are growing much faster than Moore's Law can accommodate. So we are going to use this blog to deep dive into the category of artificial intelligence called reinforcement learning. We are going to see how reinforcement learning might help us to address these challenges; to work smarter at the edge when brute force technology advances will not suffice. With the rapid increases in computing power, it's easy to get seduced into thinking that raw computing power can solve problems like smart edge devices (e.g., cars, trains, airplanes, wind turbines, jet engines, medical devices). Look at the dramatic increase in the number of possible moves between checkers and chess even though the board layout is exactly the same. The only difference between checkers and chess is the types of moves that pieces can make.
A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning
Liu, Ning, Li, Zhe, Xu, Zhiyuan, Xu, Jielong, Lin, Sheng, Qiu, Qinru, Tang, Jian, Wang, Yanzhi
Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in the cloud computing system. However, a complete cloud resource allocation framework exhibits high dimensions in state and action spaces, which prohibit the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in design and control of cloud computing systems, which degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while maintaining performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system. Moreover, a novel solution framework is necessary to address the even higher dimensions in state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed hierarchical framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of local servers. The emerging deep reinforcement learning (DRL) technique, which can deal with complicated control problems with large state space, is adopted to solve the global tier problem. Furthermore, an autoencoder and a novel weight sharing structure are adopted to handle the high-dimensional state space and accelerate the convergence speed. On the other hand, the local tier of distributed server power management comprises an LSTM-based workload predictor and a model-free RL-based power manager, operating in a distributed manner.
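To make the idea of a model-free RL power manager concrete, here is a minimal sketch using tabular Q-learning on a two-state toy server. It is not the paper's method: the states, actions, costs, arrival probability, and every function name below are assumptions invented for illustration, and the paper's local tier also includes an LSTM workload predictor that is omitted here.

```python
import random

# Hypothetical toy model (not the paper's setup): the server is "idle" or
# "busy" (a job is waiting), and the manager chooses "sleep" or "active".
STATES = ["idle", "busy"]
ACTIONS = ["sleep", "active"]


def step(state, action, rng):
    """Return (reward, next_state). All costs and probabilities are made up."""
    power_cost = 1.0 if action == "active" else 0.1
    # Sleeping while a job waits incurs a latency penalty.
    stalled = state == "busy" and action == "sleep"
    latency_cost = 5.0 if stalled else 0.0
    if stalled:
        next_state = "busy"  # the job is still waiting
    else:
        next_state = "busy" if rng.random() < 0.3 else "idle"
    return -(power_cost + latency_cost), next_state


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "idle"
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward, next_state = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update toward the bootstrapped target.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these made-up costs, the learned greedy policy should trade power against latency in the way a DPM policy is meant to: sleep when idle, stay active when a job is waiting.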
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
Gygli, Michael, Norouzi, Mohammad, Angelova, Anelia
We approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input. Once the model is trained, we perform inference by gradient descent on the continuous relaxations of the output variables to find outputs with promising scores from the value network. When applied to image segmentation, the value network takes an image and a segmentation mask as inputs and predicts a scalar estimating the intersection over union between the input and ground truth masks. For multi-label classification, the DVN's objective is to correctly predict the F1 score for any potential label configuration. The DVN framework achieves state-of-the-art results on multi-label prediction and image segmentation benchmarks.
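The inference step described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_value` is a hand-written stand-in for a trained value network (it scores outputs against a fixed, hypothetical target), the gradient is taken by finite differences rather than backpropagation, and the sketch ascends the score, which is equivalent to descending its negation. None of the names come from the paper.

```python
def toy_value(y, target=(1.0, 0.0, 1.0, 1.0)):
    """Stand-in for a trained value network: higher means a better output.

    The fixed target is a hypothetical placeholder for whatever the
    network would have learned about the input.
    """
    return -sum((a - b) ** 2 for a, b in zip(y, target))


def numerical_gradient(f, y, eps=1e-5):
    """Central-difference gradient of f at y (assumption: no autodiff here)."""
    grad = []
    for i in range(len(y)):
        up = list(y)
        down = list(y)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def infer(value_fn, n_outputs, steps=100, lr=0.1):
    """Gradient ascent on relaxed outputs, clipped to stay inside [0, 1]."""
    y = [0.0] * n_outputs
    for _ in range(steps):
        grad = numerical_gradient(value_fn, y)
        y = [min(1.0, max(0.0, yi + lr * gi)) for yi, gi in zip(y, grad)]
    return y


def iou(pred, truth):
    """Intersection over union of two binary masks given as 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    return inter / union if union else 1.0


if __name__ == "__main__":
    relaxed = infer(toy_value, 4)
    mask = [1 if v >= 0.5 else 0 for v in relaxed]  # discretize at the end
    print(mask, iou(mask, [1, 0, 1, 1]))
```

The design point is that the outputs themselves are the optimization variables: the network weights stay fixed at inference time, and only the relaxed output vector moves.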
Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning
Moerland, Thomas M., Broekens, Joost, Jonker, Catholijn M.
In this paper we study how to learn stochastic, multimodal transition dynamics in reinforcement learning (RL) tasks. We focus on evaluating transition function estimation, while we defer planning over this model to future work. Stochasticity is a fundamental property of many task environments. However, discriminative function approximators have difficulty estimating multimodal stochasticity. In contrast, deep generative models do capture complex high-dimensional outcome distributions. First we discuss why, amongst such models, conditional variational inference (VI) is theoretically most appealing for model-based RL. Subsequently, we compare different VI models on their ability to learn complex stochasticity on simulated functions, as well as on a typical RL gridworld with multimodal dynamics. Results show VI successfully predicts multimodal outcomes, but also robustly ignores these for deterministic parts of the transition dynamics. In summary, we show a robust method to learn multimodal transitions using function approximation, which is a key preliminary for model-based RL in stochastic domains.
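A small experiment can illustrate why a point-estimate predictor struggles with multimodal transitions. The sketch below is not the paper's conditional VI model; it uses a hypothetical one-dimensional transition that moves left or right with equal probability, and compares the constant that minimizes mean squared error with a simple empirical distribution over outcomes. All names are invented for the example.

```python
import random
from collections import Counter


def sample_next_state(state, rng):
    """Hypothetical bimodal transition: step left or right with equal odds."""
    return state + rng.choice([-1.0, 1.0])


def fit_mse_constant(samples):
    """The constant that minimizes mean squared error is the sample mean."""
    return sum(samples) / len(samples)


def fit_categorical(samples):
    """Empirical outcome distribution: a crude stand-in for a generative model."""
    counts = Counter(samples)
    total = len(samples)
    return {outcome: n / total for outcome, n in counts.items()}


rng = random.Random(0)
samples = [sample_next_state(0.0, rng) for _ in range(10000)]

point_prediction = fit_mse_constant(samples)   # lands between the two modes
outcome_distribution = fit_categorical(samples)  # keeps both modes

if __name__ == "__main__":
    print("MSE point prediction:", point_prediction)
    print("Outcome distribution:", outcome_distribution)
```

The squared-error predictor settles near the average of the two outcomes, a next state that never actually occurs, whereas a model that represents a distribution over outcomes retains both modes.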
Watching artificial intelligence teach itself how to walk is weirdly captivating
Well, computer scientists from the University of British Columbia and National University of Singapore just did that with a bipedal computer model (read: essentially a pair of animated legs) -- only instead of a cute cartoon rabbit, the teacher is a deep reinforcement learning artificial intelligence algorithm. Google's DeepMind, for example, has used reinforcement learning to teach an AI to play classic video games by working out how to achieve high scores. It's like watching your kid grow up -- except that, you know, in this case, your kid is a pair of disembodied AI legs powered by Skynet! A paper describing the work, titled "DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning" was published in the journal Transactions on Graphics.
[R] RL-Teacher - Open Source Deep RL from Human Preferences • r/MachineLearning
A bunch of people have been asking for an implementation of Deep Reinforcement Learning from Human Preferences [Christiano et al., 2017] that came out last month. This contains a simplified system designed to be easy to read and understand, plus the webapp that we used for collecting feedback from humans. Happy to answer any questions that you have here!