 Reinforcement Learning


Transforming from Autonomous to Smart: Reinforcement Learning Basics

@machinelearnbot

In the blog "From Autonomous to Smart: Importance of Artificial Intelligence," we laid out the artificial intelligence (AI) challenges in creating "smart" edge devices. We also talked about how Moore's Law isn't going to bail us out of these challenges: Internet of Things (IoT) data and the complexity of the problems that we are trying to address at the edge (think "smart" cars) are growing much faster than Moore's Law can accommodate. So we are going to use this blog to take a deep dive into the category of artificial intelligence called reinforcement learning. We are going to see how reinforcement learning might help us address these challenges and work smarter at the edge when brute-force technology advances will not suffice. With the rapid increases in computing power, it's easy to get seduced into thinking that raw computing power can solve the problems of smart edge devices (e.g., cars, trains, airplanes, wind turbines, jet engines, medical devices). Look at the dramatic increase in the number of possible moves from checkers to chess, even though the board layout is exactly the same. The only difference between checkers and chess is the types of moves that pieces can make.


A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in the cloud computing system. However, a complete cloud resource allocation framework exhibits high dimensions in state and action spaces, which limit the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in design and control of cloud computing systems, which degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while maintaining performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system. Moreover, a novel solution framework is necessary to address the even higher dimensions in state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed hierarchical framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of local servers. The emerging deep reinforcement learning (DRL) technique, which can deal with complicated control problems with large state space, is adopted to solve the global tier problem. Furthermore, an autoencoder and a novel weight sharing structure are adopted to handle the high-dimensional state space and accelerate the convergence speed. On the other hand, the local tier of distributed server power management comprises an LSTM-based workload predictor and a model-free RL-based power manager, operating in a distributed manner.
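
The abstract's local tier includes a model-free RL power manager. As a rough illustration only, the sketch below uses tabular Q-learning, one common model-free method; the abstract does not say which algorithm the authors use, and the states, actions, and reward values here are hypothetical.

```python
# Toy sketch: tabular Q-learning for a single server's power manager.
# States, actions, and rewards are hypothetical, not taken from the paper.

STATES = ("idle", "busy")
ACTIONS = ("sleep", "active")


def make_q_table():
    """Return a Q-table with every state-action value initialised to 0."""
    return {s: {a: 0.0 for a in ACTIONS} for s in STATES}


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Pick the action with the highest estimated value in this state."""
    return max(q[state], key=q[state].get)


q = make_q_table()
# Assumed rewards: negative power cost, plus a latency penalty
# for sleeping while requests are waiting.
q_update(q, "idle", "sleep", reward=-0.1, next_state="idle")
q_update(q, "idle", "active", reward=-1.0, next_state="idle")
q_update(q, "busy", "sleep", reward=-5.0, next_state="busy")
q_update(q, "busy", "active", reward=-1.0, next_state="busy")

print(greedy_action(q, "idle"), greedy_action(q, "busy"))
```

With these assumed rewards, the learned values favour sleeping when idle and staying active when busy, which is the power/performance trade-off the abstract describes.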


DeepMind AI Learns Imagination-Based Planning – Frank's World of Data Science

#artificialintelligence

Two Minute Papers reviews how DeepMind's AI learned to play the Atari classic "Breakout" simply by observing the game being played from a video feed. The original paper, "Imagination-Augmented Agents for Deep Reinforcement Learning," is online at https://arxiv.org/abs/1707.06203


[N] DeepMind and Blizzard open StarCraft II as an AI research environment • r/MachineLearning

@machinelearnbot

Novice here: I really want to try this StarCraft API, but I don't know how to start. I believe this uses more reinforcement learning and agent-based models (which honestly I am not familiar with yet). What are good papers to get started on this?


What is reinforcement learning? A short intro in 8 slides.

#artificialintelligence

In an upcoming screencast I'm doing with O'Reilly I'll be discussing what reinforcement learning is and how it applies. I figured I'd give you all a little behind the scenes look.


Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs

arXiv.org Artificial Intelligence

We approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input. Once the model is trained, we perform inference by gradient descent on the continuous relaxations of the output variables to find outputs with promising scores from the value network. When applied to image segmentation, the value network takes an image and a segmentation mask as inputs and predicts a scalar estimating the intersection over union between the input and ground truth masks. For multi-label classification, the DVN's objective is to correctly predict the F1 score for any potential label configuration. The DVN framework achieves state-of-the-art results on multi-label prediction and image segmentation benchmarks.
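
The two quantities the value network is trained to estimate can be computed directly when the ground truth is known. The sketch below is a minimal illustration on flat lists, not the authors' code. The function names are hypothetical, the min/max form of the relaxed IoU is one common way to extend IoU to values in [0, 1] and is an assumption here, and returning 1.0 for two empty inputs is an assumed convention.

```python
def iou(pred, truth):
    """Intersection over union of two equally sized binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0


def relaxed_iou(pred, truth):
    """IoU extended to continuous mask values in [0, 1] (assumed min/max form)."""
    inter = sum(min(p, t) for p, t in zip(pred, truth))
    union = sum(max(p, t) for p, t in zip(pred, truth))
    return inter / union if union else 1.0


def f1_score(pred, truth):
    """F1 score of a binary label vector against the ground truth labels."""
    true_pos = sum(1 for p, t in zip(pred, truth) if p and t)
    denom = sum(pred) + sum(truth)  # equals 2*TP + FP + FN
    return 2 * true_pos / denom if denom else 1.0


truth_mask = [1, 1, 0, 0]
print(iou([1, 0, 1, 0], truth_mask))
print(relaxed_iou([0.5, 1.0, 0.0, 0.0], truth_mask))
print(f1_score([1, 0, 1, 0], truth_mask))
```

A continuous relaxation such as `relaxed_iou` matters because the inference step described above moves the output by gradient steps, which requires scores that are defined for non-binary outputs.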


Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning

arXiv.org Machine Learning

In this paper we study how to learn stochastic, multimodal transition dynamics in reinforcement learning (RL) tasks. We focus on evaluating transition function estimation, while we defer planning over this model to future work. Stochasticity is a fundamental property of many task environments. However, discriminative function approximators have difficulty estimating multimodal stochasticity. In contrast, deep generative models do capture complex high-dimensional outcome distributions. First we discuss why, amongst such models, conditional variational inference (VI) is theoretically most appealing for model-based RL. Subsequently, we compare different VI models on their ability to learn complex stochasticity on simulated functions, as well as on a typical RL gridworld with multimodal dynamics. Results show VI successfully predicts multimodal outcomes, but also robustly ignores these for deterministic parts of the transition dynamics. In summary, we show a robust method to learn multimodal transitions using function approximation, which is a key preliminary for model-based RL in stochastic domains.
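
To see why point estimates struggle with multimodal stochasticity, consider a toy transition that moves left or right with equal probability. The sketch below is an illustration under that assumed environment, not the paper's setup: the single prediction that minimizes squared error is the mean of the outcomes, which sits near zero, a next state that never actually occurs.

```python
import random


def sample_next_state(state, rng):
    """Hypothetical bimodal transition: step -1 or +1 with equal probability."""
    return state + rng.choice([-1.0, 1.0])


def mse(prediction, outcomes):
    """Mean squared error of one fixed prediction against sampled outcomes."""
    return sum((prediction - o) ** 2 for o in outcomes) / len(outcomes)


rng = random.Random(0)  # fixed seed so the run is repeatable
outcomes = [sample_next_state(0.0, rng) for _ in range(10_000)]

# The least-squares optimal point prediction is the sample mean.
mean_prediction = sum(outcomes) / len(outcomes)

print("mean prediction:", mean_prediction)
print("MSE of mean:", mse(mean_prediction, outcomes))
print("MSE of predicting +1:", mse(1.0, outcomes))
```

Predicting one of the real modes (+1) has a higher squared error than predicting the mean, so a regressor trained on this loss is pushed toward the impossible middle value. A generative model that samples from both modes avoids this.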


Watching artificial intelligence teach itself how to walk is weirdly captivating

#artificialintelligence

Well, computer scientists from the University of British Columbia and National University of Singapore just did that with a bipedal computer model (read: essentially a pair of animated legs) -- only instead of a cute cartoon rabbit, the teacher is a deep reinforcement learning artificial intelligence algorithm. Google's DeepMind, for example, has used reinforcement learning to teach an AI to play classic video games by working out how to achieve high scores. It's like watching your kid grow up -- except that, you know, in this case, your kid is a pair of disembodied AI legs powered by Skynet! A paper describing the work, titled "DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning," was published in the journal ACM Transactions on Graphics.


[R] RL-Teacher - Open Source Deep RL from Human Preferences • r/MachineLearning

@machinelearnbot

A bunch of people have been asking for an implementation of Deep Reinforcement Learning from Human Preferences [Christiano et al., 2017] that came out last month. This contains a simplified system designed to be easy to read and understand, plus the webapp that we used for collecting feedback from humans. Happy to answer any questions that you have here!