Reinforcement Learning


Instilling AI safety into robotics through reinforcement learning - Enterprise IT Watch Blog

#artificialintelligence

Artificial intelligence (AI) is the perfect laughingstock. Any phenomenon that takes itself as seriously as AI is just asking to be ridiculed. What's even funnier is when AI comes in humanoid form, as is the case with the smart robotics that are penetrating every aspect of our lives. As Bill Vorhies discussed in his recent column, robot fails can be comedic gold. As the brains behind autonomous devices, AI can dampen the laughter only by helping devices master their assigned tasks so well and performing them so inconspicuously that we never give them a second thought.


Machine Learning for Healthcare at NIPS – Towards Data Science

#artificialintelligence

Additionally, for a significant part of Friday I was running around trying to get my own poster printed, so I may have missed some important parts. If any of the presenters or other attendees want to add something, feel free to leave a comment and I'll make sure to include it. Machine learning has the potential to improve hospital operations and care, and several of the invited speakers and spotlight presenters discussed this issue.


Coordinated Exploration in Concurrent Reinforcement Learning

arXiv.org Artificial Intelligence

We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.



Making Sense of the Bias / Variance Trade-off in (Deep) Reinforcement Learning

#artificialintelligence

Since the launch of the ML-Agents platform a few months ago, I have been surprised and delighted to find that, thanks to it and other tools like OpenAI Gym, a new, wider audience of individuals is building Reinforcement Learning (RL) environments and using them to train state-of-the-art models. The ability to work with these algorithms, previously reserved for ML PhDs, is opening up to a wider world. As a result, I have had the unique opportunity not just to write about applying RL to existing problems, but also to help developers and researchers debug their models in a more active way. In doing so, I often get questions that come down to understanding the unique hyperparameters and learning process of the RL paradigm. In this article, I want to highlight one of these conceptual pieces, bias and variance in RL, and attempt to demystify it to some extent.
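One standard way to make the RL bias/variance trade-off concrete (not necessarily the exact framing the article goes on to use) is the n-step return. A one-step temporal-difference target bootstraps from a learned value estimate, so it inherits that estimate's bias but has low variance; a full Monte Carlo return uses only observed rewards, so it is unbiased but accumulates the randomness of every step. The n-step return G_t^(n) = r_t + γ r_{t+1} + ... + γ^(n-1) r_{t+n-1} + γ^n V(s_{t+n}) interpolates between the two. The function name `n_step_return` and the convention that the value after the episode's final step is 0 are assumptions of this sketch.

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return from time t; rewards[k] follows state k, values[k] estimates V(s_k).

    Assumes the episode terminates after len(rewards) steps, so there is nothing to bootstrap from.
    """
    horizon = len(rewards)
    g = 0.0
    # Sum up to n observed (discounted) rewards: this part is unbiased but noisy.
    for k in range(min(n, horizon - t)):
        g += gamma ** k * rewards[t + k]
    # Bootstrap from the value estimate if the episode has not ended: lower variance, possible bias.
    if t + n < horizon:
        g += gamma ** n * values[t + n]
    return g
```

With `rewards=[1, 1, 1]`, `values=[0.5, 0.5, 0.5]` and `gamma=0.5`, `n=1` gives the TD target 1.25, while any `n >= 3` gives the Monte Carlo return 1.75; intermediate `n` trades one source of error for the other.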


Ray's New Library Targets High Speed Reinforcement Learning

#artificialintelligence

Data scientists looking to push the ball forward in the field of reinforcement learning may want to check out RLlib, a new library released as open source last month by researchers affiliated with RISELab. According to the researchers, the goal of RLlib is to let users break down the various components that go into a reinforcement learning system, making them more scalable, easier to integrate, and easier to reuse. Reinforcement learning is a branch of machine learning, distinct from supervised learning, that is gaining popularity as a way to train programs to perform tasks optimally in a world awash in less-than-optimal training data. Instead of training a model on pristine labeled data, as is ideal in supervised learning, a reinforcement learning agent learns by interacting with its environment as it naturally exists, and a simple feedback mechanism (the reinforcement, or reward, signal) nudges it toward the ideal solution. The practical advantage of the reinforcement approach is that it seeks a balance between exploring uncharted territory and exploiting existing knowledge.
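The reward-driven feedback loop and the explore/exploit balance can be illustrated without RLlib. The sketch below is a generic epsilon-greedy multi-armed bandit in plain Python, not RLlib's API; the function name `epsilon_greedy_bandit`, the Gaussian rewards, and the parameter values are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Hypothetical bandit: each arm pays a Gaussian reward around its (unknown) true mean."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # current knowledge about each arm
    counts = [0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))  # explore: try a random arm
        else:
            arm = max(range(len(true_means)), key=estimates.__getitem__)  # exploit best estimate
        reward = rng.gauss(true_means[arm], 0.1)  # the reinforcement signal
        counts[arm] += 1
        # Nudge the estimate toward the observed reward (incremental sample average).
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

Setting `epsilon=0` can lock the agent onto whichever arm first looks good, while a large `epsilon` wastes pulls on arms it already knows are poor; the fixed 10% here is just one simple point on that spectrum.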


Make It Happen

@machinelearnbot

This is the first part of "An Outsider's Tour of Reinforcement Learning." If you read Hacker News, you'd think that deep reinforcement learning can be used to solve any problem. Deep RL has been claimed to achieve superhuman performance on Go, beat Atari games, control complex robotic systems, automatically tune deep learning systems, manage queueing in network stacks, and improve energy efficiency in data centers. I personally get suspicious when audacious claims like this are thrown about in press releases, and I get even more suspicious when other researchers call their reproducibility into question. I want to take a few posts to unpack what is legitimately interesting and promising in RL and what is probably just hype.


Multi-task Learning for Continuous Control

arXiv.org Machine Learning

Reliable and effective multi-task learning is a prerequisite for the development of robotic agents that can quickly learn to accomplish related, everyday tasks. However, in the reinforcement learning domain, multi-task learning has not exhibited the same level of success as in other domains, such as computer vision. In addition, most reinforcement learning research on multi-task learning has focused on discrete action spaces, which are not used for robotic control in the real world. In this work, we apply multi-task learning methods to continuous action spaces and benchmark their performance on a series of simulated continuous control tasks. Most notably, we show that multi-task learning outperforms our baselines and alternative knowledge sharing methods.
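One common way to share knowledge across tasks is hard parameter sharing: a trunk shared by all tasks feeding a separate output head per task. The sketch below assumes that architecture (the abstract does not say which multi-task methods were benchmarked), puts a tanh on each head so actions are continuous and bounded in (-1, 1), and omits training entirely; `MultiTaskPolicy` and its dimensions are hypothetical.

```python
import math
import random


class MultiTaskPolicy:
    """Hypothetical hard-parameter-sharing policy for continuous actions (untrained sketch)."""

    def __init__(self, obs_dim, hidden_dim, action_dim, num_tasks, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real implementation would train these.
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.shared = init(hidden_dim, obs_dim)  # one trunk reused by every task
        self.heads = [init(action_dim, hidden_dim) for _ in range(num_tasks)]  # one head per task

    @staticmethod
    def _matvec(weights, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    def act(self, obs, task_id):
        hidden = [math.tanh(v) for v in self._matvec(self.shared, obs)]
        # tanh squashes each action dimension into (-1, 1), e.g. normalized joint torques.
        return [math.tanh(v) for v in self._matvec(self.heads[task_id], hidden)]
```

Gradients from every task would update `shared`, which is where transfer between related tasks can happen, while each head stays free to specialize; that split is the design choice this sketch is meant to show.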


A Beginner's Guide to Deep Reinforcement Learning (for Java and Scala) - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

@machinelearnbot

While neural networks are responsible for recent breakthroughs in problems like computer vision, machine translation and time series prediction – they can also combine with reinforcement learning algorithms to create something astounding like AlphaGo. Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start from a blank slate, and under the right conditions they achieve superhuman performance. Like a child incentivized by spankings and candy, these algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones – this is reinforcement. Reinforcement algorithms that incorporate deep learning can beat world champions at the game of Go as well as human experts playing numerous Atari video games.
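A minimal sketch of the reward-and-penalty idea, assuming a hypothetical five-cell corridor where the agent starts at the left end and the goal is the right end: tabular Q-learning (no deep network) receives +1 for reaching the goal and -0.01 for every other step, and over many episodes learns to prefer moving right. It is a toy, not how AlphaGo or Deeplearning4j's tools work; `q_learning_corridor` and its parameters are illustrative.

```python
import random


def q_learning_corridor(num_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Hypothetical corridor: states 0..num_states-1, start at 0, goal at the right end."""
    rng = random.Random(seed)
    goal = num_states - 1
    q = [[0.0, 0.0] for _ in range(num_states)]  # q[state][action]; action 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore, or break ties randomly
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else -0.01  # candy at the goal, small penalty otherwise
            # Q-learning target: reward plus discounted best value of the next state (0 at the goal).
            target = reward if next_state == goal else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == goal:
                break
    return q
```

After training, `q[s][1] > q[s][0]` in every non-goal state, so the greedy policy walks straight to the goal; the discount `gamma` is what makes "win over many moves" a single number the agent can maximize.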


Reliable Decision Support using Counterfactual Models

arXiv.org Artificial Intelligence

Decision-makers are faced with the challenge of estimating what is likely to happen when they take an action. For instance, if I choose not to treat this patient, are they likely to die? Practitioners commonly use supervised learning algorithms to fit predictive models that help decision-makers reason about likely future outcomes, but we show that this approach is unreliable, and sometimes even dangerous. The key issue is that supervised learning algorithms are highly sensitive to the policy used to choose actions in the training data, which causes the model to capture relationships that do not generalize. We propose using a different learning objective that predicts counterfactuals instead of predicting outcomes under an existing action policy as in supervised learning. To support decision-making in temporal settings, we introduce the Counterfactual Gaussian Process (CGP) to predict the counterfactual future progression of continuous-time trajectories under sequences of future actions. We demonstrate the benefits of the CGP on two important decision-support tasks: risk prediction and "what if?" reasoning for individualized treatment planning.
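The key issue in the abstract, that a supervised model absorbs the policy that chose actions in the training data, shows up even in a toy simulation. The sketch below is not the Counterfactual Gaussian Process; it assumes a hypothetical setting where sicker patients are treated more often and treatment lowers a bad-outcome score by 0.2. Naively comparing treated and untreated outcomes makes treatment look harmful, while stratifying by severity (a simple confounding adjustment) recovers roughly the true effect. All names and numbers are illustrative.

```python
import random


def simulate(n=20000, effect=-0.2, seed=0):
    # Hypothetical data: severity in [0, 1), higher outcome = worse.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        severity = rng.random()
        treated = rng.random() < severity  # historical policy: sicker patients treated more often
        outcome = severity + (effect if treated else 0.0) + rng.gauss(0.0, 0.1)
        data.append((severity, treated, outcome))
    return data


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(data):
    # What a model fit to outcomes under the historical policy implicitly compares.
    return mean([y for _, t, y in data if t]) - mean([y for _, t, y in data if not t])


def stratified_effect(data, bins=10):
    # Compare treated vs. untreated only among patients of similar severity, then average.
    diffs, weights = [], []
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        stratum = [(t, y) for s, t, y in data if lo <= s < hi]
        treated = [y for t, y in stratum if t]
        untreated = [y for t, y in stratum if not t]
        if treated and untreated:
            diffs.append(mean(treated) - mean(untreated))
            weights.append(len(stratum))
    return sum(d * w for d, w in zip(diffs, weights)) / sum(weights)
```

The naive estimate comes out positive (treatment "looks" harmful) because treated patients were sicker to begin with; change the historical policy and the naive number changes too, while the stratified estimate stays near -0.2. That policy dependence is the failure mode the abstract describes.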