 Reinforcement Learning


A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

arXiv.org Machine Learning

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view toward its use in mobile health. In the behavioral health communities there is increasing interest in, and use of, mobile devices to deliver treatments that target behavior change. Mobile devices can be used to provide treatment when, where, and in the amount desired (Litvin et al., 2013; Kumar et al., 2013). Increasingly, scientists are looking to passive sensing (wearable devices, GPS, activity on the smartphone) and self-report of internal states to individualize the intervention to the person in terms of when, how, and where to deliver treatment.


Putting AI in The Matrix May Keep It from Doing the Same to Us

#artificialintelligence

Someday artificial intelligence (AI) might be too good and too smart for humans. The worry is that the first AI machine to surpass human intelligence might be impossible to shut down. That's one reason Google made headlines in June with its big red button that relies on a modified reinforcement-learning algorithm that, under the right circumstances, will prevent AI from learning that the big red button deprives it of reward. Mark Riedl, associate professor in Georgia Tech's College of Computing and director of the Entertainment Intelligence Lab, is putting forward an alternate approach to the big red button that may prove to be more reliable in stopping AI from causing harm to people or property. The problem with a Big Red Button approach to shutting down AI that has gone rogue is that, over time, it's possible that AI may learn what big red buttons do.


GoAi #1: Asynchronous Methods for Deep Reinforcement Learning

#artificialintelligence

First, if you don't have a background in deep reinforcement learning, you can think of it as the major algorithm behind AlphaGo. Therefore, the authors propose asynchronous methods for deep reinforcement learning to overcome these drawbacks. Using a CPU instead of a GPU, we can open multiple threads that each run the same environment but share the same model weights. After reading the pseudocode, we find that there is little difference from the original DQN algorithm. The special point is the line containing t mod I_AsyncUpdate.
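To make the `t mod` update check mentioned above concrete, here is a minimal sketch of several threads accumulating gradients locally and applying them to shared weights every few steps. All names (`I_ASYNC_UPDATE`, `placeholder_gradient`, `run_workers`) are hypothetical, the gradient is a constant stand-in rather than a real TD-error gradient, and a lock is used for simplicity, which is an assumption of this sketch and not something the post describes.

```python
import threading

# Hypothetical settings for illustration only.
I_ASYNC_UPDATE = 5       # apply accumulated gradients every 5 local steps
STEPS_PER_WORKER = 20
NUM_WORKERS = 4
LEARNING_RATE = 0.01


def placeholder_gradient(step):
    """Stand-in for a real gradient computation; always returns 1.0."""
    return 1.0


def worker(shared, lock):
    """Accumulate gradients locally, then update the shared weight periodically."""
    accumulated = 0.0
    for t in range(1, STEPS_PER_WORKER + 1):
        accumulated += placeholder_gradient(t)
        # The periodic check: only touch the shared weight every I_ASYNC_UPDATE steps.
        if t % I_ASYNC_UPDATE == 0:
            with lock:
                shared["weight"] -= LEARNING_RATE * accumulated
                shared["updates"] += 1
            accumulated = 0.0  # clear the local accumulator after applying it


def run_workers():
    """Start all workers against one shared weight and return the final state."""
    shared = {"weight": 0.0, "updates": 0}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(shared, lock))
        for _ in range(NUM_WORKERS)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared["weight"], shared["updates"]
```

Calling `run_workers()` returns the final shared weight and the number of shared updates. Each worker performs `STEPS_PER_WORKER // I_ASYNC_UPDATE` updates, so the shared weight is written far less often than gradients are computed.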


The Multiworld Testing Decision Service « Machine Learning (Theory)

#artificialintelligence

We made a tool that you can use. Reinforcement learning is much discussed these days with successes like AlphaGo. Wouldn't it be great if Reinforcement Learning algorithms could easily be used to solve all reinforcement learning problems? But there is a well-known problem: It's very easy to create natural RL problems for which all standard RL algorithms (epsilon-greedy Q-learning, SARSA, etc…) fail catastrophically. That's a serious limitation which both inspires research and which I suspect many people need to learn the hard way.


Could Artificial Intelligence Learn How To Brew A Tasty Beer?

#artificialintelligence

Because we'll need something tasty to swill when our robot overlords finally come into their full artificial intelligence, a company in the UK is attempting to figure out if robots can help humans brew a better beer. While there won't be robots stirring batches of wort or sorting hops, artificial intelligence will play a big part in London-based firm IntelligentX's plan to brew beer, CNET reports. Here's how it'd work: consumers would try one of the company's four beers -- Amber AI, Black AI, Golden AI and Pale AI -- and then weigh in via a Facebook chat bot on the experience. That feedback will be fed to an algorithm called Automated Brewing Intelligence, or ABI, which will use the information to make changes to the next batch. Reinforcement learning and a process called Bayesian decision making will teach the AI about the brewing experience.


Deep reinforcement learning for robotics - Artificial Intelligence 2016

@machinelearnbot

Pieter Abbeel is an associate professor in UC Berkeley's EECS department, where he works in machine learning and robotics--in particular his research is on making robots learn from people (apprenticeship learning) and how to make robots learn through their own trial and error (reinforcement learning). Pieter's robots have learned advanced helicopter aerobatics, knot tying, basic assembly, and organizing laundry. He has won various awards, including best paper awards at ICML and ICRA, the Sloan Fellowship, the Air Force Office of Scientific Research Young Investigator Program (AFOSR-YIP) Award, the Office of Naval Research Young Investigator Program (ONR-YIP) Award, the DARPA Young Faculty Award (DARPA-YFA), the National Science Foundation Faculty Early Career Development Program Award (NSF-CAREER), the Presidential Early Career Award for Scientists and Engineers (PECASE), the CRA-E Undergraduate Research Faculty Mentoring Award, the MIT TR35, the IEEE Robotics and Automation Society (RAS) Early Career Award, and the Dick Volz Best US PhD Thesis in Robotics and Automation Award.


Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

arXiv.org Artificial Intelligence

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function lies within a given hypothesis class, OCP selects optimal actions over all but at most K episodes, where K is the eluder dimension of the given hypothesis class. We establish further efficiency and asymptotic performance guarantees that apply even if the true value function does not lie in the given hypothesis class, for the special case where the hypothesis class is the span of pre-specified indicator functions over disjoint sets. We also discuss the computational complexity of OCP and present computational results involving two illustrative examples.


Deep Exploration via Bootstrapped DQN

arXiv.org Machine Learning

Efficient exploration remains a major challenge for reinforcement learning (RL). Common dithering strategies for exploration, such as ɛ-greedy, do not carry out temporally-extended (or deep) exploration; this can lead to exponentially larger data requirements. However, most algorithms for statistically efficient RL are not computationally tractable in complex environments. Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. As a first step towards addressing such contexts we develop bootstrapped DQN. We demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than any dithering strategy. In the Arcade Learning Environment bootstrapped DQN substantially improves learning speed and cumulative performance across most games.
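As a rough illustration of the contrast drawn above, the sketch below places ε-greedy dithering next to a simplified form of bootstrapped exploration: one value-function head is sampled for a whole episode and then followed greedily. The function names and the plain-list representation of Q-values are assumptions made for this example; no neural network or training loop is shown.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Dithering: with probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def sample_head(num_heads, rng):
    """Pick one value-function head uniformly at random (done once per episode)."""
    return rng.randrange(num_heads)


def greedy_for_head(head_q_values, head):
    """Act greedily with respect to the Q-values of the sampled head."""
    q_values = head_q_values[head]
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With ε-greedy, randomness is injected independently at every step. In the head-sampling variant, the random choice is made once and the resulting policy is followed consistently, which is what allows exploration to extend over many steps.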


iWML 2016 2nd Indian Workshop on Machine Learning

@machinelearnbot

The 2nd Indian Workshop on Machine Learning (iWML) will be organized by the Department of Computer Science and Engineering at the Indian Institute of Technology Kanpur (IITK), during July 1-3, 2016. This follows the inaugural edition of the workshop, held in 2013, which brought together several leading researchers and experts in machine learning and related areas from both academia and industry [link]. The second workshop seeks to further this effort and foster growth and excellence in the emerging machine learning community in India. We have significantly expanded the scope of the workshop, including talks and tutorials on modern and cutting-edge topics such as reinforcement learning, non-convex optimization, deep learning, and contemporary applications of machine learning to medicine, social media, and vision. We hope attendees will benefit from the wide variety of topics being covered in the workshop, as well as the interaction with leading researchers.


Visualizing Dynamics: from t-SNE to SEMI-MDPs

arXiv.org Machine Learning

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots. While DRL agents perform well in practice, we are still missing the tools to analyze their performance and visualize the temporal abstractions that they learn. In this paper, we present a novel method that automatically discovers an internal Semi Markov Decision Process (SMDP) model in the Deep Q Network's (DQN) learned representation. We suggest a novel visualization method that represents the SMDP model by a directed graph and visualizes it above a t-SNE map. We show how we can interpret the agent's policy and give evidence for the hierarchical state aggregation that DQNs are learning automatically. Our algorithm is fully automatic, does not require any domain-specific knowledge, and is evaluated by a novel likelihood-based evaluation criterion.