 Reinforcement Learning


Episodic Multi-armed Bandits

arXiv.org Machine Learning

We introduce a new class of reinforcement learning methods referred to as episodic multi-armed bandits (eMAB). In eMAB the learner proceeds in episodes, each composed of several steps, in which it chooses an action and observes a feedback signal. Moreover, in each step, it can take a special action, called the stop action, that ends the current episode. After the stop action is taken, the learner collects a terminal reward, and observes the costs and terminal rewards associated with each step of the episode. The goal of the learner is to maximize its cumulative gain (i.e., the terminal reward minus costs) over all episodes by learning to choose the best sequence of actions based on the feedback. First, we define an oracle benchmark, which sequentially selects the actions that maximize the expected immediate gain. Then, we propose our online learning algorithm, named FeedBack Adaptive Learning (FeedBAL), and prove that its regret with respect to the benchmark is bounded with high probability and increases logarithmically in expectation. Moreover, the regret only has polynomial dependence on the number of steps, actions and states. eMAB can be used to model applications that involve humans in the loop, ranging from personalized medical screening to personalized web-based education, where sequences of actions are taken in each episode, and optimal behavior requires adapting the chosen actions based on the feedback.
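The episode protocol described in this abstract can be sketched roughly as follows. This is only an illustration of the interaction loop (choose an action, observe feedback, optionally stop, then collect the terminal reward minus costs); it is not FeedBAL. The names `run_episode`, `probe_then_stop`, the callback signatures, and the `max_steps` cutoff are all assumptions made for the example.

```python
STOP = "stop"  # hypothetical label for the special episode-ending action


def run_episode(policy, feedback_fn, cost_fn, terminal_reward_fn, max_steps):
    """Play one episode and return (gain, history).

    policy(step, last_feedback) -> action chosen at this step
    feedback_fn(action)         -> feedback signal observed after the action
    cost_fn(action)             -> cost incurred by the action
    terminal_reward_fn(history) -> reward collected when the episode ends
    """
    history = []          # (action, feedback) pairs seen in this episode
    total_cost = 0.0      # accumulated here, but not shown to the policy
    last_feedback = None
    for step in range(max_steps):
        action = policy(step, last_feedback)
        if action == STOP:
            break
        last_feedback = feedback_fn(action)
        total_cost += cost_fn(action)
        history.append((action, last_feedback))
    # Assumption: reaching max_steps ends the episode as if stop had been taken.
    gain = terminal_reward_fn(history) - total_cost
    return gain, history


# Toy usage: probe once, then stop as soon as any feedback has been observed.
def probe_then_stop(step, last_feedback):
    return "probe" if last_feedback is None else STOP


gain, history = run_episode(
    policy=probe_then_stop,
    feedback_fn=lambda action: 1,
    cost_fn=lambda action: 0.25,
    terminal_reward_fn=lambda hist: 1.0 if hist else 0.0,
    max_steps=5,
)
print(gain, history)  # 0.75 [('probe', 1)]
```

A learner in this setting would replace `probe_then_stop` with a policy that is updated from the gains observed across many episodes.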


Soft-Robust Actor-Critic Policy-Gradient

arXiv.org Machine Learning

Robust Reinforcement Learning aims to derive an optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly conservative. Our soft-robust framework is an attempt to overcome this issue. In this paper, we present a novel Soft-Robust Actor-Critic algorithm (SR-AC). It learns an optimal policy with respect to a distribution over an uncertainty set and stays robust to model uncertainty but avoids the conservativeness of robust strategies. We show convergence of SR-AC and test the efficiency of our approach on different domains by comparing it against regular learning methods and their robust formulations.
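To make the contrast concrete, here is a small sketch comparing a worst-case (robust) criterion with an average taken over a distribution on the uncertainty set (the soft-robust idea as described above). It is not the SR-AC algorithm: the function names, the three candidate models, their weights, and the return values are all invented for illustration.

```python
def robust_value(returns):
    """Worst-case criterion: score a policy by its lowest return across models."""
    return min(returns)


def soft_robust_value(returns, weights):
    """Average criterion: weight each model's return by its assumed probability."""
    if len(returns) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, returns)) / total


# Hypothetical returns of two policies under three candidate models.
cautious = [5.0, 5.0, 5.0]
ambitious = [10.0, 8.0, 4.0]
weights = [0.5, 0.25, 0.25]  # assumed distribution over the uncertainty set

print(robust_value(cautious), robust_value(ambitious))  # 5.0 4.0
print(soft_robust_value(cautious, weights),
      soft_robust_value(ambitious, weights))            # 5.0 8.0
```

With these made-up numbers the worst-case criterion prefers the cautious policy, while the weighted average prefers the ambitious one, which is the kind of conservativeness the abstract refers to.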


On machine learning and structure for mobile robots

@machinelearnbot

The post coincides topically with last year's first annual Conference on Robot Learning as well as the workshop on Challenges in Robot Learning at NIPS2017, the latter of which we had the pleasure of co-organising together with colleagues from Oxford, DeepMind, and MIT. The events, as well as this post, cover current challenges and potentials of learning across various tasks of relevance in robotics and automation. In this context, similar to the long-term discussion on how much innate structure is optimal for artificial general intelligence, there is the more short-term question of how to merge traditional programming and learning (not sure if I prefer the branding as differentiable programming or software 2.0) for more narrow applications in efficient, robust and safe automation. The question about structure as a beneficial or limiting aspect arguably becomes easier to answer in the context of near-term robotic applications, as we can simply acknowledge our ignorance (our missing knowledge about what will work best in the future) and focus on the present to benchmark and combine the most efficient and effective directions. Existing solutions to many tasks in mobile robotics, such as localisation, mapping, or planning, focus on prior knowledge about the structure of our tasks and environments. This may include geometry or kinematic and dynamic models, which have therefore been built into traditional programs. However, recent successes and the flexibility of fairly unconstrained, learned models shift the focus of new academic and industrial projects. Successes in image recognition (ImageNet) as well as triumphs in reinforcement learning (Atari, Go, Chess) inspire like-minded research. As the post has become a bit of a long read, I suggest reading it like a paper: intro, discussion & conclusions and then - only if you did not fall asleep after all - the rest. Similar to scientific papers, some paragraphs will require basic familiarity with the field. However, a coarse web search should be enough to clarify most unexplained terminology. Additionally, to keep this engaging, I have added some of my favourite recent videos highlighting interesting research for each section. Finally, this is a high-level review with more details to be found in the respective references, which just represent a small subset of available work in each field, chosen based on personal interest as well as shameless self-promotion of our work.


Focus on a reinforcement learning algorithm that can learn from failure

#artificialintelligence

Recent news from the OpenAI people is all about a bonus trio. They are releasing new Gym environments--a set of simulated robotics environments based on real robot platforms--including a Shadow hand and a Fetch research robot, said IEEE Spectrum. In addition to that toolkit, they are releasing an open source version of Hindsight Experience Replay (HER). As its name suggests, it helps robots learn from hindsight, for goal-based robotic tasks. Last but not least, they released a set of requests for robotics research.


July 2017 – RealThinks

#artificialintelligence

Today we're going to be learning about reinforcement learning. The ultimate goal of this endeavor is to create an artificial intelligence that is a strong Othello player, and can teach you how to become stronger yourself. I explained the rules of Othello, my motivation, and how to create a playable game in Step 1 of this series. I created some basic artificial intelligence in Step 2 of this series. The next thing I want to do is to use machine learning to create an even better artificial intelligence, but before I can even do that, I need to learn how to implement reinforcement learning. In many machine learning problems, the cause/effect relationship is fairly obvious: a house with X square feet, Y bedrooms, and Z bathrooms in neighborhood Q will cost, on average, N dollars.


Landing a SpaceX Falcon Heavy Rocket - YouTube

#artificialintelligence

Can we land a SpaceX Falcon Heavy Rocket in simulation using machine learning? Yes! Reinforcement learning is a technique that lets an agent learn how best to act in an environment using rewards as its signal. OpenAI released a library called Gym that lets us train AI agents really easily. We'll use a combination of the TensorFlow and Gym libraries to build an RL agent capable of landing a rocket perfectly. The specific technique we're using is called proximal policy optimization, a really popular actor-critic algorithm.


Generalization Properties of Doubly Stochastic Learning Algorithms

arXiv.org Machine Learning

Doubly stochastic learning algorithms are scalable kernel methods that perform very well in practice. However, their generalization properties are not well understood and their analysis is challenging since the corresponding learning sequence may not be in the hypothesis space induced by the kernel. In this paper, we provide an in-depth theoretical analysis for different variants of doubly stochastic learning algorithms within the setting of nonparametric regression in a reproducing kernel Hilbert space and considering the square loss. Particularly, we derive convergence results on the generalization error for the studied algorithms either with or without an explicit penalty term. To the best of our knowledge, the derived results for the unregularized variants are the first of this kind, while the results for the regularized variants improve those in the literature. The novelties in our proof are a sample error bound that requires controlling the trace norm of a cumulative operator, and a refined analysis of bounding initial error.


AI Just Took a Big Step Towards Becoming More Human

#artificialintelligence

In recent months, researchers at OpenAI have been focusing on developing artificial intelligence (AI) that learns better. Their machine learning algorithms are now capable of training themselves, so to speak, thanks to the reinforcement learning methods of their OpenAI Baselines. Now, a new algorithm lets their AI learn from its own mistakes, almost as human beings do. The development comes from a new open-source algorithm called Hindsight Experience Replay (HER), which OpenAI researchers released earlier this week. As its name suggests, HER helps an AI agent "look back" in hindsight, so to speak, as it completes a task.
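The "look back in hindsight" idea can be sketched as a relabeling step over a finished episode. The snippet below is a minimal illustration, not OpenAI's released implementation: it assumes a sparse reward of 0 when the goal is reached and -1 otherwise, and it relabels every transition with the state the episode actually ended in. The function names and the toy one-dimensional episode are hypothetical.

```python
def sparse_reward(achieved, goal):
    """Assumed reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode, original_goal):
    """Return replay entries (state, action, next_state, goal, reward).

    episode is a list of (state, action, next_state) transitions. Each
    transition is stored twice: once with the original goal, and once with
    the final achieved state substituted as the goal.
    """
    if not episode:
        return []
    hindsight_goal = episode[-1][2]  # where the agent actually ended up
    replay = []
    for state, action, next_state in episode:
        replay.append((state, action, next_state, original_goal,
                       sparse_reward(next_state, original_goal)))
        replay.append((state, action, next_state, hindsight_goal,
                       sparse_reward(next_state, hindsight_goal)))
    return replay


# Toy episode on a line: the agent wanted to reach 5 but only got to 2.
episode = [(0, "right", 1), (1, "right", 2)]
replay = relabel_with_final_state(episode, original_goal=5)
for entry in replay:
    print(entry)
```

Under the original goal every transition is a failure with reward -1, but the relabeled copy of the last transition carries reward 0, so the failed episode still yields a useful learning signal.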



Reinforcement Learning Cheat Sheet – Towards Data Science

#artificialintelligence

Disclaimer: This is a work-in-progress project, so there may be errors! To quickly recap my knowledge of Reinforcement Learning, I created this Cheat Sheet with all the basic formulas and algorithms. I hope it is useful to you. You can find the full pdf here, and the repo here.