AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Learning values across many orders of magnitude

van Hasselt, Hado, Guez, Arthur, Hessel, Matteo, Mnih, Volodymyr, Silver, David

arXiv.org Artificial Intelligence, Aug-16-2016

Most learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games, where the rewards were all clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using the adaptive normalization we can remove this domain-specific heuristic without diminishing overall performance.
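A minimal sketch of what adaptively normalizing learning targets could look like, assuming exponential moving estimates of the target mean and second moment. The class name AdaptiveTargetNormalizer, the step_size parameter, and the min_std floor are illustrative assumptions rather than details taken from the paper, and the sketch leaves out any adjustment of the function approximator itself.

```python
import math


class AdaptiveTargetNormalizer:
    """Hypothetical helper: tracks running moments of targets and rescales them."""

    def __init__(self, step_size=0.01, min_std=1e-4):
        self.step_size = step_size      # assumed moving-average rate
        self.min_std = min_std          # floor to avoid division by ~0
        self.mean = 0.0                 # running estimate of E[target]
        self.second_moment = 1.0        # running estimate of E[target^2]

    @property
    def std(self):
        # Variance from the two running moments, clamped for safety.
        variance = self.second_moment - self.mean ** 2
        return max(math.sqrt(max(variance, 0.0)), self.min_std)

    def update(self, target):
        # Move both moment estimates a small step toward the new target.
        b = self.step_size
        self.mean = (1 - b) * self.mean + b * target
        self.second_moment = (1 - b) * self.second_moment + b * target ** 2

    def normalize(self, target):
        # Map a raw target into the (roughly) unit-scale space.
        return (target - self.mean) / self.std

    def denormalize(self, value):
        # Map a normalized prediction back to the raw target scale.
        return value * self.std + self.mean


# Usage: targets whose scale is far from 1 end up on a unit-like scale.
normalizer = AdaptiveTargetNormalizer()
for step in range(5000):
    normalizer.update(1000.0 if step % 2 else 0.0)
print(normalizer.normalize(1000.0), normalizer.normalize(0.0))
```

A learner would regress onto normalize(target) and call denormalize on its prediction when raw values are needed, so the scale of the rewards does not have to be fixed in advance by clipping.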

machine learning, normalization, reinforcement learning, (18 more...)

1602.07714

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (0.93)
Leisure & Entertainment > Games > Computer Games (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

What's the hello world program of reinforcement learning? • /r/MachineLearning

#artificialintelligence, Aug-15-2016, 12:45:48 GMT

Basically I want to get some hands-on experience through small projects. You may want to start with OpenAI Gym. Personally, I find the n-armed bandit problem quite illustrative. Here's a great blog post with code to get you started. Definitely this, it is as simple as it gets.
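For anyone who wants to try the n-armed bandit suggestion right away, here is a rough sketch of an epsilon-greedy agent. It is not the code from the linked blog post; the function name run_bandit, the Gaussian reward noise, and the default parameters are all assumptions made for illustration.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on an n-armed bandit with unit-variance Gaussian rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms        # how often each arm was pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.1, 0.5, 2.0])
print("estimated means:", estimates)
print("pull counts:", counts)
```

The agent is never told which arm is best; it only sees rewards from the arms it tries, which is the whole trial-and-error idea in miniature.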

artificial intelligence, machine learning, reinforcement learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

How deep reinforcement learning can help chatbots

#artificialintelligence, Aug-14-2016, 22:45:15 GMT

In March this year, Microsoft CEO Satya Nadella talked about the industry trend of using human language more pervasively for interaction with computing devices, a trend he called "conversation as a platform." He also announced several bot initiatives, including the company's bot framework. In April, Facebook launched its Messenger platform with bots. Then, in May, Google announced its attempt to develop AI-powered bots, called Google Assistant. Since then, bots have been widely regarded as a new user interface (UI) to fundamentally change how computing will be experienced by people.

machine learning, natural language, reinforcement learning, (19 more...)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.91)

Apprenticeship learning using Inverse Reinforcement Learning

#artificialintelligence, Aug-14-2016, 17:30:28 GMT

Reinforcement learning (RL) is the most basic and intuitive form of trial-and-error learning; it is how most living organisms with some capacity for thought learn. Often referred to as learning by exploration, it is how a newborn baby learns to take its first steps: by taking random actions at first and then slowly figuring out which actions lead to forward walking motion. Note: this post assumes a good understanding of the reinforcement learning framework; please familiarize yourself with RL through weeks 5 and 6 of this awesome online course, AI_Berkeley. The question I kept asking myself was: what is the driving force behind this kind of learning, and what makes the agent learn a particular behavior the way it does? Upon learning more about RL I came across the idea of rewards: the agent tries to choose its actions so that the rewards it gets from that behavior are maximized.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Industry: Education > Educational Setting > Online (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Computation

Communications of the ACM, Aug-14-2016, 00:35:42 GMT

A Learning System Based on Genetic Adaptive Algorithms.

evolutionary algorithm, machine learning, reinforcement learning, (20 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.15)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(8 more...)

Industry:

Banking & Finance > Trading (0.69)
Health & Medicine > Therapeutic Area (0.69)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
(2 more...)

Reinforcement Renaissance

Communications of the ACM, Aug-14-2016, 00:35:35 GMT

Based in San Francisco, Marina Krakovsky is the author of The Middleman Economy: How Brokers, Agents, Dealers, and Everyday Matchmakers Create Value and Profit (Palgrave Macmillan, 2015).

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.25)
North America > Canada > Alberta (0.15)
North America > United States > Tennessee (0.05)
(5 more...)

Industry:

Information Technology (0.71)
Leisure & Entertainment > Games > Computer Games (0.48)
Leisure & Entertainment > Games > Go (0.30)
Leisure & Entertainment > Games > Backgammon (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Q($\lambda$) with Off-Policy Corrections

Harutyunyan, Anna, Bellemare, Marc G., Stepleton, Tom, Munos, Remi

arXiv.org Machine Learning, Aug-11-2016

We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, provided certain conditions. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter and the discount factor, and formalize an underlying tradeoff in off-policy TD($\lambda$). We illustrate this theoretical relationship empirically on a continuous-state control task.
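One possible reading of "corrected with the current Q-function in terms of rewards" is a multi-step return built from discounted temporal-difference errors, where each error bootstraps on the target policy's expected Q-value at the next state. The sketch below illustrates that reading for a tabular Q-function; the function name corrected_lambda_return, the trajectory layout, and the dictionary-based policy are assumptions for illustration and should not be taken as the authors' implementation.

```python
def corrected_lambda_return(q, trajectory, target_policy, gamma, lam):
    """Multi-step return for the first (state, action) of an off-policy trajectory.

    q: dict mapping state -> list of action values (current Q estimate).
    trajectory: list of (state, action, reward, next_state) tuples gathered by
        some behavior policy; next_state is None at termination.
    target_policy: dict mapping state -> list of action probabilities.
    """
    first_state, first_action = trajectory[0][0], trajectory[0][1]
    total = q[first_state][first_action]
    weight = 1.0  # (gamma * lam) ** t

    for state, action, reward, next_state in trajectory:
        if next_state is None:
            expected_next = 0.0
        else:
            # Expected Q-value under the target policy at the next state.
            expected_next = sum(
                p * value
                for p, value in zip(target_policy[next_state], q[next_state])
            )
        # TD error measured against the current Q-function.
        td_error = reward + gamma * expected_next - q[state][action]
        total += weight * td_error
        weight *= gamma * lam

    return total


q = {0: [1.0, 0.0], 1: [2.0, 4.0]}
policy = {0: [0.5, 0.5], 1: [0.5, 0.5]}
episode = [(0, 0, 1.0, 1), (1, 1, 2.0, None)]
print(corrected_lambda_return(q, episode, policy, gamma=0.5, lam=0.0))
print(corrected_lambda_return(q, episode, policy, gamma=0.5, lam=1.0))
```

With lam=0 the sketch reduces to a one-step expected backup; larger lam lets later rewards along the behavior trajectory contribute, which is where the tradeoff between lam, the discount factor, and the distance between the two policies comes into play.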

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1602.04951

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Posterior Sampling for Reinforcement Learning Without Episodes

Osband, Ian, Van Roy, Benjamin

arXiv.org Machine Learning, Aug-9-2016

This is a brief technical note to clarify some of the issues with applying the algorithm posterior sampling for reinforcement learning (PSRL) in environments without fixed episodes. In particular, this paper aims to:
- Review some of the results which have been proven for finite horizon MDPs (Osband et al 2013, 2014a, 2014b, 2016) and also for MDPs with finite ergodic structure (Gopalan et al 2014).
- Review similar results for optimistic algorithms in infinite horizon problems (Jaksch et al 2010, Bartlett and Tewari 2009, Abbasi-Yadkori and Szepesvari 2011), with particular attention to the dynamic episode growth.
- Highlight the delicate technical issue which has led to a fault in the proof of the lazy-PSRL algorithm (Abbasi-Yadkori and Szepesvari 2015). We present an explicit counterexample to this style of argument. Therefore, we suggest that Theorem 2 in (Abbasi-Yadkori and Szepesvari 2015) be instead considered a conjecture, as it has no rigorous proof.
- Present pragmatic approaches to apply PSRL in infinite horizon problems. We conjecture that, under some additional assumptions, it will be possible to obtain bounds $O(\sqrt{T})$ even without episodic reset.
We hope that this note serves to clarify existing results in the field of reinforcement learning and provides interesting motivation for future work.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1608.02731

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)

On Lower Bounds for Regret in Reinforcement Learning

Osband, Ian, Van Roy, Benjamin

arXiv.org Machine Learning, Aug-9-2016

This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning. In particular, this paper:
- Reproduces a lower bound on regret for reinforcement learning, similar to the result of Theorem 5 in the journal UCRL2 paper (Jaksch et al 2010).
- Clarifies that the proposed proof of Theorem 6 in the REGAL paper (Bartlett and Tewari 2009) does not hold using the standard techniques without further work. We suggest that this result should instead be considered a conjecture as it has no rigorous proof.
- Suggests that the conjectured lower bound given by (Bartlett and Tewari 2009) is incorrect and, in fact, it is possible to improve the scaling of the upper bound to match the weaker lower bounds presented in this paper.
We hope that this note serves to clarify existing results in the field of reinforcement learning and provides interesting motivation for future work.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1608.02732

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Accelerating Stochastic Composition Optimization

Wang, Mengdi, Liu, Ji, Fang, Ethan X.

arXiv.org Machine Learning, Jul-25-2016

Consider the stochastic composition optimization problem where the objective is a composition of two expected-value functions. We propose a new stochastic first-order method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. The ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with nonsmooth regularization penalty. We show that the ASC-PG exhibits faster convergence than the best known algorithms, and that it achieves the optimal sample-error complexity in several important special cases. We further demonstrate the application of ASC-PG to reinforcement learning and conduct numerical experiments.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1607.07329

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
