AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
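To make the trial-and-error idea in the quote concrete, here is a minimal Python sketch of an epsilon-greedy learner on a multi-armed bandit, a setting Sutton and Barto treat early in the book. The arm means, the unit-variance Gaussian noise, and the name `run_bandit` are illustrative assumptions. The learner never sees the means. It only sees the rewards of the arms it actually tries.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action-value learning on a multi-armed bandit.

    true_means are hypothetical and hidden from the learner; it only
    observes noisy rewards for the arms it pulls.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # sample-average estimate of each arm's value
    counts = [0] * n_arms       # how often each arm has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore: try a random arm
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[action], 1.0)  # numerical reward signal
        counts[action] += 1
        # incremental sample average: Q <- Q + (R - Q) / n
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.0])
```

With enough steps, the arm with the highest hidden mean ends up being pulled far more often than the others, even though no one told the learner which arm was best.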

A Quick Q&A on (Deep) Reinforcement Learning – ROSS' #LegalTech Corner

#artificialintelligence | Sep-6-2017, 15:35:18 GMT

Jimoh Ovbiagele is the Chief Technology Officer & co-founder of ROSS Intelligence. He is a self-taught programmer who started coding at age 10, founded several startups in college, and worked on self-driving cars. At 21, Jimoh came up with the idea for ROSS Intelligence and co-founded the company. Two years later, he was named a Legal Rebel by the American Bar Association and one of Forbes' 30 Under 30. He speaks around the world, from Canada to China, about artificial intelligence and the future of law.

machine learning, reinforcement, reinforcement learning

#artificialintelligence

Country:

North America > Canada (0.26)
Asia > China (0.26)

Genre: Personal (0.37)

Industry:

Law (0.93)
Transportation (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Prowler.io nabs $13M for its new approach to decision making in AI

#artificialintelligence | Sep-5-2017, 22:25:03 GMT

As a wide range of artificial intelligence startups continue to come to market, one of the big, persistent questions has been whether an AI system will ever be able to make decisions as well as a human can. A Cambridge startup called Prowler.io is developing a new kind of decision-making platform based on probabilistic modelling, reinforcement learning and game theory that it believes may help the AI community answer that question with a "yes." Prowler.io was co-founded by two alums of another AI company, VocalIQ, which Apple acquired 13 months after launch, and today it is announcing that it has raised £10 million ($13 million) to help it along the way. The round was led by new investor Cambridge Innovation Capital, with participation from Atlantic Bridge Capital as well as previous investors Passion Capital, Amadeus Capital Partners and SG Innovate, who participated in Prowler's $2 million seed round last year. Artificial intelligence has become an increasingly crowded area.

artificial intelligence, machine learning, reinforcement learning

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.36)

Linking Generative Adversarial Learning and Binary Classification

Balsubramani, Akshay

arXiv.org Machine Learning | Sep-5-2017

In this note, we point out a basic link between generative adversarial (GA) training and binary classification -- any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator loss function.
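The link can be checked numerically in the simplest case. For discrete distributions and the original minimax GAN objective E_p[log D] + E_q[log(1 - D)], the best possible real-vs-generated classifier is D*(x) = p(x) / (p(x) + q(x)), and plugging it in gives 2 * JS(p, q) - log 4, where JS is the Jensen-Shannon divergence, one member of the f-divergence family. The sketch below uses made-up distributions and illustrates the general point; it is not the note's own derivation, and the function names are hypothetical.

```python
import math

def kl_divergence(a, b):
    """KL(a || b) for discrete distributions given as lists."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence, computed via the mixture m = (p + q) / 2."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def optimal_discriminator_value(p, q):
    """GAN value E_p[log D] + E_q[log(1 - D)] at the Bayes-optimal D = p / (p + q)."""
    value = 0.0
    for x, y in zip(p, q):
        if x + y == 0:
            continue  # outcome never occurs under either distribution
        d = x / (x + y)  # probability the classifier assigns to "real"
        if x > 0:
            value += x * math.log(d)
        if y > 0:
            value += y * math.log(1 - d)
    return value

p_real = [0.5, 0.3, 0.2]  # hypothetical data distribution
q_gen = [0.1, 0.4, 0.5]   # hypothetical generator distribution
lhs = optimal_discriminator_value(p_real, q_gen)
rhs = 2 * js_divergence(p_real, q_gen) - math.log(4)
```

`lhs` and `rhs` agree up to floating-point error. When the generator matches the data exactly, the value drops to -log 4, the point where the discriminator can do no better than chance.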

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

1709.01509

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)

Reinforcement Learning-based Thermal Comfort Control for Vehicle Cabins

Brusey, James, Hintea, Diana, Gaura, Elena, Beloe, Neil

arXiv.org Artificial Intelligence | Sep-5-2017

Vehicle climate control systems aim to keep passengers thermally comfortable. However, current systems control temperature rather than thermal comfort and tend to be energy hungry, which is of particular concern when considering electric vehicles. This paper poses energy-efficient vehicle comfort control as a Markov Decision Process, which is then solved numerically using Sarsa(λ) and an empirically validated, single-zone, 1D thermal model of the cabin. The resulting controller was tested in simulation using 200 randomly selected scenarios and found to outperform bang-bang, proportional, simple fuzzy logic, and commercial controllers by 23%, 43%, 40%, and 56%, respectively. Compared to the next-best-performing controller, energy consumption is reduced by 13% while the proportion of time spent thermally comfortable is increased by 23%. These results indicate that this is a viable approach that promises to translate into substantial comfort and energy improvements in the car.
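For readers unfamiliar with Sarsa(λ), here is a minimal sketch of tabular Sarsa(λ) with accumulating eligibility traces. It runs on a deliberately tiny, hypothetical stand-in for the cabin problem: 11 discretised temperature states, a single "comfortable" state, and a small energy penalty whenever the heater or cooler runs. None of these numbers, nor the reward, come from the paper, which uses an empirically validated thermal model.

```python
import random

N_STATES = 11          # hypothetical discretised cabin temperatures 0..10
COMFORT = 5            # hypothetical "thermally comfortable" state
ACTIONS = (-1, 0, +1)  # cool, off, heat
ENERGY_COST = 0.1      # hypothetical penalty for running the HVAC

def step(s, a_idx):
    """Deterministic toy dynamics: the action nudges the temperature by one state."""
    s_next = min(N_STATES - 1, max(0, s + ACTIONS[a_idx]))
    reward = -abs(s_next - COMFORT) - (ENERGY_COST if ACTIONS[a_idx] != 0 else 0.0)
    return s_next, reward

def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

def sarsa_lambda(episodes=1000, horizon=30, alpha=0.1, gamma=0.9,
                 lam=0.8, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        E = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]  # eligibility traces
        s = rng.randrange(N_STATES)
        a = epsilon_greedy(Q, s, epsilon, rng)
        for t in range(horizon):
            s2, r = step(s, a)
            a2 = epsilon_greedy(Q, s2, epsilon, rng)
            # on-policy TD target; the last step of an episode is treated as terminal
            target = r if t == horizon - 1 else r + gamma * Q[s2][a2]
            delta = target - Q[s][a]
            E[s][a] += 1.0  # accumulating trace for the visited pair
            for i in range(N_STATES):
                for j in range(len(ACTIONS)):
                    Q[i][j] += alpha * delta * E[i][j]
                    E[i][j] *= gamma * lam  # decay every trace
            s, a = s2, a2
    return Q

def greedy_policy(Q):
    return [max(range(len(ACTIONS)), key=lambda a: Q[s][a]) for s in range(N_STATES)]

Q = sarsa_lambda()
policy = greedy_policy(Q)  # action indices: 0 = cool, 1 = off, 2 = heat
```

The learned greedy policy heats below the comfortable state, cools above it, and switches off at it, because idling avoids the energy penalty. The traces let a single reward update many recently visited state-action pairs at once, which is the main difference from one-step Sarsa.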

controller, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

1704.07899

Country: Europe > United Kingdom (0.46)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

The successor representation in human reinforcement learning | DeepMind

#artificialintelligence | Sep-2-2017, 17:05:20 GMT

Theories of reinforcement learning in neuroscience have focused on two families of algorithms. Model-free algorithms cache action values, making them cheap but inflexible. Model-based algorithms achieve flexibility at computational expense by rebuilding values from a model of the environment. We examine an intermediate class of algorithms, the successor representation (SR), which caches long-run state expectancies, blending model-free efficiency with model-based flexibility. Although previous reward revaluation studies distinguish model-free from model-based learning algorithms, such designs cannot discriminate between model-based and SR-based algorithms, both of which predict sensitivity to reward revaluation. However, changing the transition structure ("transition revaluation") should selectively impair revaluation for the SR.
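The reward-versus-transition asymmetry can be sketched in a few lines. Assuming a fixed policy with state-transition matrix T and discount gamma, the SR is M = (I - gamma * T)^-1 and state values are V = M r. Because M caches where the agent will go, a change in r propagates immediately, but a change in T leaves M stale until it is relearned. The 4-state task, rewards, and function names below are hypothetical and only loosely echo the revaluation designs described; they are not the study's task.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def successor_representation(T, gamma, tol=1e-12, max_iter=10_000):
    """M = sum_k gamma^k T^k = (I - gamma*T)^-1, by iterating M <- I + gamma*T*M."""
    n = len(T)
    I = identity(n)
    M = identity(n)
    for _ in range(max_iter):
        TM = matmul(T, M)
        M_new = [[I[i][j] + gamma * TM[i][j] for j in range(n)] for i in range(n)]
        if max(abs(M_new[i][j] - M[i][j]) for i in range(n) for j in range(n)) < tol:
            return M_new
        M = M_new
    return M

def values(M, r):
    """V = M r: expected discounted future reward from each state."""
    return [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]

GAMMA = 0.9
# Hypothetical task: start states 0 and 1 lead to terminal states 2 and 3.
T = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0],  # terminal: no outgoing transitions
     [0, 0, 0, 0]]
r = [0.0, 0.0, 10.0, 1.0]
M = successor_representation(T, GAMMA)
v_before = values(M, r)                     # state 0 is preferred

# Reward revaluation: swap the terminal rewards; the cached M still works.
v_reward_reval = values(M, [0.0, 0.0, 1.0, 10.0])

# Transition revaluation: 0 now leads to 3 and 1 leads to 2.
T_new = [[0, 0, 0, 1],
         [0, 0, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
v_stale = values(M, r)                      # SR agent with a stale M
v_true = values(successor_representation(T_new, GAMMA), r)  # model-based recomputation
```

After reward revaluation, the SR agent correctly switches its preference to state 1. After transition revaluation, the stale M still favours state 0 even though state 1 is now better, which is the selective impairment the abstract predicts.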

large language model, machine learning, reinforcement learning

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Reinforcement Learning Part 3 – Challenges & Considerations

@machinelearnbot | Sep-2-2017, 04:55:10 GMT

Summary: In the first part of this series, Understanding Basic RL Models, we described the basics of how reinforcement learning (RL) models are constructed and interpreted. In this article we describe how deep learning is augmenting RL, along with a variety of challenges and considerations that need to be addressed in each implementation. RL systems can be constructed using policy gradient techniques, which learn a policy that maps an observation directly to an action (the automated house look-up table). Alternatively, they can be constructed using Q-learning, in which a neural net is trained to estimate the Q-value of each action on the fly; this approach is used when the state space grows too large and complex for a table.
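To show what the neural net in Q-learning is approximating, here is a minimal sketch of tabular Q-learning, where Q-values live in an explicit table. The corridor environment, its size, and the hyperparameters are hypothetical. With a large state space, the table is replaced by a network trained toward the same target, reward plus the discounted value of the best next action.

```python
import random

N_STATES = 6      # hypothetical corridor; state 5 is the goal
LEFT, RIGHT = 0, 1

def step(s, a):
    """Move one cell left or right; reaching the goal ends the episode with reward 1."""
    s_next = max(0, s - 1) if a == LEFT else s + 1
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.randrange(2)  # uniform random behaviour; Q-learning is off-policy
            s_next, r, done = step(s, a)
            # target: reward plus discounted value of the best next action
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q

Q = q_learning()
greedy = [LEFT if Q[s][LEFT] > Q[s][RIGHT] else RIGHT for s in range(N_STATES - 1)]
```

Even though the behaviour policy wanders randomly, the greedy policy read off the table moves right in every non-goal state, because the update always bootstraps from the best next action rather than the one actually taken.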

machine learning, reinforcement learning, rl system

@machinelearnbot

Industry:

Leisure & Entertainment > Games (0.97)
Transportation > Ground > Road (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Mean Actor Critic

Asadi, Kavosh, Allen, Cameron, Roderick, Melrose, Mohamed, Abdel-rahman, Konidaris, George, Littman, Michael

arXiv.org Machine Learning | Sep-1-2017

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the need for a variance reduction baseline. We show empirical results on two control domains where MAC performs as well as or better than other policy gradient approaches, and on five Atari games, where MAC is competitive with state-of-the-art policy search algorithms.
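The core idea of using all action values instead of only the executed action can be illustrated for a single state. This sketch assumes a softmax policy over per-action logits and takes the Q-values as given rather than learned; the paper's MAC uses a learned critic and neural-network policies, and the function names here are hypothetical. It compares the all-actions gradient with the average of many single-sample likelihood-ratio estimates.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mac_gradient(logits, q_values):
    """All-actions gradient of sum_a pi(a) Q(a) with respect to the logits."""
    pi = softmax(logits)
    expected_q = sum(p * q for p, q in zip(pi, q_values))
    # d pi(a) / d logit_b = pi(a) * (1[a == b] - pi(b)); summing over a gives:
    return [pi[b] * (q_values[b] - expected_q) for b in range(len(pi))]

def sampled_gradient(logits, q_values, rng):
    """One sampled action: Q(a) * grad log pi(a), the single-action estimate."""
    pi = softmax(logits)
    a = rng.choices(range(len(pi)), weights=pi)[0]
    return [q_values[a] * ((1.0 if b == a else 0.0) - pi[b]) for b in range(len(pi))]

def average_sampled_gradient(logits, q_values, n, seed=0):
    rng = random.Random(seed)
    total = [0.0] * len(logits)
    for _ in range(n):
        g = sampled_gradient(logits, q_values, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n for t in total]

logits = [0.1, -0.2, 0.3]   # hypothetical policy parameters for one state
q_values = [1.0, 2.0, 0.5]  # hypothetical critic estimates Q(s, a)
exact = mac_gradient(logits, q_values)
approx = average_sampled_gradient(logits, q_values, n=100_000)
```

The sampled estimates only match `exact` on average, while the all-actions version has no sampling noise over actions in a given state. The centering term `expected_q` falls out of differentiating the softmax rather than being added as a separate baseline.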

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

1709.00503

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity?

Benhenda, Mostapha

arXiv.org Machine Learning | Aug-31-2017

Generating molecules with desired chemical properties is important for drug discovery. The use of generative neural networks is promising for this task. However, from visual inspection, it often appears that generated samples lack diversity. In this paper, we quantify this internal chemical diversity, and we raise the following challenge: can a nontrivial AI model reproduce natural chemical diversity for desired molecules? To illustrate this question, we consider two generative models: a Reinforcement Learning model and the recently introduced ORGAN. Both fail at this challenge. We hope this challenge will stimulate research in this direction.
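As a rough illustration of how "internal chemical diversity" can be quantified, here is a sketch that takes the mean Tanimoto distance over all ordered pairs of molecules in a set, including self-pairs. This follows our reading of the paper's definition and should be treated as an assumption. The fingerprints are stored as plain Python sets of "on" bit indices; in practice they would come from a cheminformatics toolkit, which is not used here. The toy fingerprints are hypothetical, not real molecules.

```python
def tanimoto_similarity(fp_a, fp_b):
    """|A & B| / |A | B| for fingerprints stored as sets of 'on' bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # two empty fingerprints: treat as identical
    return len(fp_a & fp_b) / union

def internal_diversity(fingerprints):
    """Mean Tanimoto distance over all ordered pairs (self-pairs included)."""
    n = len(fingerprints)
    total = sum(1.0 - tanimoto_similarity(x, y)
                for x in fingerprints for y in fingerprints)
    return total / (n * n)

# Hypothetical toy fingerprints, not real molecules.
collapsed = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}]  # a generator stuck on one output
varied = [{1, 2, 3}, {4, 5}, {1, 6, 7, 8}]
```

A generator that keeps emitting the same molecule scores 0, while more varied sets score higher. Because self-pairs are counted, the maximum for n molecules is (n - 1) / n rather than 1.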

machine learning, natural language, reinforcement learning

arXiv.org Machine Learning

1708.08227

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Asymptotic Bias of Stochastic Gradient Search

Tadic, Vladislav B., Doucet, Arnaud

arXiv.org Machine Learning | Aug-30-2017

The asymptotic behavior of the stochastic gradient algorithm with a biased gradient estimator is analyzed. Relying on arguments from dynamical systems theory (chain recurrence) and differential geometry (the Yomdin theorem and the Łojasiewicz inequality), tight bounds on the asymptotic bias of the iterates generated by such an algorithm are derived. The results hold under mild conditions and cover a broad class of high-dimensional nonlinear algorithms. Using these results, the asymptotic properties of policy-gradient (reinforcement) learning and adaptive population Monte Carlo sampling are studied. Relying on the same results, the asymptotic behavior of recursive maximum split-likelihood estimation in hidden Markov models is also analyzed.
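What "asymptotic bias" means can be seen in a one-dimensional toy case. This is not the paper's analysis. We assume f(x) = x^2 / 2, whose true gradient is x, and a gradient estimator with a constant bias b plus zero-mean noise. With diminishing step sizes, the iterates settle where the biased gradient vanishes, x = -b, so the asymptotic bias here is exactly |b|. The paper bounds this kind of gap for far more general nonconvex problems. The function name and parameters are hypothetical.

```python
import random

def biased_sgd(bias, steps=10_000, x0=5.0, noise=1.0, seed=0):
    """Minimise f(x) = x**2 / 2 (true gradient: x) with a biased, noisy gradient estimate."""
    rng = random.Random(seed)
    x = x0
    for n in range(steps):
        step_size = 1.0 / (n + 1)  # diminishing step sizes
        grad_estimate = x + bias + rng.gauss(0.0, noise)  # biased, noisy estimate
        x -= step_size * grad_estimate
    return x

x_unbiased = biased_sgd(bias=0.0)  # ends near the true minimiser 0
x_biased = biased_sgd(bias=0.3)    # ends near -0.3, offset by the bias
```

The noise averages out as the step sizes shrink, but the bias does not, which is why the limit point is shifted rather than merely noisy.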

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

1709.00291

Country: Europe > United Kingdom > England (0.45)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Salesforce is using AI to democratize SQL so anyone can query databases in natural language

@machinelearnbot | Aug-29-2017, 20:15:15 GMT

SQL is about as easy as it gets in the world of programming, yet its learning curve is still steep enough to keep many people from interacting with relational databases. Salesforce's AI research team set out to explore how machine learning might open doors for those without knowledge of SQL. Their recent paper, Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning, builds on the sequence-to-sequence models typically employed in machine translation. A reinforcement learning twist allowed the team to obtain promising results translating natural language database queries into SQL. In practice, this means you could simply ask which team is the winningest in college football, and an appropriate database could be queried automatically to tell you that it is in fact the University of Michigan.
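The "reinforcement learning twist" rewards the model for queries that execute to the right answer rather than for matching a reference query token by token. Here is a minimal sketch of such an execution-based reward using Python's built-in `sqlite3`. The reward values (-2 for an invalid query, -1 for a valid query with the wrong result, +1 for a correct result) follow our reading of the Seq2SQL paper and should be treated as an assumption. The `teams` table and its rows are hypothetical.

```python
import sqlite3
from collections import Counter

def make_demo_db():
    """In-memory database with a hypothetical table and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teams (name TEXT, wins INTEGER)")
    conn.executemany("INSERT INTO teams VALUES (?, ?)",
                     [("Team A", 900), ("Team B", 850), ("Team C", 700)])
    return conn

def execution_reward(conn, candidate_sql, gold_sql):
    """Execution-based reward (values assumed): -2 invalid, -1 wrong result, +1 correct."""
    try:
        predicted = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return -2.0  # query failed to parse or execute
    expected = conn.execute(gold_sql).fetchall()
    # compare result multisets so that row order does not matter
    return 1.0 if Counter(predicted) == Counter(expected) else -1.0

conn = make_demo_db()
gold = "SELECT name FROM teams ORDER BY wins DESC LIMIT 1"
reward_correct = execution_reward(conn, gold, gold)
reward_wrong = execution_reward(conn, "SELECT name FROM teams ORDER BY wins ASC LIMIT 1", gold)
reward_invalid = execution_reward(conn, "SELEC name FROM teams", gold)
```

Because the reward only looks at execution results, two differently written queries that return the same rows earn the same reward. Candidate queries run against the live connection here; a real training loop would want a read-only copy of the database.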

machine learning, natural language, reinforcement learning

@machinelearnbot

Country: North America > United States > Michigan (0.26)

Industry: Information Technology > Software (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.41)