 Reinforcement Learning


A Quick Q&A on (Deep) Reinforcement Learning – ROSS' #LegalTech Corner

#artificialintelligence

Jimoh Ovbiagele is the Chief Technology Officer & co-founder of ROSS Intelligence. He is a self-taught programmer, starting at the age of 10, who founded several startups in college and worked on self-driving cars. When he was 21, Jimoh came up with the idea for and co-founded ROSS Intelligence. Two years later, he was named by the American Bar Association as a Legal Rebel and by Forbes as one of their 30 Under 30. He speaks around the world -- from Canada to China -- about artificial intelligence and the future of law.


Prowler.io nabs $13M for its new approach to decision making in AI

#artificialintelligence

As artificial intelligence-based startups continue to proliferate, one persistent, big question has been whether an AI system will ever be able to make decisions as well as a human can. A startup out of Cambridge called Prowler.io is developing a new kind of decision-making platform based on probabilistic modelling, reinforcement learning and game theory that it believes may help the AI community answer that question with a "yes." Prowler.io, co-founded by two alums from another AI company, VocalIQ, which was acquired by Apple 13 months after launch, is announcing today that it has raised £10 million ($13 million) to help it along the way. Led by new investor Cambridge Innovation Capital, the round also had participation from Atlantic Bridge Capital as well as previous investors Passion Capital, Amadeus Capital Partners and SG Innovate, who participated in Prowler's $2 million seed round last year. Artificial intelligence has become an increasingly crowded area.


Linking Generative Adversarial Learning and Binary Classification

arXiv.org Machine Learning

In this note, we point out a basic link between generative adversarial (GA) training and binary classification -- any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator loss function.
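As an illustration of the kind of link the note describes, here is the well-known special case of the original GAN objective, where the optimal discriminator yields the Jensen-Shannon divergence. The symbols p (real data density), q (generator density) and D (discriminator) are notation chosen here, and this is only the standard textbook case, not the note's general f-divergence derivation.

```latex
\begin{align}
V(D) &= \mathbb{E}_{x \sim p}\bigl[\log D(x)\bigr]
      + \mathbb{E}_{x \sim q}\bigl[\log\bigl(1 - D(x)\bigr)\bigr], \\
D^{*}(x) &= \frac{p(x)}{p(x) + q(x)}, \\
V(D^{*}) &= 2\,\mathrm{JSD}(p \,\|\, q) - \log 4 .
\end{align}
```

The first line is exactly the negated log-loss of a binary classifier that separates real from generated samples, which is why a discriminator trained to optimality reports a divergence between the two distributions.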


Reinforcement Learning-based Thermal Comfort Control for Vehicle Cabins

arXiv.org Artificial Intelligence

Vehicle climate control systems aim to keep passengers thermally comfortable. However, current systems control temperature rather than thermal comfort and tend to be energy hungry, which is of particular concern when considering electric vehicles. This paper poses energy-efficient vehicle comfort control as a Markov Decision Process, which is then solved numerically using Sarsa(λ) and an empirically validated, single-zone, 1D thermal model of the cabin. The resulting controller was tested in simulation using 200 randomly selected scenarios and found to exceed the performance of bang-bang, proportional, simple fuzzy logic, and commercial controllers by 23%, 43%, 40%, and 56%, respectively. Compared to the next best performing controller, energy consumption is reduced by 13% while the proportion of time spent thermally comfortable is increased by 23%. These results indicate that this is a viable approach that promises to translate into substantial comfort and energy improvements in the car.
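To make the Sarsa(λ) idea concrete, here is a minimal tabular sketch on a made-up cabin model. Everything in it is an assumption for illustration: the nine temperature bins, the three actions (cool, off, heat), the reward that trades comfort against an energy penalty, and all hyperparameters. It is not the paper's thermal model or controller.

```python
import random

N_TEMPS = 9            # discretised cabin temperatures (hypothetical toy model)
TARGET = 4             # index of the "comfortable" temperature bin
ACTIONS = (-1, 0, 1)   # cool, off, heat
ENERGY_COST = 0.2      # assumed penalty for running the heater or cooler


def step(temp, action_idx):
    """Toy dynamics: the chosen action shifts the temperature by one bin."""
    move = ACTIONS[action_idx]
    next_temp = min(N_TEMPS - 1, max(0, temp + move))
    reward = -abs(next_temp - TARGET) - ENERGY_COST * abs(move)
    return next_temp, reward


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    return values.index(max(values))


def sarsa_lambda(episodes=500, horizon=30, alpha=0.1, gamma=0.9,
                 lam=0.8, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with accumulating eligibility traces."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_TEMPS)]
    for _ in range(episodes):
        traces = [[0.0] * len(ACTIONS) for _ in range(N_TEMPS)]
        state = rng.randrange(N_TEMPS)
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(horizon):
            next_state, reward = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # On-policy TD error: uses the action actually chosen next.
            td_error = (reward + gamma * q[next_state][next_action]
                        - q[state][action])
            traces[state][action] += 1.0
            # Every recently visited pair shares in the update, then decays.
            for s in range(N_TEMPS):
                for a in range(len(ACTIONS)):
                    q[s][a] += alpha * td_error * traces[s][a]
                    traces[s][a] *= gamma * lam
            state, action = next_state, next_action
    return q


def greedy_policy(q):
    """Return the preferred temperature shift (-1, 0 or +1) per bin."""
    return [ACTIONS[row.index(max(row))] for row in q]
```

Calling `greedy_policy(sarsa_lambda())` returns one action per temperature bin. In this toy setup one would expect the learned policy to heat below the target bin, cool above it, and switch off at the target, since any unnecessary actuation only adds the energy penalty.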


The successor representation in human reinforcement learning | DeepMind

#artificialintelligence

Theories of reinforcement learning in neuroscience have focused on two families of algorithms. Model-based algorithms achieve flexibility at computational expense, by rebuilding values from a model of the environment. We examine an intermediate class of algorithms, the successor representation (SR), which caches long-run state expectancies, blending model-free efficiency with model-based flexibility. Although previous reward revaluation studies distinguish model-free from model-based learning algorithms, such designs cannot discriminate between model-based and SR-based algorithms, both of which predict sensitivity to reward revaluation. However, changing the transition structure ('transition revaluation') should selectively impair revaluation for the SR.
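The distinction between reward revaluation and transition revaluation can be sketched numerically. The example below uses the standard closed form of the successor representation for a fixed policy, M = (I - γP)^-1, with values V = M r. The three-state chain, the discount of 0.5 and the reward vectors are hypothetical choices made here, not the study's task.

```python
import numpy as np


def successor_representation(P, gamma):
    """Discounted expected future state occupancies: sum_t gamma^t P^t."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


gamma = 0.5

# Hypothetical chain 0 -> 1 -> 2, where state 2 is absorbing.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
M = successor_representation(P, gamma)   # cached once, like a learned SR

# Values are a linear readout of the cached SR.
r = np.array([0.0, 0.0, 1.0])
V = M @ r

# Reward revaluation: only r changes, so the cached M still gives
# correct values without relearning anything about transitions.
r_new = np.array([0.0, 1.0, 0.0])
V_reward_reval = M @ r_new

# Transition revaluation: state 0 now jumps straight to state 2.
P_new = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])
V_stale = M @ r                                        # cached SR, now wrong
V_true = successor_representation(P_new, gamma) @ r    # needs a new SR
```

Here `V_reward_reval` is correct immediately, while `V_stale[0]` disagrees with `V_true[0]` until the SR is relearned. That asymmetry is the behavioural signature the excerpt refers to: an SR learner adapts to new rewards as readily as a model-based learner, but not to new transitions.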


Reinforcement Learning Part 3 – Challenges & Considerations

@machinelearnbot

Summary: The first part of this series described the basics of Reinforcement Learning (RL). This article describes how deep learning is augmenting RL and the challenges and considerations that need to be addressed in each implementation. In the first part of this series, Understanding Basic RL Models, we described how reinforcement learning (RL) models are constructed and interpreted. RL systems can be built using policy gradient techniques, which attempt to learn by directly mapping an observation to an action (the automated house look-up table). Alternatively, they can be built using Q-Learning, in which we train a neural net to estimate the Q factor on the fly, an approach used when the state space gets large and complex.
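For readers who want to see the Q-Learning update itself, here is a minimal tabular sketch. It deliberately uses a lookup table rather than the neural net the article mentions, because the update rule is the same and the table keeps the example short. The five-cell corridor, the reward of 1 at the goal and the hyperparameters are all assumptions made for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (toy task)
ACTIONS = (-1, +1)    # move left, move right


def step(state, action_idx):
    """Deterministic corridor: reward 1 only on reaching the goal cell."""
    next_state = min(N_STATES - 1, max(0, state + ACTIONS[action_idx]))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, max_steps=100, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning driven by a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Q-learning is off-policy, so random exploration is enough here.
            action = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action, not the one taken next.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_actions(q):
    """Preferred move (-1 or +1) in each non-goal cell."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]
```

A deep Q-network replaces the table `q` with a function approximator trained toward the same `target`, which is what makes the approach usable when states cannot be enumerated.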


Mean Actor Critic

arXiv.org Machine Learning

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the need for a variance reduction baseline. We show empirical results on two control domains where MAC performs as well as or better than other policy gradient approaches, and on five Atari games, where MAC is competitive with state-of-the-art policy search algorithms.
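The difference between using only the executed action and using all action values can be sketched for a single state. The example assumes a softmax policy over raw logits and treats the action values as given numbers; both are simplifications chosen here, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def sampled_gradient(logits, q_values, action):
    """Score-function estimate from one executed action:
    grad log pi(a|s) * Q(s, a)."""
    pi = softmax(logits)
    grad_log_pi = -pi              # d log pi(a) / d logit_k = 1[a == k] - pi(k)
    grad_log_pi[action] += 1.0
    return grad_log_pi * q_values[action]


def mean_actor_critic_gradient(logits, q_values):
    """Sum over all actions of grad pi(a|s) * Q(s, a)."""
    pi = softmax(logits)
    # For a softmax policy, d pi(a) / d logit_k = pi(a) * (1[a == k] - pi(k)).
    jacobian = np.diag(pi) - np.outer(pi, pi)   # jacobian[a, k]
    return q_values @ jacobian


logits = np.array([0.2, -0.1, 0.5])      # hypothetical policy parameters
q_values = np.array([1.0, 3.0, -2.0])    # hypothetical critic outputs

mac = mean_actor_critic_gradient(logits, q_values)

# Averaging the sampled estimate over the policy recovers the same vector,
# but any single sample can be far from it.
pi = softmax(logits)
expected_sampled = sum(pi[a] * sampled_gradient(logits, q_values, a)
                       for a in range(len(pi)))
```

For a fixed state the all-action sum has no sampling noise over actions, whereas the single-action estimate varies with whichever action happened to be drawn. This is the source of the variance reduction described above, under the assumption that the critic's values are reasonably accurate.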


ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity?

arXiv.org Machine Learning

Generating molecules with desired chemical properties is important for drug discovery. The use of generative neural networks is promising for this task. However, from visual inspection, it often appears that generated samples lack diversity. In this paper, we quantify this internal chemical diversity, and we raise the following challenge: can a nontrivial AI model reproduce natural chemical diversity for desired molecules? To illustrate this question, we consider two generative models: a Reinforcement Learning model and the recently introduced ORGAN. Both fail at this challenge. We hope this challenge will stimulate research in this direction.


Asymptotic Bias of Stochastic Gradient Search

arXiv.org Machine Learning

The asymptotic behavior of the stochastic gradient algorithm with a biased gradient estimator is analyzed. Relying on arguments based on the dynamic system theory (chain-recurrence) and the differential geometry (Yomdin theorem and Lojasiewicz inequality), tight bounds on the asymptotic bias of the iterates generated by such an algorithm are derived. The obtained results hold under mild conditions and cover a broad class of high-dimensional nonlinear algorithms. Using these results, the asymptotic properties of the policy-gradient (reinforcement) learning and adaptive population Monte Carlo sampling are studied. Relying on the same results, the asymptotic behavior of the recursive maximum split-likelihood estimation in hidden Markov models is analyzed, too.
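A one-dimensional toy case shows what an asymptotic bias looks like. The sketch below minimises f(x) = x²/2 with a gradient estimate that carries zero-mean noise plus a constant offset; the objective, the offset of 0.5, the 1/(k+1) step sizes and the function name are assumptions picked for illustration and are far simpler than the setting analysed in the paper.

```python
import random


def biased_stochastic_gradient_search(bias=0.5, steps=20000,
                                      noise_std=1.0, seed=0):
    """Minimise f(x) = x**2 / 2 using a noisy gradient with constant bias."""
    rng = random.Random(seed)
    x = 3.0
    for k in range(steps):
        # True gradient is x; the estimator adds noise and a fixed offset.
        estimate = x + bias + rng.gauss(0.0, noise_std)
        step_size = 1.0 / (k + 1)      # decreasing step sizes
        x -= step_size * estimate
    return x


x_final = biased_stochastic_gradient_search()
```

The noise averages out as the step sizes shrink, but the offset does not: the iterates settle near `-bias` rather than at the true minimiser 0. In this toy case the size of the asymptotic error equals the size of the gradient bias.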


Salesforce is using AI to democratize SQL so anyone can query databases in natural language

@machinelearnbot

SQL is about as easy as it gets in the world of programming, and yet its learning curve is still steep enough to prevent many people from interacting with relational databases. Salesforce's AI research team took it upon itself to explore how machine learning might be able to open doors for those without knowledge of SQL. Their recent paper, Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning, builds on sequence-to-sequence models typically employed in machine translation. A reinforcement learning twist allowed the team to obtain promising results translating natural language database queries into SQL. In practice, this means that you could simply ask who the winningest team in college football is, and an appropriate database could be automatically queried to tell you that it is in fact the University of Michigan.