AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Artificial Intelligence Masterclass

#artificialintelligenceJun-28-2020, 09:09:23 GMT

Online Courses Udemy Enter the new era of Hybrid AI Models optimized by Deep NeuroEvolution, with a complete toolkit of ML, DL & AI models Created by Hadelin de Ponteves, Kirill Eremenko, SuperDataScience Team English, Italian [Auto-generated] Students also bought Artificial Intelligence: Reinforcement Learning in Python Machine Learning and AI: Support Vector Machines in Python Advanced AI: Deep Reinforcement Learning in Python Ensemble Machine Learning in Python: Random Forest, AdaBoost Deep Learning: Advanced Computer Vision (GANs, SSD, More!) Preview this course GET COUPON CODE Description Today, we are bringing you the king of our AI courses...: The Artificial Intelligence MASTERCLASS Are you keen on Artificial Intelligence? Do want to learn to build the most powerful AI model developed so far and even play against it? Sounds tempting right... Then Artificial Intelligence Masterclass course is the right choice for you. This ultimate AI toolbox is all you need to nail it down with ease. You will get 10 hours step by step guide and the full roadmap which will help you build your own Hybrid AI Model from scratch.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Genre:

Instructional Material > Course Syllabus & Notes (0.54)
Instructional Material > Training Manual (0.37)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Add feedback

Exploration Strategies in Deep Reinforcement Learning

#artificialintelligenceJun-28-2020, 08:24:27 GMT

Exploitation versus exploration is a critical topic in reinforcement learning. This post introduces several common approaches for better exploration in Deep RL. Exploitation versus exploration is a critical topic in Reinforcement Learning. We'd like the RL agent to find the best solution as fast as possible. However, in the meantime, committing to solutions too quickly without enough exploration sounds pretty bad, as it could lead to local minima or total failure. Modern RL algorithms that optimize for the best returns can achieve good exploitation quite efficiently, while exploration remains more like an open topic. I would like to discuss several common exploration strategies in Deep RL here. As this is a very big topic, my post by no means can cover all the important subtopics. I plan to update it periodically and keep further enriching the content gradually in time. As a quick recap, let's first go through several classic exploration algorithms that work out pretty well in the multi-armed bandit problem or simple tabular RL. Good exploration becomes especially hard when the environment rarely provides rewards as feedback or the environment has distracting noise.

computer game, exploration, upstream oil & gas, (21 more...)

#artificialintelligence

Industry:

Energy > Oil & Gas > Upstream (0.61)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Software development in Python: A practical approach

#artificialintelligenceJun-28-2020, 07:54:15 GMT

Online Courses Udemy - Software development in Python: A practical approach Learn to build real apps with python NEW Created by Daniel IT English [Auto] Students also bought Data Science: Deep Learning in Python Advanced AI: Deep Reinforcement Learning in Python Deep Learning Prerequisites: Linear Regression in Python Unsupervised Machine Learning Hidden Markov Models in Python 2020 Complete Python Bootcamp: From zero to hero in Python Preview this course GET COUPON CODE Description The reason I got into python, I wanted to be a software engineer, I had just built a chat app in PHP and JQuery and a girl asked me if it could run on phone. I responded yes, but I knew that would only be possible using help using non-native means. I wanted native builds, not some complex framework which will only allow me to make a web app whiles I could use the time to study a full fledge programming language. There were others like making a web view app, I didn't like the Idea because there would definetely be setbacks. And I also wanted to be a software engineer or developer, I had built two almost identical CMSs with PHP and I felt I was ready to move into the software development space.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Add feedback

Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning

Blondé, Lionel, Strasser, Pablo, Kalousis, Alexandros

arXiv.org Artificial IntelligenceJun-28-2020

Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and are often riddled with essential engineering feats allowing their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. Crucially, we show that forcing the learned reward function to be local Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. Finally, we propose a novel reward-modulation technique inspired from a new interpretation of gradient-penalty regularization in reinforcement learning. Besides being extremely easy to implement and bringing little to no overhead, we show that our method provides improvements in several continuous control environments of the MuJoCo suite.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2006.16785

Country:

North America > United States (0.04)
Europe > Switzerland (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Leisure & Entertainment > Games (0.67)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Empirically Verifying Hypotheses Using Reinforcement Learning

Marino, Kenneth, Fergus, Rob, Szlam, Arthur, Gupta, Abhinav

arXiv.org Artificial IntelligenceJun-28-2020

This paper formulates hypothesis verification as an RL problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false. Existing RL algorithms fail to solve this task, even for simple environments. In order to train the agents, we exploit the underlying structure of many hypotheses, factorizing them as {pre-condition, action sequence, post-condition} triplets. By leveraging this structure we show that RL agents are able to succeed at the task. Furthermore, subsequent fine-tuning of the policies allows the agent to correctly verify hypotheses not amenable to the above factorization.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2006.15762

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.50)
Workflow (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Lookahead-Bounded Q-Learning

Shar, Ibrahim El, Jiang, Daniel R.

arXiv.org Machine LearningJun-28-2020

We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to improve the performance of standard Q-learning in stochastic environments through the use of ``lookahead'' upper and lower bounds. To do this, LBQL employs previously collected experience and each iteration's state-action values as dual feasible penalties to construct a sequence of sampled information relaxation problems. The solutions to these problems provide estimated upper and lower bounds on the optimal value, which we track via stochastic approximation. These quantities are then used to constrain the iterates to stay within the bounds at every iteration. Numerical experiments on benchmark problems show that LBQL exhibits faster convergence and more robustness to hyperparameters when compared to standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2006.1569

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors

Zhang, Chi, Kuppannagari, Sanmukh Rao, Prasanna, Viktor K

arXiv.org Machine LearningJun-28-2020

Model usage is the central challenge of model-based reinforcement learning. Although dynamics model based on deep neural networks provide good generalization for single step prediction, such ability is over exploited when it is used to predict long horizon trajectories due to compounding errors. In this work, we propose a Dyna-style model-based reinforcement learning algorithm, which we called Maximum Entropy Model Rollouts (MEMR). To eliminate the compounding errors, we only use our model to generate single-step rollouts. Furthermore, we propose to generate \emph{diverse} model rollouts by non-uniform sampling of the environment states such that the entropy of the model rollouts is maximized. We mathematically derived the maximum entropy sampling criteria for one data case under Gaussian prior. To accomplish this criteria, we propose to utilize a prioritized experience replay. Our preliminary experiments in challenging locomotion benchmarks show that our approach achieves the same sample efficiency of the best model-based algorithms, matches the asymptotic performance of the best model-free algorithms, and significantly reduces the computation requirements of other model-based methods.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2006.04802

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.84)

Add feedback

Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples

Xu, Zhe, Wu, Bo, Neider, Daniel, Topcu, Ufuk

arXiv.org Artificial IntelligenceJun-28-2020

Despite the fact that deep reinforcement learning (RL) has surpassed human-level performances in various tasks, it still has several fundamental challenges such as extensive data requirement and lack of interpretability. We investigate the RL problem with non-Markovian reward functions to address such challenges. We enable an RL agent to extract high-level knowledge in the form of finite reward automata, a type of Mealy machines that encode non-Markovian reward functions. The finite reward automata can be converted to deterministic finite state machines, which can be further translated to regular expressions. Thus, this representation is more interpretable than other forms of knowledge representation such as neural networks. We propose an active learning approach that iteratively infers finite reward automata and performs RL (specifically, q-learning) based on the inferred finite reward automata. The inference method is inspired by the L* learning algorithm, and modified in the framework of RL. We maintain two different q-functions, one for answering the membership queries in the L* learning algorithm and the other one for obtaining optimal policies for the inferred finite reward automaton. The experiments show that the proposed approach converges to optimal policies in at most 50% of the training steps as in the two state-of-the-art baselines.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2006.15714

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

A deep reinforcement learning framework to identify key players in complex networks

#artificialintelligenceJun-27-2020, 18:25:24 GMT

Network science is an academic field that aims to unveil the structure and dynamics behind networks, such as telecommunication, computer, biological and social networks. One of the fundamental problems that network scientists have been trying to solve in recent years entails identifying an optimal set of nodes that most influence a network's functionality, referred to as key players. Identifying key players could greatly benefit many real-world applications, for instance, enhancing techniques for the immunization of networks, as well as aiding epidemic control, drug design and viral marketing. Due to its NP-hard nature, however, solving this problem using exact algorithms with polynomial time complexity has proved highly challenging. Researchers at National University of Defense Technology in China, University of California, Los Angeles (UCLA), and Harvard Medical School (HMS) have recently developed a deep reinforcement learning (DRL) framework, dubbed FINDER, that could identify key players in complex networks more efficiently.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.55)
Asia > China (0.25)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Building AI Trading Systems

#artificialintelligenceJun-27-2020, 09:50:42 GMT

About two years ago I wrote a little piece about applying Reinforcement Learning to the markets. It was a project I had worked on for a while in various forms. A few people asked me what became of it. So this post covers some high-level things I've learned. It's more of a rant than an organized post, really. If there is enough interest in this topic I'd be happy to go into more technical detail in future posts, but that's TBD.

infrastructure, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback