

Reinforcement Learning


Learning Monopoly Gameplay: A Hybrid Model-Free Deep Reinforcement Learning and Imitation Learning Approach

arXiv.org Artificial Intelligence

Learning how to adapt and make real-time informed decisions in dynamic and complex environments is a challenging problem. To learn this task, Reinforcement Learning (RL) relies on an agent interacting with an environment and learning through trial and error to maximize the cumulative sum of rewards it receives. In the multi-player game Monopoly, players have to make several decisions every turn, some of which involve complex actions such as making trades. This makes decision-making harder and thus makes playing Monopoly and learning its winning strategies a highly complicated task for an RL agent. In this paper, we introduce a Hybrid Model-Free Deep RL (DRL) approach that is capable of playing and learning winning strategies of the popular board game, Monopoly. To achieve this, our DRL agent (1) starts its learning process by imitating a rule-based agent (one that resembles human logic) to initialize its policy, and (2) learns the successful actions and improves its policy using DRL. Experimental results demonstrate intelligent behavior by our proposed agent, which achieves high win rates against different types of agent-players.
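The trial-and-error loop described above can be illustrated on a much smaller problem than Monopoly. The sketch below is a generic epsilon-greedy agent on a toy multi-armed bandit; the arm means, the `epsilon` value, and the name `run_bandit` are illustrative assumptions, not part of the paper, which combines imitation learning with deep RL rather than using a tabular bandit.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy Gaussian multi-armed bandit (illustrative only).

    Returns the estimated value of each arm, the pull count per arm,
    and the cumulative reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # how often each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # feedback from the environment
        counts[arm] += 1
        # Incremental mean update: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = run_bandit([0.2, 0.5, 1.0])
print("pulls per arm:", counts)
print("cumulative reward:", round(total, 1))
```

Exploration keeps every arm sampled occasionally, so the agent can discover which arm pays best and then exploit it for most of the remaining steps.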


Where the Action is: Let's make Reinforcement Learning for Stochastic Dynamic Vehicle Routing Problems work!

arXiv.org Artificial Intelligence

There has been a paradigm shift in urban logistics services in recent years: demand for real-time, instant mobility and delivery services is growing. This poses new challenges to logistics service providers, as the underlying stochastic dynamic vehicle routing problems (SDVRPs) require anticipatory real-time routing actions. Searching the combinatorial action space for efficient routing actions is by itself a complex mixed-integer programming (MIP) task, well known to the operations research community. This complexity is now multiplied by the challenge of evaluating such actions with respect to their effectiveness given future dynamism and uncertainty, a potentially ideal case for reinforcement learning (RL), which is well known to the computer science community. Solving SDVRPs requires joint work by both communities, but, as we show, such work is essentially non-existent. Both communities focus on their individual strengths, leaving potential for improvement. Our survey paper highlights this potential in research originating from both communities. We point out current obstacles in SDVRPs and guide towards joint approaches to overcome them.


Ensemble Bootstrapping for Q-Learning

arXiv.org Artificial Intelligence

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.
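The estimation problem behind these biases can be reproduced without any RL machinery: estimate the largest mean among several independent random variables from samples. The sketch below compares three estimators: the single estimator (maximum of sample means, as in the Q-learning target), the double estimator (select on one half of the data, evaluate on the other, as in Double Q-learning), and an ensemble estimator that selects on one of K data splits and evaluates on the pooled remaining splits. The ensemble rule is a simplified reading of the EBQL idea, not the paper's exact algorithm; the function names, `k_parts=4`, and the Gaussian setup are assumptions.

```python
import random
import statistics


def estimate_max_mean(samples, k_parts=4):
    """Return (single, double, ensemble) estimates of max_i E[X_i].

    samples[i] holds n draws of random variable X_i (n divisible by 2 and k_parts).
    """
    n_vars, n = len(samples), len(samples[0])

    def argmax(values):
        return max(range(n_vars), key=lambda i: values[i])

    # Single estimator: select and evaluate on the same data (biased upward).
    single = max(statistics.fmean(s) for s in samples)

    # Double estimator: select on half A, evaluate the chosen variable on half B.
    half = n // 2
    a_star = argmax([statistics.fmean(s[:half]) for s in samples])
    double = statistics.fmean(samples[a_star][half:])

    # Ensemble estimator (assumed form): for each split j, select on split j,
    # evaluate on the pooled remaining splits, then average the K evaluations.
    size = n // k_parts
    terms = []
    for j in range(k_parts):
        a_j = argmax([statistics.fmean(s[j * size:(j + 1) * size]) for s in samples])
        rest = samples[a_j][:j * size] + samples[a_j][(j + 1) * size:]
        terms.append(statistics.fmean(rest))
    ensemble = statistics.fmean(terms)
    return single, double, ensemble


def simulate(true_means, n_samples=20, trials=2000, seed=0):
    """Monte Carlo (bias, MSE) of each estimator against the true maximal mean."""
    rng = random.Random(seed)
    target = max(true_means)
    errors = {"single": [], "double": [], "ensemble": []}
    for _ in range(trials):
        samples = [[rng.gauss(mu, 1.0) for _ in range(n_samples)] for mu in true_means]
        for name, estimate in zip(errors, estimate_max_mean(samples)):
            errors[name].append(estimate - target)
    return {
        name: (statistics.fmean(e), statistics.fmean([x * x for x in e]))
        for name, e in errors.items()
    }


for means in ([0.0] * 10, [0.1 * i for i in range(10)]):
    for name, (bias, mse) in simulate(means).items():
        print(f"{name:9s} bias={bias:+.3f} mse={mse:.3f}")
    print()
```

With equal true means, the decoupled estimators are unbiased while the single estimator is biased upward. With unequal means, choosing a variable on noisy data and then evaluating it on independent data biases the double and ensemble estimates downward, which is the under-estimation the abstract attributes to Double Q-learning.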


Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

arXiv.org Artificial Intelligence

Reliable automatic evaluation of dialogue systems under an interactive environment is long overdue. An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments. Though researchers have attempted to use metrics (e.g., perplexity, BLEU) from language generation tasks or some model-based reinforcement learning methods (e.g., self-play evaluation) for automatic evaluation, these methods show only a very weak correlation with actual human evaluation in practice. To bridge this gap, we propose a new framework named ENIGMA for estimating human evaluation scores based on recent advances in off-policy evaluation in reinforcement learning. ENIGMA requires only a handful of pre-collected experience data and therefore does not involve human interaction with the target policy during evaluation, making automatic evaluation feasible. More importantly, ENIGMA is model-free and agnostic to the behavior policies used to collect the experience data (see details in Section 2), which significantly alleviates the technical difficulties of modeling complex dialogue environments and human behaviors. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
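ENIGMA builds on off-policy evaluation (OPE): estimating how well a target policy would perform using only experience logged by other (behavior) policies. The simplest OPE estimator, ordinary importance sampling, needs the behavior probabilities, whereas the abstract describes ENIGMA as agnostic to the behavior policies, so the sketch below is only a baseline illustration of the OPE idea, not ENIGMA. The one-step setting, the two-action toy reward, and the names `log_experience`, `importance_sampling_estimate`, and `target_policy` are assumptions.

```python
import random


def log_experience(n, seed=0):
    """Log n one-step interactions from a uniform behavior policy over actions {0, 1}.

    Toy environment (an assumption): action 1 succeeds with probability 0.8,
    action 0 with probability 0.2. Each record is (state, action, behavior_prob, reward).
    """
    rng = random.Random(seed)
    success = {0: 0.2, 1: 0.8}
    logged = []
    for _ in range(n):
        action = rng.randrange(2)  # uniform behavior policy
        reward = 1.0 if rng.random() < success[action] else 0.0
        logged.append((None, action, 0.5, reward))
    return logged


def importance_sampling_estimate(logged, target_prob):
    """Ordinary importance-sampling estimate of the target policy's expected reward."""
    total = 0.0
    for state, action, behavior_prob, reward in logged:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the behavior policy was.
        total += target_prob(state, action) / behavior_prob * reward
    return total / len(logged)


def target_policy(state, action):
    """Hypothetical target policy that picks action 1 with probability 0.9."""
    return 0.9 if action == 1 else 0.1


estimate = importance_sampling_estimate(log_experience(20_000), target_policy)
print(f"IS estimate: {estimate:.3f} (true value 0.9*0.8 + 0.1*0.2 = 0.74)")
```

No interaction with the target policy is needed: the estimate is computed entirely from logged data. The importance weights are unbiased but can have high variance when the target and behavior policies disagree, which is one reason practical OPE methods go beyond plain importance sampling.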


Advanced AI: Deep Reinforcement Learning in Python

#artificialintelligence

Advanced AI: Deep Reinforcement Learning in Python (The Complete Guide to Mastering Artificial Intelligence using Deep Learning and Neural Networks) is a Udemy course created by Lazy Programmer Team and Lazy Programmer Inc.


Cutting-Edge AI: Deep Reinforcement Learning in Python

#artificialintelligence

Cutting-Edge AI: Deep Reinforcement Learning in Python is a Udemy course by Lazy Programmer Inc. that applies deep learning to artificial intelligence and reinforcement learning using evolution strategies, A2C, and DDPG.

Description: Welcome to Cutting-Edge AI! This is technically part 11 of my Deep Learning in Python series, and my 3rd reinforcement learning course. Deep Reinforcement Learning is actually the combination of 2 topics: Reinforcement Learning and Deep Learning (Neural Networks). While both of these have been around for quite some time, it's only recently that Deep Learning has really taken off, and along with it, Reinforcement Learning. The maturation of deep learning has propelled advances in reinforcement learning, which has been around since the 1980s, although some aspects of it, such as the Bellman equation, have been around for much longer.


Patenting Algorithm based Innovation: Best Practice Attorney

#artificialintelligence

A common patent objection to method claims (#methodpatentclaims) is that they recite a predefined sequence of steps used to implement an #algorithm without disclosing any functional limitations that enable the features claimed in those method steps. In a world where terms and conditions appear everywhere, we can't help but be suspicious about 'catches' hidden in particular clauses, or about how we might set ourselves up for trouble by agreeing to something. It is therefore imperative to understand legal language and how to draft a patent application that will withstand the objections raised by a patent examiner. A patent is one kind of intellectual property right (IPR), and in AI, one important subject where research has gained momentum is deep reinforcement learning. Scientists around the world are working on deep reinforcement learning.


All You Need to Know about Reinforcement Learning

#artificialintelligence

Reinforcement learning (RL) is the area of machine learning concerned with how software agents learn to make the right decisions.


An Introduction to Deep Reinforcement Learning and its Significance - Fingent Technology

#artificialintelligence

RL algorithms can be used to solve tasks where automation is required. However, actual implementation is easier said than done. You can ease your pain by using TF-Agents, a flexible TensorFlow library for building reinforcement learning models. TF-Agents makes it easy to use reinforcement learning with TensorFlow. It serves newcomers, who can learn RL using Colabs, documentation, and examples, as well as researchers who want to build new RL algorithms.


Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

arXiv.org Machine Learning

Designing off-policy reinforcement learning algorithms is typically a very challenging task, because a desirable iteration update often involves an expectation over an on-policy distribution. Prior off-policy actor-critic (AC) algorithms have introduced a new critic that uses the density ratio to adjust for the distribution mismatch in order to stabilize convergence, but at the cost of potentially introducing high bias due to estimation errors in both the density ratio and the value function. In this paper, we develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDPs, which can take advantage of learned nuisance functions to reduce estimation errors. Moreover, DR-Off-PAC adopts a single-timescale structure, in which both the actor and the critics are updated simultaneously with constant step sizes, and is thus more sample efficient than prior algorithms that adopt either a two-timescale or a nested-loop structure. We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy. We also show that the overall convergence of DR-Off-PAC is doubly robust to the approximation errors that depend only on the expressive power of approximation functions. To the best of our knowledge, our study establishes the first overall sample complexity analysis for a single-timescale off-policy AC algorithm.
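The "doubly robust" property can be seen in its simplest, one-step form: combine a learned reward model with an importance-weighted correction of that model's errors, so the estimate stays unbiased if either the model or the density ratios are correct. The sketch below illustrates only this property in a two-action toy problem; it is not DR-Off-PAC, which applies the idea to critics and density ratios in a discounted MDP with single-timescale actor-critic updates. All names, reward probabilities, and policies here are illustrative assumptions.

```python
import random

ACTIONS = (0, 1)
SUCCESS = {0: 0.2, 1: 0.8}  # toy reward probabilities (assumption)


def target_prob(state, action):
    """Hypothetical target policy: action 1 with probability 0.9."""
    return 0.9 if action == 1 else 0.1


def log_experience(n, seed=0):
    """(state, action, reward) tuples from a uniform behavior policy."""
    rng = random.Random(seed)
    logged = []
    for _ in range(n):
        action = rng.choice(ACTIONS)
        reward = 1.0 if rng.random() < SUCCESS[action] else 0.0
        logged.append((None, action, reward))
    return logged


def dr_estimate(logged, q_hat, ratio):
    """One-step doubly robust estimate of the target policy's expected reward.

    q_hat(s, a): reward model (may be wrong).
    ratio(s, a): estimated density ratio pi(a|s) / mu(a|s) (may be wrong).
    """
    total = 0.0
    for state, action, reward in logged:
        # Model-based term: expected reward under the target policy, per the model.
        model_term = sum(target_prob(state, b) * q_hat(state, b) for b in ACTIONS)
        # Correction term: importance-weighted residual of the model on the logged action.
        correction = ratio(state, action) * (reward - q_hat(state, action))
        total += model_term + correction
    return total / len(logged)


def true_q(s, a):
    return SUCCESS[a]


def wrong_q(s, a):
    return 0.5  # deliberately misspecified reward model


def true_ratio(s, a):
    return target_prob(s, a) / 0.5  # behavior policy is uniform over 2 actions


def wrong_ratio(s, a):
    return 1.0  # deliberately misspecified density ratio


data = log_experience(20_000)
print("wrong model, correct ratios:", round(dr_estimate(data, wrong_q, true_ratio), 3))
print("correct model, wrong ratios:", round(dr_estimate(data, true_q, wrong_ratio), 3))
print("wrong model, wrong ratios:  ", round(dr_estimate(data, wrong_q, wrong_ratio), 3))
```

The first two estimates should land near the true value 0.9·0.8 + 0.1·0.2 = 0.74, because one of the two nuisance functions is correct in each case; when both are wrong, the estimate drifts away from it.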