AITopics

Learning to adapt and make informed real-time decisions in dynamic, complex environments is a challenging problem. To learn this task, Reinforcement Learning (RL) relies on an agent that interacts with an environment and learns through trial and error to maximize the cumulative sum of rewards it receives. In multi-player Monopoly, players must make several decisions every turn, some of which involve complex actions such as making trades. This makes decision-making harder and thus poses a highly complicated task for an RL agent that must play the game and learn winning strategies. In this paper, we introduce a Hybrid Model-Free Deep RL (DRL) approach that is capable of playing and learning winning strategies for the popular board game Monopoly. To achieve this, our DRL agent (1) starts its learning process by imitating a rule-based agent (that resembles human logic) to initialize its policy, and (2) learns the successful actions and improves its policy using DRL. Experimental results demonstrate intelligent behavior by our proposed agent, which achieves high win rates against different types of agent-players.

agent, monopoly, rule-based agent, (15 more...)

2103.00683

Country:

North America > United States > California (0.14)
North America > United States > Vermont (0.04)
North America > United States > Connecticut (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Where the Action is: Let's make Reinforcement Learning for Stochastic Dynamic Vehicle Routing Problems work!

Hildebrandt, Florentin D, Thomas, Barrett, Ulmer, Marlin W

There has been a paradigm shift in urban logistics services in recent years: demand for real-time, instant mobility and delivery services is growing. This poses new challenges to logistics service providers, as the underlying stochastic dynamic vehicle routing problems (SDVRPs) require anticipatory real-time routing actions. Searching the combinatorial action space for efficient routing actions is by itself a complex mixed-integer programming (MIP) task, well known to the operations research community. This complexity is now multiplied by the challenge of evaluating such actions with respect to their effectiveness given future dynamism and uncertainty, a potentially ideal case for reinforcement learning (RL), well known to the computer science community. Solving SDVRPs requires joint work by both communities, but as we show, such work is essentially non-existent. Both communities focus on their individual strengths, leaving potential for improvement. Our survey paper highlights this potential in research originating from both communities. We point out current obstacles in SDVRPs and offer guidance towards joint approaches to overcome them.

action space, constraint, sdvrp, (17 more...)

2103.00507

Country: North America > United States > Iowa (0.04)

Genre:

Research Report (0.82)
Overview (0.68)

Industry:

Transportation > Freight & Logistics Services (1.00)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ensemble Bootstrapping for Q-Learning

Peer, Oren, Tessler, Chen, Merlis, Nadav, Meir, Ron

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.
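The bias described in this abstract can be illustrated with a small simulation of the "maximal mean" problem. The sketch below is not the authors' code: the function names, the Gaussian noise model, and the sample sizes are all assumptions chosen for illustration, and `ensemble_estimator` is only one plausible reading of an "EBQL-like" update (one subset picks the arg-max, the remaining subsets evaluate it). It uses only the Python standard library.

```python
import random
import statistics


def sample_means(true_means, n, rng):
    """Sample mean of n unit-variance Gaussian draws for each variable."""
    return [
        statistics.fmean([rng.gauss(mu, 1.0) for _ in range(n)])
        for mu in true_means
    ]


def single_estimator(true_means, n, rng):
    """Q-learning style: take the max of the sample means (tends to over-estimate)."""
    return max(sample_means(true_means, n, rng))


def double_estimator(true_means, n, rng):
    """Double-Q style: one half selects the arg-max, the other half evaluates it."""
    half = n // 2
    select = sample_means(true_means, half, rng)
    evaluate = sample_means(true_means, n - half, rng)
    best = max(range(len(select)), key=select.__getitem__)
    return evaluate[best]


def ensemble_estimator(true_means, n, rng, k=5):
    """Assumed EBQL-like variant: subset 0 selects, the other k-1 subsets evaluate."""
    parts = [sample_means(true_means, n // k, rng) for _ in range(k)]
    best = max(range(len(true_means)), key=parts[0].__getitem__)
    return statistics.fmean([part[best] for part in parts[1:]])


def bias(estimator, true_means, n=20, trials=2000, seed=0):
    """Monte Carlo estimate of E[estimate] - max(true_means)."""
    rng = random.Random(seed)
    estimates = [estimator(true_means, n, rng) for _ in range(trials)]
    return statistics.fmean(estimates) - max(true_means)


if __name__ == "__main__":
    means = [0.5] + [0.0] * 9  # hypothetical problem: one slightly better arm
    for est in (single_estimator, double_estimator, ensemble_estimator):
        print(est.__name__, round(bias(est, means), 3))
```

In this toy setting the single estimator is biased upward and the split-sample estimators are biased downward, which matches the qualitative claim in the abstract. The sketch measures bias only and makes no claim about the MSE result proved in the paper.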

ensemble bootstrapping, estimator, w-de, (11 more...)

2103.00445

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.93)
Leisure & Entertainment (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.42)

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

Jiang, Haoming, Dai, Bo, Yang, Mengjiao, Zhao, Tuo, Wei, Wei

Reliable automatic evaluation of dialogue systems under an interactive environment has long been overdue. An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments. Though researchers have attempted to use metrics (e.g., perplexity, BLEU) in language generation tasks or some model-based reinforcement learning methods (e.g., self-play evaluation) for automatic evaluation, these methods only show a very weak correlation with the actual human evaluation in practice. To bridge such a gap, we propose a new framework named ENIGMA for estimating human evaluation scores based on recent advances of off-policy evaluation in reinforcement learning. ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation, making automatic evaluations feasible. More importantly, ENIGMA is model-free and agnostic to the behavior policies for collecting the experience data (see details in Section 2), which significantly alleviates the technical difficulties of modeling complex dialogue environments and human behaviors. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.

evaluation, experience data, true reward, (11 more...)

2102.10242

Country:

North America > United States > Pennsylvania (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Feb-27-2021, 21:57:50 GMT

Advanced AI: Deep Reinforcement Learning in Python

Advanced AI: Deep Reinforcement Learning in Python. The Complete Guide to Mastering Artificial Intelligence using Deep Learning and Neural Networks. Created by Lazy Programmer Team, Lazy Programmer Inc.

learning, reinforcement, reinforcement learning, (9 more...)

Country: North America > United States > California (0.05)

Genre:

Instructional Material > Course Syllabus & Notes (0.93)
Instructional Material > Online (0.77)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.98)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.97)

#artificialintelligence Feb-27-2021, 21:57:50 GMT

Cutting-Edge AI: Deep Reinforcement Learning in Python

Cutting-Edge AI: Deep Reinforcement Learning in Python. Apply deep learning to artificial intelligence and reinforcement learning using evolution strategies, A2C, and DDPG. Created by Lazy Programmer Inc. Welcome to Cutting-Edge AI! This is technically Deep Learning in Python part 11 of my deep learning series, and my 3rd reinforcement learning course. Deep Reinforcement Learning is actually the combination of 2 topics: Reinforcement Learning and Deep Learning (Neural Networks). While both of these have been around for quite some time, it's only been recently that Deep Learning has really taken off, and along with it, Reinforcement Learning. The maturation of deep learning has propelled advances in reinforcement learning, which has been around since the 1980s, although some aspects of it, such as the Bellman equation, have been around for much longer.

deep reinforcement learning, learning, reinforcement learning, (8 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.72)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.39)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Feb-27-2021, 11:45:14 GMT

Patenting Algorithm based Innovation: Best Practice Attorney

A common patent objection concerns method claims (#methodpatentclaims) that recite a predefined sequence of steps used to implement an #algorithm without disclosing any functional limitations that enable the features claimed as method steps. In a world where terms and conditions appear everywhere, we can't help but be suspicious about 'catches' that exist within particular clauses or how we might set ourselves up for trouble by agreeing to something. It is therefore imperative to understand the legal language needed to draft a patent application that will withstand the objections raised by a patent examiner. A patent is one kind of intellectual property right (IPR), and in AI one important subject where research has gained momentum is deep reinforcement modules. Scientists globally are working on deep reinforcement learning.

algorithm, module, multiple independent module, (10 more...)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

#artificialintelligence Feb-27-2021, 10:05:43 GMT

All You Need to Know about Reinforcement Learning

Reinforcement learning (RL) is the area of machine learning concerned with how software learns to make the right decisions.

reinforcement learning

Industry: Media > News (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

#artificialintelligence Feb-27-2021, 01:25:25 GMT

An Introduction to Deep Reinforcement Learning and its Significance - Fingent Technology

RL algorithms can be used to solve tasks where automation is required. However, actual implementation is easier said than done. You can ease the pain by using TF-Agents, a flexible TensorFlow library for building reinforcement learning models. TF-Agents makes it easy to use reinforcement learning with TensorFlow. It helps newcomers learn RL through Colabs, documentation, and examples, and it supports researchers who want to build new RL algorithms.

deep reinforcement learning, fingent technology, significance, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning Feb-27-2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Xu, Tengyu, Yang, Zhuoran, Wang, Zhaoran, Liang, Yingbin

Designing off-policy reinforcement learning algorithms is typically a very challenging task, because a desirable iteration update often involves an expectation over an on-policy distribution. Prior off-policy actor-critic (AC) algorithms have introduced a new critic that uses the density ratio to adjust for the distribution mismatch in order to stabilize convergence, but at the cost of potentially introducing high bias due to the estimation errors of both the density ratio and the value function. In this paper, we develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDPs, which can take advantage of learned nuisance functions to reduce estimation errors. Moreover, DR-Off-PAC adopts a single-timescale structure, in which both actor and critics are updated simultaneously with a constant stepsize, and is thus more sample efficient than prior algorithms that adopt either a two-timescale or a nested-loop structure. We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy. We also show that the overall convergence of DR-Off-PAC is doubly robust to the approximation errors that depend only on the expressive power of approximation functions. To the best of our knowledge, our study establishes the first overall sample complexity analysis for a single-timescale off-policy AC algorithm.
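The "doubly robust" idea mentioned here can be seen in a much simpler setting than the one the paper studies. The sketch below is not DR-Off-PAC: it is the textbook doubly robust value estimate for a one-step (bandit-style) problem with a finite action set, computed as an exact expectation so that no sampling is involved. The function name `dr_value` and all numbers are hypothetical. It shows that the estimate equals the true target-policy value when either the density ratio or the value model is correct, and can be wrong when both are off.

```python
import math


def dr_value(behavior, target, rewards, q_hat, ratio_hat):
    """Exact expectation of a one-step doubly robust value estimate.

    behavior, target: action probabilities of the data-collecting and evaluated policies.
    rewards: true mean reward per action (unknown in practice, used here to take expectations).
    q_hat: learned reward/value model per action (a nuisance function).
    ratio_hat: learned density ratio target/behavior per action (a nuisance function).
    """
    # Model-based baseline: value of the target policy under the learned model.
    baseline = sum(p * q for p, q in zip(target, q_hat))
    # Importance-weighted correction of the model's residual, averaged over behavior data.
    correction = sum(
        b * w * (r - q)
        for b, w, r, q in zip(behavior, ratio_hat, rewards, q_hat)
    )
    return baseline + correction


behavior = [0.5, 0.5]
target = [0.9, 0.1]
rewards = [1.0, 0.0]
true_value = sum(p * r for p, r in zip(target, rewards))

true_ratio = [t / b for t, b in zip(target, behavior)]
wrong_ratio = [1.0, 1.0]
wrong_q = [0.3, 0.7]

# Correct ratio, wrong value model: the correction term repairs the baseline.
assert math.isclose(dr_value(behavior, target, rewards, wrong_q, true_ratio), true_value)
# Wrong ratio, correct value model: the residual is zero, so the ratio does not matter.
assert math.isclose(dr_value(behavior, target, rewards, rewards, wrong_ratio), true_value)
```

With both nuisance functions wrong (`wrong_q` and `wrong_ratio`), the same function no longer recovers `true_value`, which is why the quality of at least one learned nuisance function matters.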

algorithm, convergence, estimator, (11 more...)


2102.11866

Country:

North America > Canada > Alberta (0.14)
North America > United States > Ohio (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)