AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Towards Playing Full MOBA Games with Deep Reinforcement Learning

Ye, Deheng, Chen, Guibin, Zhang, Wen, Chen, Sheng, Yuan, Bo, Liu, Bo, Chen, Jia, Liu, Zhao, Qiu, Fuhao, Yu, Hongsheng, Yin, Yinyuting, Shi, Bei, Wang, Liang, Shi, Tengfei, Fu, Qiang, Yang, Wei, Huang, Lanxiao, Liu, Wei

arXiv.org Artificial IntelligenceNov-25-2020

MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems such as multi-agent, enormous state-action space, complex action control, etc. Developing AI for playing MOBA games has raised much attention accordingly. However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool in case that OpenAI's Dota AI limits the play to a pool of only 17 heroes. As a result, full MOBA games without restrictions are far from being mastered by any existing AI system. In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning. Specifically, we develop a combination of novel and existing learning techniques, including curriculum self-play learning, policy distillation, off-policy adaption, multi-head value estimation, and Monte-Carlo tree-search, in training and playing a large pool of heroes, meanwhile addressing the scalability issue skillfully. Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of MOBA AI agent in the literature.

learning, moba game, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2011.12692

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Software (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning

Kotsalis, Georgios, Lan, Guanghui, Li, Tianjiao

arXiv.org Artificial IntelligenceNov-25-2020

The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior investigations in the literature focused on temporal difference (TD) learning by employing nonsmooth finite time analysis motivated by stochastic subgradient descent leading to certain limitations. These encompass the requirement of analyzing a modified TD algorithm that involves projection to an a-priori defined Euclidean ball, achieving a non-optimal convergence rate and no clear way of deriving the beneficial effects of parallel implementation. Our approach remedies these shortcomings in the broader context of stochastic VIs and in particular when it comes to stochastic policy evaluation. We developed a variety of simple TD learning type algorithms motivated by its original version that maintain its simplicity, while offering distinct advantages from a non-asymptotic analysis point of view. We first provide an improved analysis of the standard TD algorithm that can benefit from parallel implementation. Then we present versions of a conditional TD algorithm (CTD), that involves periodic updates of the stochastic iterates, which reduce the bias and therefore exhibit improved iteration complexity. This brings us to the fast TD (FTD) algorithm which combines elements of CTD and the stochastic operator extrapolation method of the companion paper. For a novel index resetting policy FTD exhibits the best known convergence rate. We also devised a robust version of the algorithm that is particularly suitable for discounting factors close to 1.

algorithm, corollary 3, inequality, (15 more...)

arXiv.org Artificial Intelligence

2011.08434

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(3 more...)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Wu, Jingfeng, Braverman, Vladimir, Yang, Lin F.

arXiv.org Machine LearningNov-25-2020

In single-objective reinforcement learning (RL), a scalar reward is pre-specified and an agent learns a policy to maximize the long-term cumulative reward [Azar et al., 2017, Jin et al., 2018]. However, in many real-world applications, we need to optimize multiple objectives for the same (unknown) environment, even when these objectives are possibly contradicting [Roijers et al., 2013]. For example, in an autonomous driving application, each passenger may have a different preference of driving styles: some of the passengers prefer a very steady riding experience while other passengers enjoy the fast acceleration of the car. Therefore, traditional single-objective RL approach may fail to be applied in such scenarios. One way to tackle this issue is the multi-objective reinforcement learning (MORL) [Roijers et al., 2013, Yang et al., 2019, Natarajan and Tadepalli, 2005, Abels et al., 2018] method, which models the multiple objectives by a vectorized reward, and an additional preference vector to specify the relative importance of each objective. The agent of MORL needs to find policies to optimize the cumulative preference-weighted rewards under all possible preferences.

algorithm, probability, transition probability, (14 more...)

arXiv.org Machine Learning

2011.13034

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Passenger (0.65)
Transportation > Ground > Road (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Enhanced Scene Specificity with Sparse Dynamic Value Estimation

Singh, Jaskirat, Zheng, Liang

arXiv.org Machine LearningNov-25-2020

Multi-scene reinforcement learning involves training the RL agent across multiple scenes / levels from the same task, and has become essential for many generalization applications. However, the inclusion of multiple scenes leads to an increase in sample variance for policy gradient computations, often resulting in suboptimal performance with the direct application of traditional methods (e.g. PPO, A3C). One strategy for variance reduction is to consider each scene as a distinct Markov decision process (MDP) and learn a joint value function dependent on both state (s) and MDP (M). However, this is non-trivial as the agent is usually unaware of the underlying level at train / test times in multi-scene RL. Recently, Singh et al. [1] tried to address this by proposing a dynamic value estimation approach that models the true joint value function distribution as a Gaussian mixture model (GMM). In this paper, we argue that the error between the true scene-specific value function and the predicted dynamic estimate can be further reduced by progressively enforcing sparse cluster assignments once the agent has explored most of the state space. The resulting agents not only show significant improvements in the final reward score across a range of OpenAI ProcGen environments, but also exhibit increased navigation efficiency while completing a game level.

agent, arxiv preprint arxiv, reinforcement learning, (12 more...)

arXiv.org Machine Learning

2011.12574

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Advanced AI: Deep Reinforcement Learning in Python

#artificialintelligenceNov-24-2020, 22:15:51 GMT

This course is all about the application of deep learning and neural networks to reinforcement learning. If you've taken my first reinforcement learning class, then you know that reinforcement learning is on the bleeding edge of what we can do with AI. Specifically, the combination of deep learning with reinforcement learning has led to AlphaGo beating a world champion in the strategy game Go, it has led to self-driving cars, and it has led to machines that can play video games at a superhuman level. Reinforcement learning has been around since the 70s but none of this has been possible until now. The world is changing at a very fast pace.

deep reinforcement learning, neural network, reinforcement, (7 more...)

#artificialintelligence

Country: North America > United States > California (0.05)

Genre: Instructional Material > Course Syllabus & Notes (0.52)

Industry:

Leisure & Entertainment > Games (0.91)
Education > Educational Setting > Online (0.77)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

Machine Learning A-Z : Hands-On Python & R In Data Science

#artificialintelligenceNov-24-2020, 05:53:18 GMT

Learn to create Machine Learning Algorithms in Python and R from two Data Science experts. Code templates included. BESTSELLER, 4.5 (96,237 ratings), Created by Kirill Eremenko, Hadelin de Ponteves, SuperDataScience Team, SuperDataScience Support, English [Auto-generated], French [Auto-generated], 7 more Machine Learning A-Z™: Hands-On Python & R In Data Science Master Machine Learning on Python & R Have a great intuition of many Machine Learning models Make accurate predictions Make powerful analysis Make robust Machine Learning models Create strong added value to your business Use Machine Learning for personal purpose Handle specific topics like Reinforcement Learning, NLP and Deep Learning Handle advanced techniques like Dimensionality Reduction Know which Machine Learning model to choose for each type of problem Build an army of powerful Machine Learning models and know how to combine them to solve any problem PREVIEW THIS UDEMY COURSE -.> GET COUPON CODE

hand-on python, learning, machine learning model, (12 more...)

#artificialintelligence

Genre:

Instructional Material > Online (0.41)
Instructional Material > Course Syllabus & Notes (0.41)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

World Model as a Graph: Learning Latent Landmarks for Planning

Zhang, Lunjun, Yang, Ge, Stadie, Bradly C.

arXiv.org Artificial IntelligenceNov-24-2020

Planning - the ability to analyze the structure of a problem in the large and decompose it into interrelated subproblems - is a hallmark of human intelligence. While deep reinforcement learning (RL) has shown great promise for solving relatively straightforward control tasks, it remains an open problem how to best incorporate planning into existing deep RL paradigms to handle increasingly complex environments. One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts. This type of world model quickly diverges from reality when the planning horizon increases, thus struggling at long-horizon planning. How can we learn world models that endow agents with the ability to do temporally extended reasoning? In this work, we propose to learn graph-structured world models composed of sparse, multi-step transitions. We devise a novel algorithm to learn latent landmarks that are scattered (in terms of reachability) across the goal space as the nodes on the graph. In this same graph, the edges are the reachability estimates distilled from Q-functions. On a variety of high-dimensional continuous control tasks ranging from robotic manipulation to navigation, we demonstrate that our method, named L3P, significantly outperforms prior work, and is oftentimes the only method capable of leveraging both the robustness of model-free RL and generalization of graph-search algorithms. We believe our work is an important step towards scalable planning in reinforcement learning.

algorithm, arxiv preprint arxiv, landmark, (12 more...)

arXiv.org Artificial Intelligence

2011.12491

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > France (0.04)

Genre: Research Report (0.41)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Self-improving Chatbots based on Deep Reinforcement Learning

#artificialintelligenceNov-23-2020, 19:50:44 GMT

We present a Reinforcement Learning (RL) model for self-improving chatbots, specifically targeting FAQ-type chatbots. The model is not aimed at building a dialog system from scratch, but to leverage data from user conversations to improve chatbot performance. At the core of our approach is a score model, which is trained to score chatbot utterance-response tuples based on user feedback. The scores predicted by this model are used as rewards for the RL agent. Policy learning takes place offline, thanks to an user simulator which is fed with utterances from the FAQ-database.

chatbot, score model, utterance-response tuple, (12 more...)

#artificialintelligence

Country: North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

REPAINT: Knowledge Transfer in Deep Actor-Critic Reinforcement Learning

Tao, Yunzhe, Genc, Sahika, Sun, Tao, Mallya, Sunil

arXiv.org Artificial IntelligenceNov-23-2020

Accelerating the learning processes for complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between source and target tasks is low or unknown. In this work, we propose a REPresentation-And-INstance Transfer algorithm (REPAINT) for deep actor-critic reinforcement learning paradigm. In representation transfer, we adopt a kickstarted training method using a pre-trained teacher policy by introducing an auxiliary cross-entropy loss. In instance transfer, we develop a sampling approach, i.e., advantage-based experience replay, on transitions collected following the teacher policy, where only the samples with high advantage estimates are retained for policy update. We consider both learning an unseen target task by transferring from previously learned teacher tasks and learning a partially unseen task composed of multiple sub-tasks by transferring from a pre-learned teacher sub-task. In several benchmark experiments, REPAINT significantly reduces the total training time and improves the asymptotic performance compared to training with no prior knowledge and other baselines.

algorithm, iteration, teacher policy, (14 more...)

arXiv.org Artificial Intelligence

2011.11827

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

An analysis of Reinforcement Learning applied to Coach task in IEEE Very Small Size Soccer

Pena, Carlos H. C., Machado, Mateus G., Barros, Mariana S., Silva, José D. P., Maciel, Lucas D., Ren, Tsang Ing, Barros, Edna N. S., Braga, Pedro H. M., Bassani, Hansenclever F.

arXiv.org Artificial IntelligenceNov-23-2020

The IEEE Very Small Size Soccer (VSSS) is a robot soccer competition in which two teams of three small robots play against each other. Traditionally, a deterministic coach agent will choose the most suitable strategy and formation for each adversary's strategy. Therefore, the role of a coach is of great importance to the game. In this sense, this paper proposes an end-to-end approach for the coaching task based on Reinforcement Learning (RL). The proposed system processes the information during the simulated matches to learn an optimal policy that chooses the current formation, depending on the opponent and game conditions. We trained two RL policies against three different teams (balanced, offensive, and heavily offensive) in a simulated environment. Our results were assessed against one of the top teams of the VSSS league, showing promising results after achieving a win/loss ratio of approximately 2.0.

agent, attacker, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2011.11785

Country:

Europe > Portugal > Braga > Braga (0.04)
South America > Brazil > Pernambuco > Recife (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback