AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Efficient Reinforcement Learning with a Mind-Game for Full-Length StarCraft II

Liu, Ruo-Ze, Guo, Haifeng, Ji, Xiaozhong, Yu, Yang, Xiao, Zitai, Wu, Yuzhou, Pang, Zhen-Jia, Lu, Tong

arXiv.org Artificial IntelligenceMar-2-2019

StarCraft II provides an extremely challenging platform for reinforcement learning due to its huge state-space and game length. The previous fastest method requires days to train a full-length game policy in a single commercial machine. In this paper, we introduce the mind-game to facilitate the reinforcement learning, which is an abstract task model. With the mind-game, the policy is firstly trained in the mind-game fastly and is then mapped to the real game for the second phase training. In our experiments, the trained agent can achieve a 100% win-rate on the map Simple64 against the most difficult non-cheating built-in bot (level-7), and the training is 100 times faster than the previous ones under the same computational resource. To test the generalization performance of the agent, a Golden level of StarCraft II Ladder human player has competed with the agent. With restricted strategy, the agent wins the human player by 4 out of 5 games. The mind-game approach might shed some light for further studies of efficient reinforcement learning. The codes are publicly available (https://github.com/mindgameSC2/mind-SC2).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1903.00715

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Cooperative Multi-Agent Reinforcement Learning Framework for Resource Balancing in Complex Logistics Network

Li, Xihan, Zhang, Jia, Bian, Jiang, Tong, Yunhai, Liu, Tie-Yan

arXiv.org Artificial IntelligenceMar-2-2019

Resource balancing within complex transportation networks is one of the most important problems in real logistics domain. Traditional solutions on these problems leverage combinatorial optimization with demand and supply forecasting. However, the high complexity of transportation routes, severe uncertainty of future demand and supply, together with non-convex business constraints make it extremely challenging in the traditional resource management field. In this paper, we propose a novel sophisticated multi-agent reinforcement learning approach to address these challenges. In particular, inspired by the externalities especially the interactions among resource agents, we introduce an innovative cooperative mechanism for state and reward design resulting in more effective and efficient transportation. Extensive experiments on a simulated ocean transportation service demonstrate that our new approach can stimulate cooperation among agents and lead to much better performance. Compared with traditional solutions based on combinatorial optimization, our approach can give rise to a significant improvement in terms of both performance and stability.

container, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1903.00714

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
(14 more...)

Genre: Research Report (0.64)

Industry: Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.64)

Add feedback

Automating Predictive Modeling Process using Reinforcement Learning

Khurana, Udayan, Samulowitz, Horst

arXiv.org Artificial IntelligenceMar-2-2019

Building a good predictive model requires an array of activities such as data imputation, feature transformations, estimator selection, hyper-parameter search and ensemble construction. Given the large, complex and heterogenous space of options, off-the-shelf optimization methods are infeasible for realistic response times. In practice, much of the predictive modeling process is conducted by experienced data scientists, who selectively make use of available tools. Over time, they develop an understanding of the behavior of operators, and perform serial decision making under uncertainty, colloquially referred to as educated guesswork. With an unprecedented demand for application of supervised machine learning, there is a call for solutions that automatically search for a good combination of parameters across these tasks to minimize the modeling error. We introduce a novel system called APRL (Autonomous Predictive modeler via Reinforcement Learning), that uses past experience through reinforcement learning to optimize such sequential decision making from within a set of diverse actions under a time constraint on a previously unseen predictive learning problem. APRL actions are taken to optimize the performance of a final ensemble. This is in contrast to other systems, which maximize individual model accuracy first and create ensembles as a disconnected post-processing step. As a result, APRL is able to reduce up to 71\% of classification error on average over a wide variety of problems.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1903.00743

Genre: Research Report (0.40)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Add feedback

Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control in Computationally Complex Environments

Zhang, Zhizheng, Chen, Jiale, Chen, Zhibo, Li, Weiping

arXiv.org Machine LearningMar-2-2019

Deep Deterministic Policy Gradient (DDPG) has been proved to be a successful reinforcement learning (RL) algorithm for continuous control tasks. However, DDPG still suffers from data insufficiency and training inefficiency, especially in computationally complex environments. In this paper, we propose Asynchronous Episodic DDPG (AE-DDPG), as an expansion of DDPG, which can achieve more effective learning with less training time required. First, we design a modified scheme for data collection in an asynchronous fashion. Generally, for asynchronous RL algorithms, sample efficiency or/and training stability diminish as the degree of parallelism increases. We consider this problem from the perspectives of both data generation and data utilization. In detail, we re-design experience replay by introducing the idea of episodic control so that the agent can latch on good trajectories rapidly. In addition, we also inject a new type of noise in action space to enrich the exploration behaviors. Experiments demonstrate that our AE-DDPG achieves higher rewards and requires less time consuming than most popular RL algorithms in Learning to Run task which has a computationally complex environment. Not limited to the control tasks in computationally complex environments, AE-DDPG also achieves higher rewards and 2- to 4-fold improvement in sample efficiency on average compared to other variants of DDPG in MuJoCo environments. Furthermore, we verify the effectiveness of each proposed technique component through abundant ablation study.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1903.00827

Country: Asia (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Unified Framework for Regularized Reinforcement Learning

Li, Xiang, Yang, Wenhao, Zhang, Zhihua

arXiv.org Machine LearningMar-2-2019

We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term. The extant entropy-regularized MDPs can be cast into our framework. Moreover, under our framework there are many regularization terms can bring multi-modality and sparsity which are potentially useful in reinforcement learning. In particular, we present sufficient and necessary conditions that induce a sparse optimal policy. We also conduct a full mathematical analysis of the proposed regularized MDPs, including the optimality condition, performance error and sparseness control. We provide a generic method to devise regularization forms and propose off-policy actor-critic algorithms in complex environment settings. We empirically analyze the statistical properties of optimal policies and compare the performance of different sparse regularization forms in discrete and continuous environments.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1903.00725

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

After Mastering Go and StarCraft, DeepMind Takes on Soccer

#artificialintelligenceMar-1-2019, 13:55:04 GMT

Having notched impressive victories over human professionals in Go, Atari Games, and most recently StarCraft 2 -- Google's DeepMind team has now turned its formidable research efforts to soccer. In a paper released last week, the UK AI company demonstrates a novel machine learning method that trains a team of AI agents to play a simulated version of "the beautiful game." Gaming, AI and soccer fans hailed DeepMind's latest innovation on social media, with comments like "You should partner with EA Sports for a FIFA environment!" Machine learning, and particularly deep reinforcement learning, has in recent years achieved remarkable success across a wide range of competitive games. Collaborative-multi-agent games however remained a relatively difficult research domain.

deepmind take, large language model, machine learning, (9 more...)

#artificialintelligence

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

Add feedback

Reinforcement Learning with Pytorch

#artificialintelligenceMar-1-2019, 08:36:33 GMT

Learn to apply Reinforcement Learning and Artificial Intelligence algorithms using Python, Pytorch and OpenAI Gym Artificial Intelligence is dynamically edging its way into our lives. It is already broadly available and we use it - sometimes even not knowing it - on daily basis. Soon it will be our permanent, every day companion. And where can we place Reinforcement Learning in AI world? Definitely this is one of the most promising and fastest growing technologies that can eventually lead us to General Artificial Intelligence!

artificial intelligence, machine learning, reinforcement learning, (7 more...)

#artificialintelligence

Genre: Instructional Material (0.43)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.39)

Add feedback

Learning To Follow Directions in Street View

Hermann, Karl Moritz, Malinowski, Mateusz, Mirowski, Piotr, Banki-Horvath, Andras, Anderson, Keith, Hadsell, Raia

arXiv.org Artificial IntelligenceMar-1-2019

Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data. StreetNav is built on top of Google Street View and provides visually accurate environments representing real places. Agents are given driving instructions which they must learn to interpret in order to successfully navigate in this environment. Since humans equipped with driving instructions can readily navigate in previously unseen cities, we set a high bar and test our trained agents for similar cognitive capabilities. Although deep reinforcement learning (RL) methods are frequently evaluated only on data that closely follow the training distribution, our dataset extends to multiple cities and has a clean train/test separation. This allows for thorough testing of generalisation ability. This paper presents the StreetNav environment and tasks, a set of novel models that establish strong baselines, and analysis of the task and the trained agents.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1903.00401

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Berlin (0.04)

Genre:

Research Report (0.84)
Workflow (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Safe Reinforcement Learning with Model Uncertainty Estimates

Lütjens, Björn, Everett, Michael, How, Jonathan P.

arXiv.org Artificial IntelligenceMar-1-2019

Many current autonomous systems are being designed with a strong reliance on black box predictions from deep neural networks (DNNs). However, DNNs tend to be overconfident in predictions on unseen data and can give unpredictable results for far-from-distribution test data. The importance of predictions that are robust to this distributional shift is evident for safety-critical applications, such as collision avoidance around pedestrians. Measures of model uncertainty can be used to identify unseen data, but the state-of-the-art extraction methods such as Bayesian neural networks are mostly intractable to compute. This paper uses MC-Dropout and Bootstrapping to give computationally tractable and parallelizable uncertainty estimates. The methods are embedded in a Safe Reinforcement Learning framework to form uncertainty-aware navigation around pedestrians. The result is a collision avoidance policy that knows what it does not know and cautiously avoids pedestrians that exhibit unseen behavior. The policy is demonstrated in simulation to be more robust to novel observations and take safer actions than an uncertainty-unaware baseline.

machine learning, obstacle, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1810.087

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Transportation (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Should All Temporal Difference Learning Use Emphasis?

Gu, Xiang, Ghiassian, Sina, Sutton, Richard S.

arXiv.org Artificial IntelligenceMar-1-2019

Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy training but it is different from conventional TD learning even under on-policy training. A simple counterexample provided back in 2017 pointed to a potential class of problems where ETD converges but TD diverges. In this paper, we empirically show that ETD converges on a few other well-known on-policy experiments whereas TD either diverges or performs poorly. We also show that ETD outperforms TD on the mountain car prediction problem. Our results, together with a similar pattern observed under off-policy training in prior works, suggest that ETD might be a good substitute over conventional TD.

emphatic td, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1903.00194

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback