AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Toward the Fundamental Limits of Imitation Learning

Rajaraman, Nived, Yang, Lin F., Jiao, Jiantao, Ramachandran, Kannan

arXiv.org Artificial IntelligenceSep-13-2020

Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of $N$ expert trajectories ahead of time, and cannot interact with the MDP. Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy. Here $\mathcal{S}$ is the state space, and $H$ is the length of the episode. Furthermore, we establish a suboptimality lower bound of $\gtrsim |\mathcal{S}| H^2 / N$ which applies even if the expert is constrained to be deterministic, or if the learner is allowed to actively query the expert at visited states while interacting with the MDP for $N$ episodes. To our knowledge, this is the first algorithm with suboptimality having no dependence on the number of actions, under no additional assumptions. We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by $\lesssim \min \{ H \sqrt{|\mathcal{S}| / N} ,\ |\mathcal{S}| H^{3/2} / N \}$, showing that knowledge of transition improves the minimax rate by at least a $\sqrt{H}$ factor.

learner, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2009.0599

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Add feedback

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

Rosenberg, Aviv, Mansour, Yishay

arXiv.org Machine LearningSep-13-2020

We consider provably-efficient reinforcement learning (RL) in non-episodic factored Markov decision processes (FMDPs). All previous algorithms for regret minimization in this setting made the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first provably-efficient algorithm that has to learn the structure of the FMDP while minimizing its regret. Our algorithm is based on the optimism in face of uncertainty principle, combined with a simple statistical method for structure learning, and can be implemented efficiently given oracle-access to an FMDP planner. It maintains its computational efficiency even though the number of possible structures is exponential.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

2009.05986

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.55)

Add feedback

Efficient Competitive Self-Play Policy Optimization

Zhong, Yuanyi, Zhou, Yuan, Peng, Jian

arXiv.org Artificial IntelligenceSep-13-2020

Reinforcement learning from self-play has recently reported many successes. Self-play, where the agents compete with themselves, is often used to generate training data for iterative policy improvement. In previous work, heuristic rules are designed to choose an opponent for the current learner. Typical rules include choosing the latest agent, the best agent, or a random historical agent. However, these rules may be inefficient in practice and sometimes do not guarantee convergence even in the simplest matrix games. In this paper, we propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games. We recognize the fact that the Nash equilibrium coincides with the saddle point of the stochastic payoff function, which motivates us to borrow ideas from classical saddle point optimization literature. Our method trains several agents simultaneously, and intelligently takes each other as opponent based on simple adversarial rules derived from a principled perturbation-based saddle optimization method. We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games under standard assumptions. Beyond the theory, we further show the empirical superiority of our method over baseline methods relying on the aforementioned opponent-selection heuristics in matrix games, grid-world soccer, Gomoku, and simulated robot sumo, with neural net policy function approximators.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2009.06086

Country: North America > United States > Illinois (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (1.00)
Leisure & Entertainment > Sports > Soccer (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Artificial Intelligence System Able to Move Individual Molecules

#artificialintelligenceSep-12-2020, 14:05:17 GMT

A team of researchers at Electronic Arts have recently experimented with various artificial intelligence algorithms, including reinforcement learning models, to automate aspects of video game creation. The researchers hope that the AI models can save their developers and animators time doing repetitive tasks like coding character movement. Designing a video game, particularly the large, triple-A video games designed by large game companies, requires thousands of hours of work. As video game consoles, computers, and mobile devices become more powerful, video games themselves become more and more complex. Game developers are searching for ways to produce more game content with less effort, for example, they often choose to use procedural generation algorithms to produce landscapes and environments.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Country:

North America > Canada > Ontario > Toronto (0.17)
North America > Canada > Alberta (0.17)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

Add feedback

Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse reinforcement learning

Rosbach, Sascha, Li, Xing, Großjohann, Simon, Homoceanu, Silviu, Roth, Stefan

arXiv.org Artificial IntelligenceSep-12-2020

General-purpose trajectory planning algorithms for automated driving utilize complex reward functions to perform a combined optimization of strategic, behavioral, and kinematic features. The specification and tuning of a single reward function is a tedious task and does not generalize over a large set of traffic situations. Deep learning approaches based on path integral inverse reinforcement learning have been successfully applied to predict local situation-dependent reward functions using features of a set of sampled driving policies. Sample-based trajectory planning algorithms are able to approximate a spatio-temporal subspace of feasible driving policies that can be used to encode the context of a situation. However, the interaction with dynamic objects requires an extended planning horizon, which depends on sequential context modeling. In this work, we are concerned with the sequential reward prediction over an extended time horizon. We present a neural network architecture that uses a policy attention mechanism to generate a low-dimensional context vector by concentrating on trajectories with a human-like driving style. Apart from this, we propose a temporal attention mechanism to identify context switches and allow for stable adaptation of rewards. We evaluate our results on complex simulated driving situations, including other moving vehicles. Our evaluation shows that our policy attention mechanism learns to focus on collision-free policies in the configuration space. Furthermore, the temporal attention mechanism learns persistent interaction with other vehicles over an extended planning horizon.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2007.05798

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(5 more...)

Genre: Research Report (0.70)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Extended Radial Basis Function Controller for Reinforcement Learning

Capel, Nicholas, Zhang, Naifu

arXiv.org Machine LearningSep-12-2020

There have been attempts in model-based reinforcement learning to exploit a priori knowledge about the structure of the system. This paper introduces the extended radial basis function (RBF) controller design. In addition to traditional RBF controllers, our controller comprises of an engineered linear controller inside an operating region. We show that the learnt extended RBF controller takes on the desirable characteristics of both the linear and non-linear controller models. The extended controller is shown to retain the ability for universal function approximation of the non-linear RBF functions. At the same time, it demonstrates desirable stability criteria on par with the linear controller. Learning has been done in a probabilistic inference framework (PILCO), but could generalise to other reinforcement learning frameworks. Experimental results from the Swing-up pendulum, Cartpole, and Mountain car environments are reported.

controller, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2009.05866

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Multi-Objective Reinforcement Learning for Infectious Disease Control with Application to COVID-19 Spread

Wan, Runzhe, Zhang, Xinyu, Song, Rui

arXiv.org Machine LearningSep-12-2020

Severe infectious diseases such as the novel coronavirus (COVID-19) pose a huge threat to public health. Stringent control measures, such as school closures and stay-at-home orders, while having significant effects, also bring huge economic losses. A crucial question for policymakers around the world is how to make the trade-off and implement the appropriate interventions. In this work, we propose a Multi-Objective Reinforcement Learning framework to facilitate the data-driven decision making and minimize the long-term overall cost. Specifically, at each decision point, a Bayesian epidemiological model is first learned as the environment model, and then we use the proposed model-based multi-objective planning algorithm to find a set of Pareto-optimal policies. This framework, combined with the prediction bands for each policy, provides a real-time decision support tool for policymakers. The application is demonstrated with the spread of COVID-19 in China.

intervention, machine learning, real time system, (17 more...)

arXiv.org Machine Learning

2009.04607

Country:

Asia > China (0.25)
Oceania > New Zealand (0.04)
North America > United States > North Carolina (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Add feedback

Semantic-preserving Reinforcement Learning Attack Against Graph Neural Networks for Malware Detection

Zhang, Lan, Liu, Peng, Choi, Yoon-Ho

arXiv.org Artificial IntelligenceSep-11-2020

To address the costs of reverse engineering and signature extraction, advanced research on malware detection focuses on using neural networks to learn malicious behaviors with static and dynamic features. However, deep learning-based malware detection models are vulnerable to a hack from adversarial samples. The attackers' goal is to generate imperceptible perturbations to the original samples and evade detection. In the context of malware, the generated samples should have one more important character: it should not change the malicious behaviors of the original code. So the original features can not be removed and changed. In this paper, we proposed a reinforcement learning based attack to deceive graph based malware detection models. Inspired by obfuscation techniques, the central idea of the proposed attack is to sequentially inject semantic Nops, which will not change the program's functionality, into CFGs(Control Flow Graph). Specifically, the Semantics-preserving Reinforcement Learning(SRL) Attack is to learn an RL agent to iteratively select the semantic Nops and insert them into basic blocks of the CFGs. Variants of obfuscation methods, hill-climbing methods, and gradient based algorithms are proposed: 1) Semantics-preserving Random Insertion(SRI) Attack: randomly inserting semantic Nops into basic blocks.; 2) Semantics-preserving Accumulated Insertion(SAI) Attack: declining certain random transformation according to the probability of the target class; 3) Semantics-preserving Gradient based Insertion(SGI) Attack: applying transformation on the original CFG in the direction of the gradient. We use real-world Windows programs to show that a family of Graph Neural Network models are vulnerable to these attacks. The best evasion rate of the benchmark attacks are 97% on the basic GCN model and 96% on DGCNN model. The SRL attack can achieve 100% on both models.

detection model, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2009.05602

Country:

North America > United States > Wisconsin (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments

Morad, Steven D., Mecca, Roberto, Poudel, Rudra P. K., Liwicki, Stephan, Cipolla, Roberto

arXiv.org Artificial IntelligenceSep-11-2020

We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL in collision-free environments significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents are able to navigate through unknown cluttered indoor environments to semantically-specified targets using only RGB images. Collision avoidance policies and frozen feature networks support transfer to unseen real-world environments, without any modification or retraining requirements. We evaluate our policies in simulation, and in the real world on a ground robot and a quadrotor drone. Videos of real-world results are available in the supplementary material

machine learning, navigation, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2009.05429

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Industry:

Leisure & Entertainment (0.46)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Mobile Robot Path Planning in Dynamic Environments through Globally Guided Reinforcement Learning

Wang, Binyu, Liu, Zhe, Li, Qingbiao, Prorok, Amanda

arXiv.org Artificial IntelligenceSep-11-2020

Path planning for mobile robots in large dynamic environments is a challenging problem, as the robots are required to efficiently reach their given goals while simultaneously avoiding potential conflicts with other robots or dynamic objects. In the presence of dynamic obstacles, traditional solutions usually employ re-planning strategies, which re-call a planning algorithm to search for an alternative path whenever the robot encounters a conflict. However, such re-planning strategies often cause unnecessary detours. To address this issue, we propose a learning-based technique that exploits environmental spatio-temporal information. Different from existing learning-based methods, we introduce a globally guided reinforcement learning approach (G2RL), which incorporates a novel reward structure that generalizes to arbitrary environments. We apply G2RL to solve the multi-robot path planning problem in a fully distributed reactive manner. We evaluate our method across different map types, obstacle densities, and the number of robots. Experimental results show that G2RL generalizes well, outperforming existing distributed methods, and performing very similarly to fully centralized state-of-the-art benchmarks.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2005.0542

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback