AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Safe Reinforcement Learning via Online Shielding

Bastani, Osbert

arXiv.org Machine LearningMay-25-2019

Reinforcement learning is a promising approach to learning control policies for complex robotics tasks. A key challenge is ensuring safety of the learned control policy---e.g., that a walking robot does not fall over, or a quadcopter does not run into a wall. We focus on the setting where the dynamics are known, and the goal is to prove that a policy learned in simulation satisfies a given safety constraint. Existing approaches for ensuring safety suffer from a number of limitations---e.g., they do not scale to high-dimensional state spaces, or they only ensure safety for a fixed environment. We propose an approach based on shielding, which uses a backup controller to override the learned controller as necessary to ensure that safety holds. Rather than compute when to use the backup controller ahead-of-time, we perform this computation online. By doing so, we ensure that our approach is computationally efficient, and furthermore, can be used to ensure safety even in novel environments. We empirically demonstrate that our approach can ensure safety in experiments on cart-pole and on a bicycle with random obstacles.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1905.10691

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > California (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Add feedback

Reachable Space Characterization of Markov Decision Processes with Time Variability

Xu, Junhong, Yin, Kai, Liu, Lantao

arXiv.org Artificial IntelligenceMay-25-2019

We propose a solution to a time-varying variant of Markov Decision Processes which can be used to address decision-theoretic planning problems for autonomous systems operating in unstructured outdoor environments. We explore the time variability property of the planning stochasticity and investigate the state reachability, based on which we then develop an efficient iterative method that offers a good trade-off between solution optimality and time complexity. The reachability space is constructed by analyzing the means and variances of states' reaching time in the future. We validate our algorithm through extensive simulations using ocean data, and the results show that our method achieves a great performance in terms of both solution quality and computing time.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1905.09342

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Indiana > Monroe County > Bloomington (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)

Add feedback

Large Scale Markov Decision Processes with Changing Rewards

Cardoso, Adrian Rivera, Wang, He, Xu, Huan

arXiv.org Machine LearningMay-25-2019

We consider Markov Decision Processes (MDPs) where the rewards are unknown and may change in an adversarial manner. We provide an algorithm that achieves state-of-the-art regret bound of $O( \sqrt{\tau (\ln|S|+\ln|A|)T}\ln(T))$, where $S$ is the state space, $A$ is the action space, $\tau$ is the mixing time of the MDP, and $T$ is the number of periods. The algorithm's computational complexity is polynomial in $|S|$ and $|A|$ per period. We then consider a setting often encountered in practice, where the state space of the MDP is too large to allow for exact solutions. By approximating the state-action occupancy measures with a linear architecture of dimension $d\ll|S|$, we propose a modified algorithm with computational complexity polynomial in $d$. We also prove a regret bound for this modified algorithm, which to the best of our knowledge this is the first $\tilde{O}(\sqrt{T})$ regret bound for large scale MDPs with changing rewards.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1905.10649

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback

Prioritized Sequence Experience Replay

Brittain, Marc, Bertram, Josh, Yang, Xuxi, Wei, Peng

arXiv.org Artificial IntelligenceMay-25-2019

Experience replay is widely used in deep reinforcement learning algorithms and allows agents to remember and learn from experiences from the past. In an effort to learn more efficiently, researchers proposed prioritized experience replay (PER) which samples important transitions more frequently. In this paper, we propose Prioritized Sequence Experience Replay (PSER) a framework for prioritizing sequences of experience in an attempt to both learn more efficiently and to obtain better performance. We compare performance of uniform, PER and PSER sampling techniques in DQN on the Atari 2600 benchmark and show DQN with PSER substantially outperforms PER and uniform sampling.

machine learning, reinforcement learning, transition, (13 more...)

arXiv.org Artificial Intelligence

1905.12726

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Composing Ensembles of Policies with Deep Reinforcement Learning

Qureshi, Ahmed H., Johnson, Jacob J., Qin, Yuzhe, Boots, Byron, Yip, Michael C.

arXiv.org Artificial IntelligenceMay-25-2019

Composition of elementary skills into complex behaviors to solve challenging problems is one of the key elements toward building intelligent machines. To date, there has been plenty of work on learning new policies or skills but almost no focus on composing them to perform complex decision-making. In this paper, we propose a policy ensemble composition framework that takes the robot's primitive policies and learns to compose them concurrently or sequentially through reinforcement learning. We evaluate our method in problems where traditional approaches either fail or exhibit high sample complexity to find a solution. We show that our method not only solves the problems that require both task and motion planning but also exhibits high data efficiency, which is currently one of the main limitations of reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1905.10681

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Adversarial Policies: Attacking Deep Reinforcement Learning

Gleave, Adam, Dennis, Michael, Kant, Neel, Wild, Cody, Levine, Sergey, Russell, Stuart

arXiv.org Artificial IntelligenceMay-25-2019

Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent's observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1905.10615

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Sports (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

ASPIRE: Automated Security Policy Implementation Using Reinforcement Learning

Birman, Yoni, Hindi, Shaked, Katz, Gilad, Shabtai, Asaf

arXiv.org Artificial IntelligenceMay-25-2019

Malware detection is an ever-present challenge for all organizational gatekeepers. Organizations often deploy numerous different malware detection tools, and then combine their output to produce a final classification for an inspected file. This approach has two significant drawbacks. First, it requires large amounts of computing resources and time since every incoming file needs to be analyzed by all detectors. Secondly, it is difficult to accurately and dynamically enforce a predefined security policy that comports with the needs of each organization (e.g., how tolerant is the organization to false negatives and false positives). In this study we propose ASPIRE, a reinforcement learning (RL)-based method for malware detection. Our approach receives the organizational policy -- defined solely by the perceived costs of correct/incorrect classifications and of computing resources -- and then dynamically assigns detection tools and sets the detection threshold for each inspected file. We demonstrate the effectiveness and robustness of our approach by conducting an extensive evaluation on multiple organizational policies. ASPIRE performed well in all scenarios, even achieving near-optimal accuracy of 96.21% (compared to an optimum of 96.86%) at approximately 20% of the running time of this baseline.

detector, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1905.10517

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)

Add feedback

InfoRL: Interpretable Reinforcement Learning using Information Maximization

Hayat, Aadil, Singh, Utsav, Namboodiri, Vinay P.

arXiv.org Artificial IntelligenceMay-24-2019

Recent advances in reinforcement learning have proved that given an environment we can learn to perform a task in that environment if we have access to some form of a reward function (dense, sparse or derived from IRL). But most of the algorithms focus on learning a single best policy to perform a given set of tasks. In this paper, we focus on an algorithm that learns to not just perform a task but different ways to perform the same task. As we know when the environment is complex enough there always exists multiple ways to perform a task. We show that using the concept of information maximization it is possible to learn latent codes for discovering multiple ways to perform any given task in an environment.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1905.10404

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Adaptive Symmetric Reward Noising for Reinforcement Learning

Vivanti, Refael, Sohlberg-Baris, Talya D., Cohen, Shlomo, Cohen, Orna

arXiv.org Artificial IntelligenceMay-24-2019

Recent reinforcement learning algorithms, though achieving impressive results in various fields, suffer from brittle training effects such as regression in results and high sensitivity to initialization and parameters. We claim that some of the brittleness stems from variance differences, i.e. when different environment areas - states and/or actions - have different rewards variance. This causes two problems: First, the "Boring Areas Trap" in algorithms such as Q-learning, where moving between areas depends on the current area variance, and getting out of a boring area is hard due to its low variance. Second, the "Manipulative Consultant" problem, when value-estimation functions used in DQN and Actor-Critic algorithms influence the agent to prefer boring areas, regardless of the mean rewards return, as they maximize estimation precision rather than rewards. This sheds a new light on how exploration contribute to training, as it helps with both challenges. Cognitive experiments in humans showed that noised reward signals may paradoxically improve performance. We explain this using the two mentioned problems, claiming that both humans and algorithms may share similar challenges. Inspired by this result, we propose the Adaptive Symmetric Reward Noising (ASRN), by which we mean adding Gaussian noise to rewards according to their states' estimated variance, thus avoiding the two problems while not affecting the environment's mean rewards behavior. We conduct our experiments in a Multi Armed Bandit problem with variance differences. We demonstrate that a Q-learning algorithm shows the brittleness effect in this problem, and that the ASRN scheme can dramatically improve the results. We show that ASRN helps a DQN algorithm training process reach better results in an end to end autonomous driving task using the AirSim driving simulator.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1905.10144

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

Drori, Iddo, Krishnamurthy, Yamuna, Lourenco, Raoni, Rampin, Remi, Cho, Kyunghyun, Silva, Claudio, Freire, Juliana

arXiv.org Machine LearningMay-24-2019

Automatic machine learning is an important problem in the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached state-of-the-art results with an order of magnitude speedup using reinforcement learning with self-play. In this work we extend AlphaD3M by using a pipeline grammar and a pre-trained model which generalizes from many different datasets and similar tasks. Our results demonstrate improved performance compared with our earlier work and existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research we make our data, models, and code publicly available.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1905.10345

Country: North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback