AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Autonomous Penetration Testing using Reinforcement Learning

Schwartz, Jonathon, Kurniawati, Hanna

arXiv.org Artificial IntelligenceMay-15-2019

Penetration testing (pentesting) involves performing a controlled attack on a computer system in order to assess it's security. Although an effective method for testing security, pentesting requires highly skilled practitioners and currently there is a growing shortage of skilled cyber security professionals. One avenue for alleviating this problem is automate the pentesting process using artificial intelligence techniques. Current approaches to automated pentesting have relied on model-based planning, however the cyber security landscape is rapidly changing making maintaining up-to-date models of exploits a challenge. This project investigated the application of model-free Reinforcement Learning (RL) to automated pentesting. Model-free RL has the key advantage over model-based planning of not requiring a model of the environment, instead learning the best policy through interaction with the environment. We first designed and built a fast, low compute simulator for training and testing autonomous pentesting agents. We did this by framing pentesting as a Markov Decision Process with the known configuration of the network as states, the available scans and exploits as actions, the reward determined by the value of machines on the network. We then used this simulator to investigate the application of model-free RL to pentesting. We tested the standard Q-learning algorithm using both tabular and neural network based implementations. We found that within the simulated environment both tabular and neural network implementations were able to find optimal attack paths for a range of different network topologies and sizes without having a model of action behaviour. However, the implemented algorithms were only practical for smaller networks and numbers of actions. Further work is needed in developing scalable RL algorithms and testing these algorithms in larger and higher fidelity environments.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1905.05965

Country:

North America > United States (0.14)
Oceania > Australia > Queensland (0.04)

Genre: Research Report > Promising Solution (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Synthesis of Provably Correct Autonomy Protocols for Shared Control

Cubuktepe, Murat, Jansen, Nils, Alsiekh, Mohammed, Topcu, Ufuk

arXiv.org Artificial IntelligenceMay-15-2019

We synthesize shared control protocols subject to probabilistic temporal logic specifications. More specifically, we develop a framework in which a human and an autonomy protocol can issue commands to carry out a certain task. We blend these commands into a joint input to a robot. We model the interaction between the human and the robot as a Markov decision process (MDP) that represents the shared control scenario. Using inverse reinforcement learning, we obtain an abstraction of the human's behavior and decisions. We use randomized strategies to account for randomness in human's decisions, caused by factors such as complexity of the task specifications or imperfect interfaces. We design the autonomy protocol to ensure that the resulting robot behavior satisfies given safety and performance specifications in probabilistic temporal logic. Additionally, the resulting strategies generate behavior as similar to the behavior induced by the human's commands as possible. We solve the underlying problem efficiently using quasiconvex programming. Case studies involving autonomous wheelchair navigation and unmanned aerial vehicle mission planning showcase the applicability of our approach.

machine learning, reinforcement learning, specification, (19 more...)

arXiv.org Artificial Intelligence

1905.06471

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
North America > United States > Pennsylvania (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Aerospace & Defense (0.49)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

A Contextual-Based Framework for Opinion Formation

Santos, Eugene (Dartmouth College) | Nyanhongo, Clement (Dartmouth College)

AAAI ConferencesMay-15-2019

During opinion formation, interacting agents can be assumed to be engaging in learning and decision-making processes to satisfy their individual goals. These goals are determined by the agents’ preferences – which are often unknown, complex and unpredictable. Most opinion formation frameworks however, assume static preferences and fail to model practical situations where human preferences change. We propose a new framework to simulate the process of opinion formation under uncertainty and dynamism. Agents who are unaware of their implicit con-textual preferences utilize inverse reinforcement learning to compute reward functions that determines their preferences. Reinforcement learning is subsequently used to optimize the agents’ behavior and satisfy their individual goals. The novelty of our approach lies in its ability to capture uncertainty and dynamism in the agent’s preferences, which are assumed to be unknown initially. This framework is compared to a baseline method based on reinforcement learning, and results show its ability to per-form better under dynamic scenarios.

agent, reward function, trajectory, (16 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country: North America > United States > New Hampshire > Grafton County > Hanover (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

Explaining Reward Functions in Markov Decision Processes

Russell, Jacob (Dartmouth College) | Santos, Eugene (Dartmouth College)

AAAI ConferencesMay-15-2019

Rewards in Markov Decision Processes (MDP) define the behavior of the model. Without a clear interpretation of what the reward function is and is not capturing, one cannot trust their model nor diagnose when the model is giving incorrect recommendations. Increasing complexity of state-of-the-art models used to represent the reward function and model-free methods that attempt to avoid representing this function make trusting the model much more difficult. We map these reward functions onto a standard classification problem where we can explain what factors the model considers in making decisions in local and global contexts and quantify whether the fit of the reward function is likely to be good for explaining the behavior of the model. We evaluate our proof-of-concept on both the standard version and a modified version of the Object World domain to add more nonlinearity.

explanation, global model, reward function, (14 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Hampshire > Grafton County > Hanover (0.04)
Asia > Malaysia (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Add feedback

A Deep Reinforcement Learn-Based FIFA Agent with Naive State Representations and Portable Connection Interfaces

Faria, Matheus Prado Prandini (Federal University of Uberlândi) | Julia, Rita Maria Silva (Federal University of Uberlândi) | Tomaz, Lídia Bononi Paiva (Federal University of Triângulo Mineiroi)

AAAI ConferencesMay-15-2019

Video games have proved to be a very defying laboratory to study machine-learning techniques, such as Deep Reinforcement Learning (DRL) algorithms. This paper presents a new approach for a DRL-based agent trained through Deep Q-Network (DQN) technique to perform free kicks in FIFA 18 game. The main motivation for choosing this case study is the fact that, like in many situations of the real life, FIFA represents a stochastic environment. Coping with this task, the main contributions of the present paper consist on: inspired on the OpenAI Gym and on the OpenAI Universe platforms, implementing a new user-friendly interface (in terms of portability and use simplicity) to connect the learning module with the 3D FIFA's game environment; implementing a DRL-based agent for free kicks in FIFA that uses two distinct data representations retrieved from lower cost computational procedures. The results were validated through two evaluative parameters: score of well succeed kicks and training time.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Summary/Review (0.34)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Add feedback

Generative NNI Transformation Strategies in Binary Trees Using Reinforcement Learning

Shirvani, Shirin (University of Texas at Arlington) | Huber, Manfred (University of Texas at Arlington)

AAAI ConferencesMay-15-2019

Learning strategies to address problems on graph and tree structures with no a-priori size limitations in cases where no known solution exists (and thus supervised data is hard to obtain), is a difficult problem with potential applications in a wide range of domains ranging from biological networks to protein folding and social network search. The main challenges here arise from the variable size representation that needs to be resolved in the context of Reinforcement Learning (RL) to address the problem. In this paper we consider a common, specific tree problem and show that it can be addressed using a combination of feature engineering and carefully designed learning processes. In particular, We consider the classical Nearest Neighbor Interchange (NNI) distance between unrooted labeled trees, which is defined as the minimum-cost sequence of operations that transform one tree into another. We introduce a representation and a reinforcement learning method that learns the transition dynamics and iteratively changes an arbitrary initial labeled tree into a goal configuration reachable through NNI. The differential tree representation and NNI actions permits the system to learn a strategy that is applicable to arbitrary sized trees. To train the system, we introduce a training process that uses randomly sampled trajectories to incrementally train more and more complex problems to overcome the difficulty of the overall strategy space. Experiments performed show that the system can successfully learn a strategy for effective NNI on complex trees.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country: North America > United States > Texas > Tarrant County > Arlington (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention

Prakash, Bharat (University of Maryland, Baltimore County) | Khatwani, Mohit (University of Maryland, Baltimore County) | Waytowich, Nicholas (US Army Research Laboratory) | Mohsenin, Tinoosh (University of Maryland, Baltimore County)

AAAI ConferencesMay-15-2019

Recent progress in AI and Reinforcement learning has shown great success in solving complex problems with high dimensional state spaces. However, most of these successes have been primarily in simulated environments where failure is of little or no consequence. Most real-world applications, however, require training solutions that are safe to operate as catastrophic failures are inadmissible especially when there is human interaction involved. Currently, Safe RL systems use human oversight during training and exploration in order to make sure the RL agent does not go into a catastrophic state. These methods require a large amount of human labor and it is very difficult to scale up. We present a hybrid method for reducing the human intervention time by combining model-based approaches and training a supervised learner to to improve sample efficiency while also ensuring safety. We evaluate these methods on various grid-world environments using both standard and visual representations and show that our approach achieves better performance in terms of sample efficiency, number of catastrophic states reached as well as overall task performance compared to traditional model-free approaches.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Meta reinforcement learning as task inference

Humplik, Jan, Galashov, Alexandre, Hasenclever, Leonard, Ortega, Pedro A., Teh, Yee Whye, Heess, Nicolas

arXiv.org Machine LearningMay-15-2019

Humans achieve efficient learning by relying on prior knowledge about the structure of naturally occurring tasks. There has been considerable interest in designing reinforcement learning algorithms with similar properties. This includes several proposals to learn the learning algorithm itself, an idea also referred to as meta learning. One formal interpretation of this idea is in terms of a partially observable multi-task reinforcement learning problem in which information about the task is hidden from the agent. Although agents that solve partially observable environments can be trained from rewards alone, shaping an agent's memory with additional supervision has been shown to boost learning efficiency. It is thus natural to ask what kind of supervision, if any, facilitates meta-learning. Here we explore several choices and develop an architecture that separates learning of the belief about the unknown task from learning of the policy, and that can be used effectively with privileged information about the task during training. We show that this approach can be very effective at solving standard meta-RL environments, as well as a complex continuous control environment in which a simulated robot has to execute various movement sequences.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1905.06424

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction

Patwardhan, Narendra, Wang, Zequn

arXiv.org Machine LearningMay-15-2019

Model-free reinforcement learning based methods such as Proximal Policy Optimization, or Q-learning typically require thousands of interactions with the environment to approximate the optimum controller which may not always be feasible in robotics due to safety and time consumption. Model-based methods such as PILCO or BlackDrops, while data-efficient, provide solutions with limited robustness and complexity. To address this tradeoff, we introduce active uncertainty reduction-based virtual environments, which are formed through limited trials conducted in the original environment. We provide an efficient method for uncertainty management, which is used as a metric for self-improvement by identification of the points with maximum expected improvement through adaptive sampling. Capturing the uncertainty also allows for better mimicking of the reward responses of the original system. Our approach enables the use of complex policy structures and reward functions through a unique combination of model-based and model-free methods, while still retaining the data efficiency. We demonstrate the validity of our method on several classic reinforcement learning problems in OpenAI gym. We prove that our approach offers a better modeling capacity for complex system dynamics as compared to established methods.

machine learning, reinforcement learning, virtual environment, (19 more...)

arXiv.org Machine Learning

1905.06274

Country: North America (0.28)

Genre: Research Report (0.82)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning

Wainwright, Martin J.

arXiv.org Machine LearningMay-15-2019

Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-induced gauge norm. These results are derived within a deterministic framework, requiring no assumptions on the noise. We illustrate these general bounds in application to synchronous $Q$-learning for discounted Markov decision processes with discrete state-action spaces, in particular by deriving non-asymptotic bounds on the $\ell_\infty$-norm for a range of stepsizes. These results are the sharpest known to date, and we show via simulation that the dependence of our bounds cannot be improved in a worst-case sense. These results show that relative to a model-based $Q$-iteration, the $\ell_\infty$-based sample complexity of $Q$-learning is suboptimal in terms of the discount factor $\gamma$.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1905.06265

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback