AITopics

Zhang, Shangtong, Liu, Bo, Whiteson, Shimon

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

arXiv.org Artificial Intelligence May-27-2020

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging MuJoCo robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in MuJoCo domains. MVPI adopts a per-step reward perspective (Bisi et al., 2019) for risk-averse control, instead of the commonly used total reward perspective.
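
As a rough illustration of what a per-step mean-variance trade-off can look like, the sketch below scores a batch of per-step rewards as their mean minus a weighted variance. This is not the MVPI algorithm itself: the function name, the `risk_weight` parameter, and the omission of discounting are all assumptions made for the example.

```python
from statistics import fmean, pvariance
from typing import Sequence


def mean_variance_score(rewards: Sequence[float], risk_weight: float) -> float:
    """Mean of per-step rewards minus risk_weight times their variance.

    Assumption: rewards are treated as an unweighted sample of per-step
    rewards; no discounting or state-distribution weighting is applied.
    """
    return fmean(rewards) - risk_weight * pvariance(rewards)


# Two hypothetical reward streams with the same mean but different spread.
steady = [1.0, 1.0, 1.0, 1.0]
volatile = [0.0, 2.0, 0.0, 2.0]

steady_score = mean_variance_score(steady, risk_weight=0.5)
volatile_score = mean_variance_score(volatile, risk_weight=0.5)
```

With a positive `risk_weight`, the steady stream scores higher than the volatile one even though both have the same average reward, which is the behaviour a risk-averse criterion is meant to reward.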

machine learning, reinforcement learning, variance, (14 more...)

2004.10888

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Veith, Eric MSP, Wenninghoff, Nils, Frost, Emilie

The Adversarial Resilience Learning Architecture for AI-based Modelling, Exploration, and Operation of Complex Cyber-Physical Systems

arXiv.org Artificial Intelligence May-27-2020

Modern algorithms in the domain of Deep Reinforcement Learning (DRL) have demonstrated remarkable successes; most widely known are those in game-based scenarios, from ATARI video games to Go and the StarCraft II real-time strategy game. However, applications in the domain of modern Cyber-Physical Systems (CPS) that take advantage of the vast variety of DRL algorithms are few. We assume that the benefits would be considerable: Modern CPS have become increasingly complex and evolved beyond traditional methods of modelling and analysis. At the same time, these CPS are confronted with an increasing amount of stochastic inputs, from volatile energy sources in power grids to broad user participation stemming from markets. Approaches of system modelling that use techniques from the domain of Artificial Intelligence (AI) do not focus on analysis and operation. In this paper, we describe the concept of Adversarial Resilience Learning (ARL) that formulates a new approach to complex environment checking and resilient operation: It defines two agent classes, attacker and defender agents. The quintessence of ARL lies in both agents exploring the system and training each other without any domain knowledge. Here, we introduce the ARL software architecture that allows the use of a wide range of model-free as well as model-based DRL algorithms, and document results of concrete experiment runs on a complex power grid.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2005.13601

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games

Chen, Hao, Wang, Chang, Huang, Jian, Gong, Jianxing

In Markov games, playing against non-stationary opponents with learning ability is still challenging for reinforcement learning (RL) agents, because the opponents can evolve their policies concurrently. This increases the complexity of the learning task and slows down the learning speed of the RL agents. This paper proposes efficient use of rough heuristics to speed up policy learning when playing against concurrent learners. Specifically, we propose an algorithm that can efficiently learn explainable and generalized action selection rules by taking advantage of the representation of quantitative heuristics and an opponent model with an eXtended classifier system (XCS) in zero-sum Markov games. A neural network is used to model the opponent from their behaviors and the corresponding policy is inferred for action selection and rule evolution. In cases of multiple heuristic policies, we introduce the concept of Pareto optimality for action selection. Besides, taking advantage of the condition representation and matching mechanism of XCS, the heuristic policies and the opponent model can provide guidance for situations with similar feature representation. Furthermore, we introduce an accuracy-based eligibility trace mechanism to speed up rule evolution, i.e., classifiers that can match the historical traces are reinforced according to their accuracy. We demonstrate the advantages of the proposed algorithm over several benchmark algorithms in soccer and thief-and-hunter scenarios.

classifier, machine learning, reinforcement learning, (18 more...)

2005.12553

Country:

Asia > China (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.67)
Leisure & Entertainment > Sports > Soccer (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bellinger, Colin, Coles, Rory, Crowley, Mark, Tamblyn, Isaac

Active Measure Reinforcement Learning for Observation Cost Minimization

Standard reinforcement learning (RL) algorithms assume that the observation of the next state comes instantaneously and at no cost. In a wide variety of sequential decision making tasks ranging from medical treatment to scientific discovery, however, multiple classes of state observations are possible, each of which has an associated cost. We propose the active measure RL framework (Amrl) as an initial solution to this problem where the agent learns to maximize the costed return, which we define as the discounted sum of rewards minus the sum of observation costs. Our empirical evaluation demonstrates that Amrl-Q agents are able to learn a policy and state estimator in parallel during online training. During training the agent naturally shifts from its reliance on costly measurements of the environment to its state estimator in order to increase its reward. It does this without harm to the learned policy. Our results show that the Amrl-Q agent learns at a rate similar to standard Q-learning and Dyna-Q. Critically, by utilizing an active strategy, Amrl-Q achieves a higher costed return.
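
The costed return defined in the abstract (discounted sum of rewards minus the sum of observation costs) can be sketched as a small function. The function name, the default discount factor, and the choice to leave the observation costs undiscounted are assumptions for illustration; the abstract does not specify them.

```python
from typing import Sequence


def costed_return(
    rewards: Sequence[float],
    observation_costs: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Discounted sum of rewards minus the sum of observation costs.

    Assumption: observation costs are summed without discounting, which is
    one literal reading of the abstract's definition.
    """
    discounted_rewards = sum((gamma ** t) * r for t, r in enumerate(rewards))
    return discounted_rewards - sum(observation_costs)


# Hypothetical two-step episode: the agent pays to measure only at step 0.
example_value = costed_return([1.0, 1.0], [0.5, 0.0], gamma=0.5)
```

Under this reading, an agent that skips a measurement at a step contributes a cost of zero for that step, so relying on a state estimator instead of a paid observation directly raises the costed return.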

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2005.12697

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
North America > Canada > British Columbia > Vancouver Island > Capital Regional District > Victoria (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Education > Educational Setting > Online (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zennaro, Fabio Massimo, Erdodi, Laszlo

Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges and Tabular Q-Learning

Penetration testing is a security exercise aimed at assessing the security of a system by simulating attacks against it. So far, penetration testing has been carried out mainly by trained human attackers and its success critically depended on the available expertise. Automating this practice constitutes a non-trivial problem, as the range of actions that a human expert may attempt against a system and the range of knowledge she relies on to take her decisions are hard to capture. In this paper, we focus our attention on simplified penetration testing problems expressed in the form of capture the flag hacking challenges, and we apply reinforcement learning algorithms to try to solve them. In modelling these capture the flag competitions as reinforcement learning problems we highlight the specific challenges that characterize penetration testing. We observe these challenges experimentally across a set of varied simulations, and we study how different reinforcement learning techniques may help us address these challenges. In this way we show the feasibility of tackling penetration testing using reinforcement learning, and we highlight the challenges that must be taken into consideration, and possible directions to solve them.
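
For readers unfamiliar with the tabular Q-learning named in the title, a minimal sketch of the standard update rule follows. It is a generic textbook version, not the authors' implementation; the state and action labels ("s0", "scan", "exploit") and the hyperparameter values are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one tabular Q-learning update in place and return the new value.

    An empty next_actions sequence is treated as a terminal state with
    bootstrap value 0.
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-step capture-the-flag episode: scan, then exploit.
q_table: QTable = defaultdict(float)
first = q_learning_update(q_table, "s0", "scan", 1.0, "s1", ["exploit"], alpha=0.5)
final = q_learning_update(q_table, "s1", "exploit", 10.0, "done", [], alpha=1.0)
second = q_learning_update(q_table, "s0", "scan", 1.0, "s1", ["exploit"], alpha=0.5)
```

Replaying the first transition after the flag reward has been recorded shows how value propagates backwards from the goal to earlier actions.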

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2005.12632

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Lee, Michelle A., Florensa, Carlos, Tremblay, Jonathan, Ratliff, Nathan, Garg, Animesh, Ramos, Fabio, Fox, Dieter

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at https://sites.google.com/view/guapo-rl

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2005.10872

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Sanandaji, Anahita, Ghanbartehrani, Saeed, Mokhtari, Zahra, Tajik, Kimia

A Novel Ramp Metering Approach Based on Machine Learning and Historical Data

arXiv.org Machine Learning May-26-2020

The random nature of traffic conditions on freeways can cause excessive congestion and irregularities in the traffic flow. Ramp metering is a proven, effective method to maintain freeway efficiency under various traffic conditions. Creating a reliable and practical ramp metering algorithm that considers both critical traffic measures and historical data is still a challenging problem. In this study we use machine learning approaches to develop a novel real-time prediction model for ramp metering. We evaluate the potential of our approach by comparing it with a baseline traffic-responsive ramp metering algorithm.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

doi: 10.3390/make2040021

2005.13992

Country:

North America > United States > Oregon (0.05)
North America > United States > Texas (0.05)
North America > United States > Ohio (0.05)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (0.70)

Industry:

Transportation > Ground > Road (0.96)
Consumer Products & Services > Travel (0.78)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

arXiv.org Machine Learning May-26-2020

Towards intervention-centric causal reasoning in learning agents

Lansdell, Benjamin

Interventions are central to causal learning and reasoning. Yet ultimately an intervention is an abstraction: an agent embedded in a physical environment (perhaps modeled as a Markov decision process) does not typically come equipped with the notion of an intervention -- its action space is typically ego-centric, without actions of the form 'intervene on X'. Such a correspondence between ego-centric actions and interventions would be challenging to hard-code. It would instead be better if an agent learnt which sequences of actions allow it to make targeted manipulations of the environment, and learnt corresponding representations that permitted learning from observation. Here we show how a meta-learning approach can be used to perform causal learning in this challenging setting, where the action-space is not a set of interventions and the observation space is a high-dimensional space with a latent causal structure. A meta-reinforcement learning algorithm is used to learn relationships that transfer on observational causal learning tasks. This work shows how advances in deep reinforcement learning and meta-learning can provide intervention-centric causal learning in high-dimensional environments with a latent causal structure.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2005.12968

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

Liu, Jianfeng, Pan, Feiyang, Luo, Ling

A chatbot that converses like a human should be goal-oriented (i.e., be purposeful in conversation), which is beyond language generation. However, existing dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets to reach the goals. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training chatbots end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework utilizes hierarchical reinforcement learning (HRL), where the high-level policy guides the conversation towards the final goal by determining some sub-goals, and the low-level policy fulfills the sub-goals by generating the corresponding utterance for response. In our experiments on a real-world dialogue dataset for anti-fraud in finance, our approach outperforms previous methods on both the quality of response generation and the success rate of accomplishing the goal.

machine learning, natural language, reinforcement learning, (16 more...)

2005.11729

Country:

Asia > China (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)