AITopics

2006.06923

Country: Asia > China > Tianjin Province > Tianjin (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJun-11-2020

Robustness to Adversarial Attacks in Learning-Enabled Controllers

Xiong, Zikang, Eappen, Joe, Zhu, He, Jagannathan, Suresh

Learning-enabled controllers used in cyber-physical systems (CPS) are known to be susceptible to adversarial attacks. Such attacks manifest as perturbations to the states generated by the controller's environment in response to its actions. We consider state perturbations that encompass a wide variety of adversarial attacks and describe an attack scheme for discovering adversarial states. To be useful, these attacks need to be natural, yielding states in which the controller can be reasonably expected to generate a meaningful response. We consider shield-based defenses as a means to improve controller robustness in the face of such perturbations. Our defense strategy allows us to treat the controller and environment as black-boxes with unknown dynamics. We provide a two-stage approach to construct this defense and show its effectiveness through a range of experiments on realistic continuous control domains such as the navigation control-loop of an F16 aircraft and the motion control system of humanoid robots.

machine learning, reinforcement learning, trajectory, (19 more...)

2006.06861

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

arXiv.org Machine LearningJun-11-2020

Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time

Drori, Iddo, Kharkar, Anant, Sickinger, William R., Kates, Brandon, Ma, Qiang, Ge, Suwen, Dolev, Eden, Dietrich, Brenda, Williamson, David P., Udell, Madeleine

Combinatorial optimization algorithms for graph problems are usually designed afresh for each new problem with careful attention by an expert to the problem structure. In this work, we develop a new framework to solve any combinatorial optimization problem over graphs that can be formulated as a single player game defined by states, actions, and rewards, including minimum spanning tree, shortest paths, traveling salesman problem, and vehicle routing problem, without expert knowledge. Our method trains a graph neural network using reinforcement learning on an unlabeled training set of graphs. The trained network then outputs approximate solutions to new graph instances in linear running time. In contrast, previous approximation algorithms or heuristics tailored to NP-hard problems on graphs generally have at least quadratic running time. We demonstrate the applicability of our approach on both polynomial and NP-hard problems with optimality gaps close to 1, and show that our method is able to generalize well: (i) from training on small graphs to testing on large graphs; (ii) from training on random graphs of one type to testing on random graphs of another type; and (iii) from training on random graphs to running on real world graphs.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2006.0375

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.67)
Transportation (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine LearningJun-11-2020

Exploration by Maximizing R\'enyi Entropy for Zero-Shot Meta RL

Zhang, Chuheng, Cai, Yuanying, Huang, Longbo, Li, Jian

Exploring the transition dynamics is essential to the success of reinforcement learning (RL) algorithms. To face the challenges of exploration, we consider a zero-shot meta RL framework that completely separates exploration from exploitation and is suitable for the meta RL setting where there are many reward functions of interest. In the exploration phase, the agent learns an exploratory policy by interacting with a reward-free environment and collects a dataset of transitions by executing the policy. In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment. This framework brings new challenges for exploration algorithms. In the exploration phase, we propose to maximize the R\'enyi entropy over the state-action space and justify this objective theoretically. We further deduce a policy gradient formulation for this objective and design a practical exploration algorithm that can deal with complex environments based on PPO. In the planning phase, we use a batch RL algorithm, batch constrained deep Q-learning (BCQ), to solve for good policies given arbitrary reward functions. Empirically, we show that our exploration algorithm is effective and sample efficient, and results in superior policies for arbitrary reward functions in the planning phase.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2006.06193

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kostrikov, Ilya, Yarats, Denis, Fergus, Rob

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

arXiv.org Machine LearningJun-11-2020

We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to regularize the value function. Existing model-free approaches, such as Soft Actor-Critic (SAC), are not able to train deep networks effectively from image pixels. However, the addition of our augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind control suite, surpassing model-based (Dreamer, PlaNet, and SLAC) methods and recently proposed contrastive learning (CURL). Our approach can be combined with any model-free reinforcement learning algorithm, requiring only minor modifications. An implementation can be found at https://sites.google.com/view/data-regularized-q.

environment step, machine learning, reinforcement learning, (15 more...)

2004.13649

Country:

North America > United States > New York (0.04)
North America > United States > Indiana (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Xin, Xin, Karatzoglou, Alexandros, Arapakis, Ioannis, Jose, Joemon M.

Self-Supervised Reinforcement Learning for Recommender Systems

arXiv.org Artificial IntelligenceJun-11-2020

In session-based or sequential recommendation, it is important to consider a number of factors like long-term user engagement, multiple types of user-item interactions such as clicks, purchases etc. The current state-of-the-art supervised approaches fail to model them appropriately. Casting sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is to train the agent through interactions with the environment. However, it is often problematic to train a recommender in an on-line fashion due to the requirement to expose users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer to drive the supervised layer focusing on specific rewards(e.g., recommending items which may lead to purchases rather than clicks) while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on such an approach, we propose two frameworks namely Self-Supervised Q-learning(SQN) and Self-Supervised Actor-Critic(SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2006.05779

Genre: Research Report (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceJun-10-2020, 06:48:35 GMT

5 Best Reinforcement Learning Courses - DZone AI

A team of global experts compiled this list of best reinforcement courses, classes, tutorials, training, and certification programs available online. This list includes both free and paid courses to help you learn reinforcement learning. Also, it is ideal for beginners, intermediates, and experts. Offered by the University of Alberta, this reinforcement learning specialization program consists of four different courses that will help you explore the power of adaptive learning systems and artificial intelligence. In this program, you will learn how reinforcement learning solutions can help you solve real-world problems via trial-and-error interaction by implementing a complete RL solution from beginning to end.

machine learning, reinforcement, reinforcement learning, (15 more...)

#artificialintelligence

Country: North America > Canada > Alberta (0.59)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Wang, Tianyu, Dhiman, Vikas, Atanasov, Nikolay

Learning Navigation Costs from Demonstration with Semantic Observations

arXiv.org Machine LearningJun-10-2020

This paper focuses on inverse reinforcement learning (IRL) for autonomous robot navigation using semantic observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert's observations and state-control trajectory. We develop a map encoder, which infers semantic class probabilities from the observation sequence, and a cost encoder, defined as deep neural network over the semantic features. Since the expert cost is not directly observable, the representation parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. The error is optimized using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. We show that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of cars, sidewalks and road lanes.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2006.05043

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Africa > Togo (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.48)
Information Technology > Robotics & Automation (0.34)
Automobiles & Trucks (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

Bayesian Experience Reuse for Learning from Multiple Demonstrators

arXiv.org Machine LearningJun-10-2020

Learning from demonstrations (LfD) improves the exploration efficiency of a learning agent by incorporating demonstrations from experts. However, demonstration data can often come from multiple experts with conflicting goals, making it difficult to incorporate safely and effectively in online settings. We address this problem in the static and dynamic optimization settings by modelling the uncertainty in source and target task functions using normal-inverse-gamma priors, whose corresponding posteriors are, respectively, learned from demonstrations and target data using Bayesian neural networks with shared features. We use this learned belief to derive a quadratic programming problem whose solution yields a probability distribution over the expert models. Finally, we propose Bayesian Experience Reuse (BERS) to sample demonstrations in accordance with this distribution and reuse them directly in new tasks. We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.

demonstration, machine learning, reinforcement learning, (14 more...)

2006.05725

Country:

North America > Canada > Ontario > Toronto (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
(2 more...)

Théate, Thibaut, Ernst, Damien

An Application of Deep Reinforcement Learning to Algorithmic Trading

arXiv.org Artificial IntelligenceJun-10-2020

This scientific research paper presents an innovative approach based on deep reinforcement learning (DRL) to solve the algorithmic trading problem of determining the optimal trading position at any point in time during a trading activity in stock markets. It proposes a novel DRL trading strategy so as to maximise the resulting Sharpe ratio performance indicator on a broad range of stock markets. Denominated the Trading Deep Q-Network algorithm (TDQN), this new trading strategy is inspired from the popular DQN algorithm and significantly adapted to the specific algorithmic trading problem at hand. The training of the resulting reinforcement learning (RL) agent is entirely based on the generation of artificial trajectories from a limited set of stock market historical data. In order to objectively assess the performance of trading strategies, the research paper also proposes a novel, more rigorous performance assessment methodology. Following this new performance assessment approach, promising results are reported for the TDQN strategy.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2004.06627

Country:

Asia > China (0.04)
Europe > Belgium > Wallonia > Liège Province > Liège (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)