Reinforcement Learning
Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT
To present a method that automatically segments and quantifies abnormal CT patterns commonly present in coronavirus disease 2019 (COVID-19), namely ground glass opacities and consolidations. In this retrospective study, the proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions, based on a dataset of 9749 chest CT volumes. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities, based on deep learning and deep reinforcement learning. The first measure of (PO, PHO) is global, while the second of (LSS, LHOS) is lobe-wise. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States collected between 2002-Present (April 2020).
Deep Reinforcement Learning 2.0
Online Courses Udemy - Deep Reinforcement Learning 2.0, The smartest combination of Deep Q-Learning, Policy Gradient, Actor Critic, and DDPG Created by Hadelin de Ponteves, Kirill Eremenko, SuperDataScience Team English [Auto] Students also bought Unsupervised Deep Learning in Python Deep Learning: Advanced Computer Vision (GANs, SSD, More!) Data Science: Natural Language Processing (NLP) in Python Recommender Systems and Deep Learning in Python Cutting-Edge AI: Deep Reinforcement Learning in Python Ensemble Machine Learning in Python: Random Forest, AdaBoost Preview this course GET COUPON CODE Description Welcome to Deep Reinforcement Learning 2.0! In this course, we will learn and implement a new incredibly smart AI model, called the Twin-Delayed DDPG, which combines state of the art techniques in Artificial Intelligence including continuous Double Deep Q-Learning, Policy Gradient, and Actor Critic. The model is so strong that for the first time in our courses, we are able to solve the most challenging virtual AI applications (training an ant/spider and a half humanoid to walk and run across a field). To approach this model the right way, we structured the course in three parts: Part 1: Fundamentals In this part we will study all the fundamentals of Artificial Intelligence which will allow you to understand and master the AI of this course. These include Q-Learning, Deep Q-Learning, Policy Gradient, Actor-Critic and more.
Deep Reinforcement Learning for Field Development Optimization
The field development optimization (FDO) problem represents a challenging mixed-integer nonlinear programming (MINLP) problem in which we seek to obtain the number of wells, their type, location, and drilling sequence that maximizes an economic metric. Evolutionary optimization algorithms have been effectively applied to solve the FDO problem, however, these methods provide only a deterministic (single) solution which are generally not robust towards small changes in the problem setup. In this work, the goal is to apply convolutional neural network-based (CNN) deep reinforcement learning (DRL) algorithms to the field development optimization problem in order to obtain a policy that maps from different states or representation of the underlying geological model to optimal decisions. The proximal policy optimization (PPO) algorithm is considered with two CNN architectures of varying number of layers and composition. Both networks obtained policies that provide satisfactory results when compared to a hybrid particle swarm optimization - mesh adaptive direct search (PSO-MADS) algorithm that has been shown to be effective at solving the FDO problem.
Follow the Object: Curriculum Learning for Manipulation Tasks with Imagined Goals
Kilinc, Ozsel, Montana, Giovanni
Learning robot manipulation through deep reinforcement learning in environments with sparse rewards is a challenging task. In this paper we address this problem by introducing a notion of imaginary object goals. For a given manipulation task, the object of interest is first trained to reach a desired target position on its own, without being manipulated, through physically realistic simulations. The object policy is then leveraged to build a predictive model of plausible object trajectories providing the robot with a curriculum of incrementally more difficult object goals to reach during training. The proposed algorithm, Follow the Object (FO), has been evaluated on 7 MuJoCo environments requiring increasing degree of exploration, and has achieved higher success rates compared to alternative algorithms.
Robust Deep Reinforcement Learning through Adversarial Loss
Oikarinen, Tuomas, Weng, Tsui-Wei, Daniel, Luca
Deep neural networks, including reinforcement learning agents, have been proven vulnerable to small adversarial changes in the input, thus making deploying such networks in the real world problematic. In this paper, we propose RADIAL-RL, a method to train reinforcement learning agents with improved robustness against any $l_p$-bounded adversarial attack. By simply minimizing an upper bound of the loss functions under worst case adversarial perturbation derived from efficient robustness verification methods, we significantly improve robustness of RL-agents trained on Atari-2600 games and show that RADIAL-RL can beat state-of-the-art robust training algorithms when evaluated against PGD-attacks. We also propose a new evaluation method, Greedy Worst-Case Reward (GWC), for measuring attack agnostic robustness of RL agents. GWC can be evaluated efficiently and it serves as a good estimate of the reward under the worst possible sequence of adversarial attacks; in particular, GWC accounts for the importance of each action and their temporal dependency, improving upon previous approaches that only evaluate whether each single action can change under input perturbations. Our code is available at https://github.com/tuomaso/radial_rl.
Reinforcement Learning-driven Information Seeking: A Quantum Probabilistic Approach
Jaiswal, Amit Kumar, Liu, Haiming, Frommholz, Ingo
Understanding an information forager's actions during interaction is very important for the study of interactive information retrieval. Although information spread in uncertain information space is substantially complex due to the high entanglement of users interacting with information objects (text, image, etc.). However, an information forager, in general, accompanies a piece of information (information diet) while searching (or foraging) alternative contents, typically subject to decisive uncertainty. Such types of uncertainty are analogous to measurements in quantum mechanics which follow the uncertainty principle. In this paper, we discuss information seeking as a reinforcement learning task. We then present a reinforcement learning-based framework to model forager exploration that treats the information forager as an agent to guide their behaviour. Also, our framework incorporates the inherent uncertainty of the foragers' action using the mathematical formalism of quantum mechanics.
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Lian, Yijiang, Chen, Zhijie, Pei, Xin, Li, Shuang, Wang, Yifei, Qiu, Yuefeng, Zhang, Zhiheng, Tao, Zhipeng, Yuan, Liang, Guan, Hanju, Zhang, Kefeng, Li, Zhigang, Liu, Xiaochun
Industrial sponsored search system (SSS) can be logically divided into three modules: keywords matching, ad retrieving, and ranking. During ad retrieving, the ad candidates grow exponentially. A query with high commercial value might retrieve a great deal of ad candidates such that the ranking module could not afford. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line to cut SSS into two parts: upstream and downstream. The problem we are going to address is: how to pick out the best $K$ items from $N$ candidates provided by the upstream to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction in this problem is that the selection scheme should get adapted to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to fixing this problem. Our approach considers downstream as a black-box environment, and the agent sequentially selects items and finally feeds into the downstream, where revenue would be estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is first time to consider the system optimization from a downstream adaption view. It is also the first time to use reinforcement learning techniques to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and online long time A/B test shows remarkable improvements on revenue.
EasyRL: A Simple and Extensible Reinforcement Learning Framework
Hulbert, Neil, Spillers, Sam, Francis, Brandon, Haines-Temons, James, Romero, Ken Gil, De Jager, Benjamin, Wong, Sam, Flora, Kevin, Huang, Bowei, Irissappane, Athirai A.
In recent years, Reinforcement Learning (RL), has become a popular field of study as well as a tool for enterprises working on cutting-edge artificial intelligence research. To this end, many researchers have built RL frameworks such as openAI Gym and KerasRL for ease of use. While these works have made great strides towards bringing down the barrier of entry for those new to RL, we propose a much simpler framework called EasyRL, by providing an interactive graphical user interface for users to train and evaluate RL agents. As it is entirely graphical, EasyRL does not require programming knowledge for training and testing simple built-in RL agents. EasyRL also supports custom RL agents and environments, which can be highly beneficial for RL researchers in evaluating and comparing their RL models.
An Imitation from Observation Approach to Sim-to-Real Transfer
Desai, Siddarth, Durugkar, Ishan, Karnan, Haresh, Warnell, Garrett, Hanna, Josiah, Stone, Peter
The sim to real transfer problem deals with leveraging large amounts of inexpensive simulation experience to help artificial agents learn behaviors intended for the real world more efficiently. One approach to sim-to-real transfer is using interactions with the real world to make the simulator more realistic, called grounded sim to-real transfer. In this paper, we show that a particular grounded sim-to-real approach, grounded action transformation, is closely related to the problem of imitation from observation IfO, learning behaviors that mimic the observations of behavior demonstrations. After establishing this relationship, we hypothesize that recent state-of-the-art approaches from the IfO literature can be effectively repurposed for such grounded sim-to-real transfer. To validate our hypothesis we derive a new sim-to-real transfer algorithm - generative adversarial reinforced action transformation (GARAT) - based on adversarial imitation from observation techniques. We run experiments in several simulation domains with mismatched dynamics, and find that agents trained with GARAT achieve higher returns in the real world compared to existing black-box sim-to-real methods
Multiple Code Hashing for Efficient Image Retrieval
Li, Ming-Wei, Jiang, Qing-Yuan, Li, Wu-Jun
Due to its low storage cost and fast query speed, hashing has been widely used in large-scale image retrieval tasks. Hash bucket search returns data points within a given Hamming radius to each query, which can enable search at a constant or sub-linear time cost. However, existing hashing methods cannot achieve satisfactory retrieval performance for hash bucket search in complex scenarios, since they learn only one hash code for each image. More specifically, by using one hash code to represent one image, existing methods might fail to put similar image pairs to the buckets with a small Hamming distance to the query when the semantic information of images is complex. As a result, a large number of hash buckets need to be visited for retrieving similar images, based on the learned codes. This will deteriorate the efficiency of hash bucket search. In this paper, we propose a novel hashing framework, called multiple code hashing (MCH), to improve the performance of hash bucket search. The main idea of MCH is to learn multiple hash codes for each image, with each code representing a different region of the image. Furthermore, we propose a deep reinforcement learning algorithm to learn the parameters in MCH. To the best of our knowledge, this is the first work that proposes to learn multiple hash codes for each image in image retrieval. Experiments demonstrate that MCH can achieve a significant improvement in hash bucket search, compared with existing methods that learn only one hash code for each image.