Reinforcement Learning
Context-Aware Symptom Checking for Disease Diagnosis Using Hierarchical Reinforcement Learning
Kao, Hao-Cheng (HTC Research) | Tang, Kai-Fu (HTC Research) | Chang, Edward Y. (HTC Research)
Online symptom checkers have been deployed by sites such as WebMD and Mayo Clinic to identify possible causes and treatments for diseases based on a patient's symptoms. Symptom checking first assesses a patient by asking a series of questions about their symptoms, then attempts to predict potential diseases. The two design goals of a symptom checker are to achieve high accuracy and intuitive interactions. In this paper we present our context-aware hierarchical reinforcement learning scheme, which significantly improves accuracy of symptom checking over traditional systems while also making a limited number of inquiries.
A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents
Wu, Yueh-Hua (National Taiwan University) | Lin, Shou-De (National Taiwan University)
This paper proposes a low-cost, easily realizable strategy to equip a reinforcement learning (RL) agent with the capability of behaving ethically. Our model allows the designers of RL agents to focus solely on the task to achieve, without having to worry about the implementation of multiple trivial ethical patterns to follow. Based on the assumption that the majority of human behavior, regardless of which goals it serves, is ethical, our design integrates human policy with the RL policy to achieve the target objective with less chance of violating the ethical code that human beings normally obey.
Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
Warnell, Garrett (U.S. Army Research Laboratory) | Waytowich, Nicholas (U.S. Army Research Laboratory) | Lawhern, Vernon (U.S. Army Research Laboratory) | Stone, Peter (The University of Texas at Austin)
While recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require a lot of training data. One way to increase the speed at which agents are able to learn to perform tasks is by leveraging the input of human trainers. Although such input can take many forms, real-time, scalar-valued feedback is especially useful in situations where it proves difficult or impossible for humans to provide expert demonstrations. Previous approaches have shown the usefulness of human input provided in this fashion (e.g., the TAMER framework), but they have thus far not considered high-dimensional state spaces or employed deep learning. In this paper, we do both: we propose Deep TAMER, an extension of the TAMER framework that leverages the representational power of deep neural networks in order to learn complex tasks in just a short amount of time with a human trainer. We demonstrate Deep TAMER's success by using it and just 15 minutes of human-provided feedback to train an agent that performs better than humans on the Atari game of Bowling - a task that has proven difficult for even state-of-the-art reinforcement learning methods.
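The idea of learning from real-time scalar feedback can be illustrated with a small sketch: fit a model of the trainer's feedback for each (state, action) pair, then act greedily with respect to that model. This is only an illustration under stated assumptions. A linear model stands in for the deep network described in the abstract, and the class `LinearFeedbackModel` and its methods are hypothetical names, not the authors' implementation.

```python
class LinearFeedbackModel:
    """Hypothetical linear stand-in for a learned human-feedback model H(s, a)."""

    def __init__(self, n_features, n_actions, lr=0.1):
        # One weight vector per discrete action.
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def predict(self, state, action):
        # Predicted scalar feedback for taking `action` in `state`.
        return sum(w * x for w, x in zip(self.weights[action], state))

    def update(self, state, action, feedback):
        # One stochastic-gradient step on the squared prediction error.
        error = feedback - self.predict(state, action)
        for i, x in enumerate(state):
            self.weights[action][i] += self.lr * error * x
        return error

    def act(self, state):
        # Greedy choice: the action with the highest predicted feedback.
        return max(range(len(self.weights)), key=lambda a: self.predict(state, a))


# Toy usage: the trainer penalizes action 0 and rewards action 1 in one state.
model = LinearFeedbackModel(n_features=2, n_actions=2)
state = [1.0, 0.0]
for _ in range(50):
    model.update(state, 0, -1.0)
    model.update(state, 1, +1.0)
chosen = model.act(state)
```

After the toy loop, the model prefers the action the trainer rewarded. A real system would also need to handle the delay between an action and the trainer's response, which this sketch ignores.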
Toward Deep Reinforcement Learning Without a Simulator: An Autonomous Steering Example
Hilleli, Bar (Technion) | El-Yaniv, Ran (Technion)
We propose a scheme for training a computerized agent to perform complex human tasks such as highway steering. The scheme is designed to follow a natural learning process whereby a human instructor teaches a computerized trainee. It enables leveraging the weak supervision abilities of a (human) instructor, who, while unable to perform well herself at the required task, can provide coherent and learnable instantaneous reward signals to the computerized trainee. The learning process consists of three supervised elements followed by reinforcement learning. The supervised learning stages are: (i) supervised imitation learning; (ii) supervised reward induction; and (iii) supervised safety module construction. We implemented this scheme using deep convolutional networks and applied it to successfully create a computerized agent capable of autonomous highway steering in the well-known racing game Assetto Corsa. We demonstrate that the use of all components is essential to effectively carry out reinforcement learning of the steering task using vision alone, without access to a driving simulator's internals, and operating in wall-clock time.
Reinforcement Mechanism Design for Fraudulent Behaviour in e-Commerce
Cai, Qingpeng (Tsinghua University) | Filos-Ratsikas, Aris (University of Oxford) | Tang, Pingzhong (Tsinghua University) | Zhang, Yiwei (UC Berkeley)
In large e-commerce websites, sellers have been observed to engage in fraudulent behaviour, faking historical transactions in order to receive favourable treatment from the platforms, specifically through the allocation of additional buyer impressions, which results in higher revenue for them but not for the system as a whole. This emergent phenomenon has attracted considerable attention, with previous approaches focusing on trying to detect illicit practices and to punish the miscreants. In this paper, we employ the principles of reinforcement mechanism design, a framework that combines the fundamental goals of classical mechanism design, i.e. the consideration of agents' incentives and their alignment with the objectives of the designer, with deep reinforcement learning for optimizing the performance based on these incentives. In particular, we first set up a deep-learning framework for predicting the sellers' rationality, based on real data from any allocation algorithm. We use data from one of the largest e-commerce platforms worldwide and train a neural network model to predict the extent to which the sellers will engage in fraudulent behaviour. Using this rationality model, we employ an algorithm based on deep reinforcement learning to optimize the objectives and compare its performance against several natural heuristics, including the platform's implementation and incentive-based mechanisms from the related literature.
DyETC: Dynamic Electronic Toll Collection for Traffic Congestion Alleviation
Chen, Haipeng (Nanyang Technological University) | An, Bo (Nanyang Technological University) | Sharon, Guni (University of Texas at Austin) | Hanna, Josiah P. (University of Texas at Austin) | Stone, Peter (University of Texas at Austin) | Miao, Chunyan (Nanyang Technological University) | Soh, Yeng Chai (Nanyang Technological University)
To alleviate traffic congestion in urban areas, electronic toll collection (ETC) systems are deployed all over the world. Despite their merits, tolls are usually predetermined and fixed from day to day, which fails to account for traffic dynamics and thus has a limited regulating effect when traffic conditions are abnormal. In this paper, we propose a novel dynamic ETC (DyETC) scheme which adjusts tolls to traffic conditions in real time. The DyETC problem is formulated as a Markov decision process (MDP), the solution of which is very challenging due to its 1) multi-dimensional state space, 2) multi-dimensional, continuous and bounded action space, and 3) time-dependent state and action values. Due to the complexity of the formulated MDP, existing methods cannot be applied to our problem. Therefore, we develop a novel algorithm, PG-beta, which makes three improvements to the traditional policy gradient method by proposing 1) time-dependent value and policy functions, 2) a Beta distribution policy function and 3) state abstraction. Experimental results show that, compared with existing ETC schemes, DyETC increases traffic volume by around 8%, and reduces travel time by around 14.6% during rush hour. Considering the total traffic volume in a traffic network, this contributes to a substantial increase in social welfare.
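A Beta distribution is a natural fit for a bounded continuous action because its support is the unit interval, which can be rescaled to any toll range. The following sketch shows only that rescaling step, not the PG-beta algorithm itself. The function names, the toll bounds, and the per-road shape parameters are all hypothetical; in a learned policy the shape parameters would be produced by a function of the state.

```python
import random


def sample_toll(alpha, beta, low, high, rng=random):
    """Draw one toll in [low, high] from a rescaled Beta(alpha, beta)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("Beta shape parameters must be positive")
    unit = rng.betavariate(alpha, beta)  # value in [0, 1]
    return low + (high - low) * unit


def mean_toll(alpha, beta, low, high):
    """Expected toll under the same rescaled Beta distribution."""
    return low + (high - low) * alpha / (alpha + beta)


def sample_tolls(shape_params, low, high, rng=random):
    """Multi-dimensional action: one independent Beta per tolled road."""
    return [sample_toll(a, b, low, high, rng) for a, b in shape_params]


# Toy usage with assumed bounds of 0 to 6 currency units on three roads.
rng = random.Random(0)
tolls = sample_tolls([(2.0, 2.0), (5.0, 1.0), (1.0, 5.0)], 0.0, 6.0, rng)
```

Unlike a Gaussian policy, samples never fall outside the allowed range, so no clipping is needed.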
Exploring Implicit Feedback for Open Domain Conversation Generation
Zhang, Wei-Nan (Harbin Institute of Technology) | Li, Lingzhi (Harbin Institute of Technology) | Cao, Dongyan (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology)
User feedback can be an effective indicator of the success of a human-robot conversation. However, to avoid interrupting the online real-time conversation process, explicit feedback is usually gathered at the end of a conversation. Alternatively, users' responses usually contain their implicit feedback, such as stance, sentiment, emotion, etc., towards the conversation content or the interlocutors. Therefore, exploring the implicit feedback is a natural way to optimize the conversation generation process. In this paper, we propose a novel reward function which explores the implicit feedback to optimize the future reward of a reinforcement learning based neural conversation model. A simulation strategy is applied to explore the state-action space in training and test. Experimental results show that the proposed approach outperforms the Seq2Seq model and the state-of-the-art reinforcement learning model for conversation generation on automatic and human evaluations on the OpenSubtitles and Twitter datasets.
An Information-Theoretic Optimality Principle for Deep Reinforcement Learning
Leibfried, Felix | Grau-Moya, Jordi | Bou-Ammar, Haitham
We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari games, our algorithm outperforms other algorithms (e.g.
[D] Inductive bias of reinforcement learning? • r/MachineLearning
In (deep) RL, lots of inductive biases are already built-in. Other inductive biases that I believe (and many RL/AI researchers have argued) should be there include "curiosity" and "representation at object / event level", although successful attempts so far are restricted to symbolic representations, AFAIK.
Machine learning explained: Understanding supervised, unsupervised, and reinforcement learning
Once we start delving into the concepts behind Artificial Intelligence (AI) and Machine Learning (ML), we come across copious amounts of jargon related to this field of study. Understanding this jargon and how it bears on ML research goes a long way toward comprehending the work that researchers and data scientists have done to get AI to the state it is in now. In this article, I will be providing you with a comprehensive definition of supervised, unsupervised and reinforcement learning in the broader field of Machine Learning. You must have encountered these terms while skimming articles about the progress made in AI and the role played by ML in propelling this success forward. Understanding these concepts is essential and should not be skipped.