 Reinforcement Learning


Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey

arXiv.org Machine Learning

Owing to recent advances in artificial intelligence, especially deep learning, many data-driven decision support systems have been implemented to help medical doctors deliver personalized care. We focus on deep reinforcement learning (DRL) models in this paper. DRL models have demonstrated human-level or even superior performance in computer vision and game-playing tasks, such as Go and Atari games. However, the adoption of deep reinforcement learning techniques in clinical decision optimization is still rare. We here present the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNN) on clinical decision support. We also discuss some case studies, where different DRL algorithms were applied to address various clinical challenges. We further compare and contrast the advantages and limitations of various DRL algorithms and present a preliminary guide on how to choose the appropriate DRL algorithm for particular clinical applications.


Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

arXiv.org Machine Learning

A Markov Decision Process (MDP) is a popular model for reinforcement learning. However, its commonly used assumption of stationary dynamics and rewards is too stringent and fails to hold in adversarial, nonstationary, or multi-agent problems. We study an episodic setting where the parameters of an MDP can differ across episodes. We learn a reliable policy for this potentially adversarial MDP by developing an Adversarial Reinforcement Learning (ARL) algorithm that reduces our MDP to a sequence of adversarial bandit problems. ARL achieves $O(\sqrt{SATH^3})$ regret, which is optimal with respect to $S$, $A$, and $T$, and its dependence on $H$ is the best (even for the usual stationary MDP) among existing model-free methods.
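To make the phrase "adversarial bandit problem" concrete, the sketch below implements EXP3, a standard textbook algorithm for adversarial bandits. It is not the ARL algorithm from the abstract; the function name `exp3`, the `reward_fn` callback, and the default parameters are assumptions chosen for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Run EXP3 for n_rounds and return (total reward, last sampling distribution).

    reward_fn(t, arm) is a hypothetical callback returning a reward in [0, 1];
    it may change arbitrarily from round to round (the adversarial setting).
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    probs = [1.0 / n_arms] * n_arms
    total = 0.0
    for t in range(n_rounds):
        # Mix the weight-proportional distribution with uniform exploration.
        weight_sum = sum(weights)
        probs = [(1.0 - gamma) * w / weight_sum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Importance-weighted estimate: only the pulled arm's reward is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1; this leaves probs unchanged
        # and avoids floating-point overflow on long runs.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return total, probs
```

Calling `exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, n_arms=3, n_rounds=2000)` on this toy reward sequence should concentrate most of the sampling probability on arm 2, while the `gamma / n_arms` term keeps every arm explored.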


Model-free Control of Chaos with Continuous Deep Q-learning

arXiv.org Machine Learning

The OGY method is one of the control methods for chaotic systems. In this method, we have to calculate a stabilizing periodic orbit embedded in the chaotic attractor. Thus, we cannot use this method when a precise mathematical model of the chaotic system cannot be identified. In this case, the delayed feedback control proposed by Pyragas is useful. However, even in delayed feedback control, we need the mathematical model to determine a feedback gain that stabilizes the periodic orbit. To overcome this problem, we propose a model-free reinforcement learning algorithm for the design of a controller for the chaotic system. In recent years, model-free reinforcement learning algorithms with deep neural networks have attracted much attention. Those algorithms make it possible to control complex systems. However, it is known that model-free reinforcement learning algorithms are not efficient because learners must explore their control policies over the entire state space. Moreover, model-free reinforcement learning algorithms with deep neural networks have the disadvantage of taking a long time to learn optimal control policies. Thus, we propose a data-based control policy consisting of two steps: we first determine a region including the stabilizing periodic orbit, and then make the controller learn an optimal control policy for its stabilization. In the proposed method, the controller efficiently explores its control policy only in that region.


VRLS: A Unified Reinforcement Learning Scheduler for Vehicle-to-Vehicle Communications

arXiv.org Artificial Intelligence

Vehicle-to-vehicle (V2V) communications have distinct challenges that need to be taken into account when scheduling the radio resources. Although centralized schedulers (e.g., located on base stations) could be utilized to deliver high scheduling performance, they cannot be employed in the case of coverage gaps. To address the issue of reliable scheduling of V2V transmissions out of coverage, we propose the Vehicular Reinforcement Learning Scheduler (VRLS), a centralized scheduler that predictively assigns the resources for V2V communication while the vehicle is still in cellular network coverage. VRLS is a unified reinforcement learning (RL) solution, wherein the learning agent, the state representation, and the reward provided to the agent are applicable to different vehicular environments of interest (in terms of vehicular density, resource configuration, and wireless channel conditions). Such a unified solution eliminates the necessity of redesigning the RL components for a different environment, and facilitates transfer learning from one environment to another similar one. We evaluate the performance of VRLS and show its ability to avoid collisions and half-duplex errors, and to reuse the resources better than the state-of-the-art scheduling algorithms. We also show that a pre-trained VRLS agent can adapt to different V2V environments with limited retraining, thus enabling real-world deployment in different scenarios.


Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges

arXiv.org Machine Learning

The Internet of Things (IoT) extends Internet connectivity to billions of IoT devices around the world, which collect and share information to reflect the status of the physical world. The Autonomous Control System (ACS), on the other hand, performs control functions on the physical systems without external intervention over an extended period of time. The integration of IoT and ACS results in a new concept: autonomous IoT (AIoT). The sensors collect information on the system status, based on which intelligent agents in IoT devices as well as Edge/Fog/Cloud servers make control decisions for the actuators to react. In order to achieve autonomy, a promising method is for the intelligent agents to leverage techniques from the field of artificial intelligence, especially reinforcement learning (RL) and deep reinforcement learning (DRL), for decision making. In this paper, we first provide a comprehensive survey of the state-of-the-art research, and then propose a general model for the applications of RL/DRL in AIoT. Finally, the challenges and open issues for future research are identified.


A Learning-Based Two-Stage Spectrum Sharing Strategy with Multiple Primary Transmit Power Levels

arXiv.org Machine Learning

Multi-parameter cognition in a cognitive radio network (CRN) provides a more thorough understanding of the radio environments, and could potentially lead to far more intelligent and efficient spectrum usage for a secondary user. In this paper, we investigate the multi-parameter cognition problem for a CRN where the primary transmitter (PT) radiates multiple transmit power levels, and propose a learning-based two-stage spectrum sharing strategy. We first propose a data-driven/machine learning based multi-level spectrum sensing scheme, including the spectrum learning (Stage I) and prediction (the first part in Stage II). This fully blind sensing scheme does not require any prior knowledge of the PT power characteristics. Then, based on a novel normalized power level alignment metric, we propose two prediction-transmission structures, namely periodic and non-periodic, for spectrum access (the second part in Stage II), which enable the secondary transmitter (ST) to closely follow the PT power level variation. The periodic structure features a fixed prediction interval, while the non-periodic one dynamically determines the interval with a proposed reinforcement learning algorithm to further improve the alignment metric. Finally, we extend the prediction-transmission structure to an online scenario, where the number of PT power levels might change as a consequence of PT adapting to the environment fluctuation or quality of service variation. The simulation results demonstrate the effectiveness of the proposed strategy in various scenarios.


Characterizing Attacks on Deep Reinforcement Learning

arXiv.org Machine Learning

Deep reinforcement learning (DRL) has achieved great success in various applications. However, recent studies show that machine learning models are vulnerable to adversarial attacks. DRL models have been attacked by adding perturbations to observations. Such observation-based attacks are only one aspect of potential attacks on DRL; other, more practical forms of attack, such as manipulating environment dynamics, require further analysis. Therefore, we propose to understand the vulnerabilities of DRL from various perspectives and provide a thorough taxonomy of potential attacks. We conduct the first set of experiments on the unexplored parts within the taxonomy. In addition to current observation-based attacks against DRL, we propose the first targeted attacks based on action space and environment dynamics. We also introduce online sequential attacks based on temporal consistency information among frames. To better estimate the gradient in a black-box setting, we propose a sampling strategy and theoretically prove its efficiency and estimation error bound. We conduct extensive experiments to compare the effectiveness of different attacks with several baselines in various environments, including game playing, robotics control, and autonomous driving.


Arena: a toolkit for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

We introduce Arena, a toolkit for multi-agent reinforcement learning (MARL) research. MARL usually requires customizing observations, rewards and actions for each agent, changing cooperative-competitive agent interaction, playing with or against a third-party agent, and so on. We provide a novel modular design, called Interface, for manipulating such routines in essentially two ways: 1) Different interfaces can be concatenated and combined, which extends the OpenAI Gym Wrappers concept to MARL scenarios. 2) During MARL training or testing, interfaces can be embedded in either wrapped OpenAI Gym compatible Environments or raw environment compatible Agents. We offer off-the-shelf interfaces for several popular MARL platforms, including StarCraft II, Pommerman, ViZDoom, Soccer, etc. The interfaces effectively support self-play RL and cooperative-competitive hybrid MARL. Also, Arena can be conveniently extended to your own favorite MARL platform.
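The wrapper-composition idea that the abstract builds on can be sketched in a few lines of plain Python. This is a single-agent illustration of the general pattern only; it uses neither the Gym library nor Arena's Interface API, and every class and function name below (`CounterEnv`, `Wrapper`, `ScaleReward`, `OffsetObservation`, `compose`) is hypothetical.

```python
class CounterEnv:
    """Toy environment: the observation is a step counter, the reward echoes the action."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, float(action), done, {}


class Wrapper:
    """Base wrapper: forwards calls to the wrapped env through overridable hooks."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), self.reward(reward), done, info

    def observation(self, obs):
        return obs

    def reward(self, reward):
        return reward


class ScaleReward(Wrapper):
    """Multiply every reward by a constant factor."""

    def __init__(self, env, scale):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return reward * self.scale


class OffsetObservation(Wrapper):
    """Add a constant offset to every observation."""

    def __init__(self, env, offset):
        super().__init__(env)
        self.offset = offset

    def observation(self, obs):
        return obs + self.offset


def compose(env, *factories):
    """Apply wrapper factories in order; the last one becomes the outermost layer."""
    for factory in factories:
        env = factory(env)
    return env
```

For example, `compose(CounterEnv(), lambda e: ScaleReward(e, 0.5), lambda e: OffsetObservation(e, 10))` yields an environment whose rewards are halved and whose observations are shifted by 10, without touching `CounterEnv` itself. Each wrapper changes one concern, which is what makes the layers easy to concatenate and combine.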


Potential-Based Advice for Stochastic Policy Learning

arXiv.org Artificial Intelligence

This paper augments the reward received by a reinforcement learning agent with potential functions in order to help the agent learn (possibly stochastic) optimal policies. We show that a potential-based reward shaping scheme is able to preserve optimality of stochastic policies, and demonstrate that the ability of an agent to learn an optimal policy is not affected when this scheme is added to soft Q-learning. We propose a method to impart potential-based advice schemes to policy gradient algorithms. An algorithm that considers an advantage actor-critic architecture augmented with this scheme is proposed, and we give guarantees on its convergence. Finally, we evaluate our approach on a puddle-jump grid world with indistinguishable states, and the continuous state and action mountain car environment from classical control. Our results indicate that these schemes allow the agent to learn a stochastic optimal policy faster and obtain a higher average reward.
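As a rough illustration of potential-based reward shaping, the sketch below uses the classic state-based form, in which the shaped reward is r + gamma * Phi(s') - Phi(s). It is not the advice scheme or the actor-critic algorithm from the abstract; the function names and the convention that a terminal state has zero potential are assumptions made for this example.

```python
def shaped_reward(reward, state, next_state, potential, gamma, done=False):
    """Return r + gamma * Phi(s') - Phi(s).

    `potential` is a hypothetical user-supplied function Phi(state) -> float.
    The potential of a terminal next state is treated as zero (an assumption).
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def shape_trajectory(states, rewards, potential, gamma):
    """Shape each reward along a finished episode.

    states has one more entry than rewards; the last state is terminal.
    """
    last = len(rewards) - 1
    return [
        shaped_reward(r, states[t], states[t + 1], potential, gamma, done=(t == last))
        for t, r in enumerate(rewards)
    ]
```

Because the shaping terms telescope, the shaped return of any finished episode equals the original return minus Phi(s_0), a quantity that does not depend on the actions taken. That is the intuition behind shaping of this form leaving the ranking of policies unchanged.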


Using deep learning to improve traffic signal performance | Penn State University

#artificialintelligence

Traffic signals serve to regulate the worst bottlenecks in highly populated areas but are not always very effective. Researchers at Penn State are hoping to use deep reinforcement learning to improve traffic signal efficiency in urban areas, thanks to a one-year, $22,443 Penn State Institute for CyberScience Seed Grant. Urban traffic congestion currently costs the U.S. economy $160 billion in lost productivity and causes 3.1 billion gallons of wasted fuel and 56 billion pounds of harmful CO2 emissions, according to the 2015 Urban Mobility Scorecard. Vikash Gayah, associate professor of civil engineering, and Zhenhui "Jessie" Li, associate professor of information sciences and technology, aim to tackle this issue by first identifying machine learning algorithms that will provide results consistent with traditional (theoretical) solutions for simple scenarios, and then building upon those algorithms by introducing complexities that cannot be readily addressed through traditional means. "Typically, we would go out and do traffic counts for an hour at certain peak times of day and that would determine signal timings for the next year, but not every day looks like that hour, and so we get inefficiency," Gayah said.