AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey

Liu, Siqi, Ngiam, Kee Yuan, Feng, Mengling

arXiv.org Machine LearningJul-22-2019

Owe to the recent advancements in Artificial Intelligence especially deep learning, many data-driven decision support systems have been implemented to facilitate medical doctors in delivering personalized care. We focus on the deep reinforcement learning (DRL) models in this paper. DRL models have demonstrated human-level or even superior performance in the tasks of computer vision and game playings, such as Go and Atari game. However, the adoption of deep reinforcement learning techniques in clinical decision optimization is still rare. We here present the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNN) on clinical decision support. We also discuss some case studies, where different DRL algorithms were applied to address various clinical challenges. We further compare and contrast the advantages and limitations of various DRL algorithms and present a preliminary guide on how to choose the appropriate DRL algorithm for particular clinical applications.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1907.09475

Country:

Asia > Singapore (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Leisure & Entertainment > Games > Computer Games (0.87)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

Yu, Tiancheng, Sra, Suvrit

arXiv.org Machine LearningJul-22-2019

A Markov Decision Process (MDP) is a popular model for reinforcement learning. However, its commonly used assumption of stationary dynamics and rewards is too stringent and fails to hold in adversarial, nonstationary, or multi-agent problems. We study an episodic setting where the parameters of an MDP can differ across episodes. We learn a reliable policy of this potentially adversarial MDP by developing an Adversarial Reinforcement Learning (ARL) algorithm that reduces our MDP to a sequence of \emph{adversarial} bandit problems. ARL achieves $O(\sqrt{SATH^3})$ regret, which is optimal with respect to $S$, $A$, and $T$, and its dependence on $H$ is the best (even for the usual stationary MDP) among existing model-free methods.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1907.0935

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

Model-free Control of Chaos with Continuous Deep Q-learning

Ikemoto, Junya, Ushio, Toshimitsu

arXiv.org Machine LearningJul-22-2019

The OGY method is one of control methods for a chaotic system. In the method, we have to calculate a stabilizing periodic orbit embedded in its chaotic attractor. Thus, we cannot use this method in the case where a precise mathematical model of the chaotic system cannot be identified. In this case, the delayed feedback control proposed by Pyragas is useful. However, even in the delayed feedback control, we need the mathematical model to determine a feedback gain that stabilizes the periodic orbit. To overcome this problem, we propose a model-free reinforcement learning algorithm to the design of a controller for the chaotic system. In recent years, model-free reinforcement learning algorithms with deep neural networks have been paid much attention to. Those algorithms make it possible to control complex systems. However, it is known that model-free reinforcement learning algorithms are not efficient because learners must explore their control policies over the entire state space. Moreover, model-free reinforcement learning algorithms with deep neural networks have the disadvantage in taking much time to learn their control optimal policies. Thus, we propose a data-based control policy consisting of two steps, where we determine a region including the stabilizing periodic orbit first, and make the controller learn an optimal control policy for its stabilization. In the proposed method, the controller efficiently explores its control policy only in the region.

deep neural network, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1907.07775

Country: Asia > Japan (0.15)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

VRLS: A Unified Reinforcement Learning Scheduler for Vehicle-to-Vehicle Communications

Şahin, Taylan, Khalili, Ramin, Boban, Mate, Wolisz, Adam

arXiv.org Artificial IntelligenceJul-22-2019

Vehicle-to-vehicle (V2V) communications have distinct challenges that need to be taken into account when scheduling the radio resources. Although centralized schedulers (e.g., located on base stations) could be utilized to deliver high scheduling performance, they cannot be employed in case of coverage gaps. To address the issue of reliable scheduling of V2V transmissions out of coverage, we propose Vehicular Reinforcement Learning Scheduler (VRLS), a centralized scheduler that predictively assigns the resources for V2V communication while the vehicle is still in cellular network coverage. VRLS is a unified reinforcement learning (RL) solution, wherein the learning agent, the state representation, and the reward provided to the agent are applicable to different vehicular environments of interest (in terms of vehicular density, resource configuration, and wireless channel conditions). Such a unified solution eliminates the necessity of redesigning the RL components for a different environment, and facilitates transfer learning from one to another similar environment. We evaluate the performance of VRLS and show its ability to avoid collisions and half-duplex errors, and to reuse the resources better than the state of the art scheduling algorithms. We also show that pre-trained VRLS agent can adapt to different V2V environments with limited retraining, thus enabling real-world deployment in different scenarios.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1907.09319

Country: Europe > Germany (0.46)

Genre: Research Report (1.00)

Industry:

Telecommunications (1.00)
Transportation > Ground > Road (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges

Lei, Lei, Tan, Yue, Liu, Shiwen, Zheng, Kan, Xuemin, null, Shen, null

arXiv.org Machine LearningJul-21-2019

The Internet of Things (IoT) extends the Internet connectivity into billions of IoT devices around the world, which collect and share information to reflect the status of physical world. The Autonomous Control System (ACS), on the other hand, performs control functions on the physical systems without external intervention over an extended period of time. The integration of IoT and ACS results in a new concept - autonomous IoT (AIoT). The sensors collect information on the system status, based on which intelligent agents in IoT devices as well as Edge/Fog/Cloud servers make control decisions for the actuators to react. In order to achieve autonomy, a promising method is for the intelligent agents to leverage the techniques in the field of artificial intelligence, especially reinforcement learning (RL) and deep reinforcement learning (DRL) for decision making. In this paper, we first provide comprehensive survey of the state-of-art research, and then propose a general model for the applications of RL/DRL in AIoT. Finally, the challenges and open issues for future research are identified.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1907.09059

Genre:

Overview (1.00)
Research Report > Promising Solution (0.87)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Energy > Renewable (1.00)
Energy > Power Industry (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

A Learning-Based Two-Stage Spectrum Sharing Strategy with Multiple Primary Transmit Power Levels

Zhang, Rui, Cheng, Peng, Chen, Zhuo, Li, Yonghui, Vucetic, Branka

arXiv.org Machine LearningJul-21-2019

Multi-parameter cognition in a cognitive radio network (CRN) provides a more thorough understanding of the radio environments, and could potentially lead to far more intelligent and efficient spectrum usage for a secondary user. In this paper, we investigate the multi-parameter cognition problem for a CRN where the primary transmitter (PT) radiates multiple transmit power levels, and propose a learning-based two-stage spectrum sharing strategy. We first propose a data-driven/machine learning based multi-level spectrum sensing scheme, including the spectrum learning (Stage I) and prediction (the first part in Stage II). This fully blind sensing scheme does not require any prior knowledge of the PT power characteristics. Then, based on a novel normalized power level alignment metric, we propose two prediction-transmission structures, namely periodic and non-periodic, for spectrum access (the second part in Stage II), which enable the secondary transmitter (ST) to closely follow the PT power level variation. The periodic structure features a fixed prediction interval, while the non-periodic one dynamically determines the interval with a proposed reinforcement learning algorithm to further improve the alignment metric. Finally, we extend the prediction-transmission structure to an online scenario, where the number of PT power levels might change as a consequence of PT adapting to the environment fluctuation or quality of service variation. The simulation results demonstrate the effectiveness of the proposed strategy in various scenarios.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1907.09949

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Telecommunications (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Characterizing Attacks on Deep Reinforcement Learning

Xiao, Chaowei, Pan, Xinlei, He, Warren, Peng, Jian, Sun, Mingjie, Yi, Jinfeng, Li, Bo, Song, Dawn

arXiv.org Machine LearningJul-21-2019

Deep reinforcement learning (DRL) has achieved great success in various applications. However, recent studies show that machine learning models are vulnerable to adversarial attacks. DRL models have been attacked by adding perturbations to observations. While such observation based attack is only one aspect of potential attacks on DRL, other forms of attacks which are more practical require further analysis, such as manipulating environment dynamics. Therefore, we propose to understand the vulnerabilities of DRL from various perspectives and provide a thorough taxonomy of potential attacks. We conduct the first set of experiments on the unexplored parts within the taxonomy. In addition to current observation based attacks against DRL, we propose the first targeted attacks based on action space and environment dynamics. We also introduce the online sequential attacks based on temporal consistency information among frames. To better estimate gradient in black-box setting, we propose a sampling strategy and theoretically prove its efficiency and estimation error bound. We conduct extensive experiments to compare the effectiveness of different attacks with several baselines in various environments, including game playing, robotics control, and autonomous driving.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

1907.0947

Country:

North America > United States > California (0.28)
North America > United States > Michigan (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Arena: a toolkit for Multi-Agent Reinforcement Learning

Wang, Qing, Xiong, Jiechao, Han, Lei, Fang, Meng, Sun, Xinghai, Zheng, Zhuobin, Sun, Peng, Zhang, Zhengyou

arXiv.org Artificial IntelligenceJul-20-2019

We introduce Arena, a toolkit for multi-agent reinforcement learning (MARL) research. In MARL, it usually requires customizing observations, rewards and actions for each agent, changing cooperative-competitive agent-interaction, and playing with/against a third-party agent, etc. We provide a novel modular design, called Interface, for manipulating such routines in essentially two ways: 1) Different interfaces can be concatenated and combined, which extends the OpenAI Gym Wrappers concept to MARL scenarios. 2) During MARL training or testing, interfaces can be embedded in either wrapped OpenAI Gym compatible Environments or raw environment compatible Agents. We offer off-the-shelf interfaces for several popular MARL platforms, including StarCraft II, Pommerman, ViZDoom, Soccer, etc. The interfaces effectively support self-play RL and cooperative-competitive hybrid MARL. Also, Arena can be conveniently extended to your own favorite MARL platform.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1907.09467

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Potential-Based Advice for Stochastic Policy Learning

Xiao, Baicen, Ramasubramanian, Bhaskar, Clark, Andrew, Hajishirzi, Hannaneh, Bushnell, Linda, Poovendran, Radha

arXiv.org Artificial IntelligenceJul-20-2019

This paper augments the reward received by a reinforcement learning agent with potential functions in order to help the agent learn (possibly stochastic) optimal policies. We show that a potential-based reward shaping scheme is able to preserve optimality of stochastic policies, and demonstrate that the ability of an agent to learn an optimal policy is not affected when this scheme is augmented to soft Q-learning. We propose a method to impart potential based advice schemes to policy gradient algorithms. An algorithm that considers an advantage actor-critic architecture augmented with this scheme is proposed, and we give guarantees on its convergence. Finally, we evaluate our approach on a puddle-jump grid world with indistinguishable states, and the continuous state and action mountain car environment from classical control. Our results indicate that these schemes allow the agent to learn a stochastic optimal policy faster and obtain a higher average reward.

machine learning, reinforcement learning, stochastic policy learning, (2 more...)

arXiv.org Artificial Intelligence

1907.08823

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Using deep learning to improve traffic signal performance Penn State University

#artificialintelligenceJul-19-2019, 23:40:31 GMT

Traffic signals serve to regulate the worst bottlenecks in highly populated areas but are not always very effective. Researchers at Penn State are hoping to use deep reinforcement learning to improve traffic signal efficiency in urban areas, thanks to a one-year, $22,443 Penn State Institute for CyberScience Seed Grant. Urban traffic congestion currently costs the U.S. economy $160 billion in lost productivity and causes 3.1 billion gallons of wasted fuel and 56 billion pounds of harmful CO2 emissions, according to the 2015 Urban Mobility Scorecard. Vikash Gayah, associate professor of civil engineering, and Zhenhui "Jessie" Li, associate professor of information sciences and technology, aim to tackle this issue by first identifying machine learning algorithms that will provide results consistent with traditional (theoretical) solutions for simple scenerios, and then building upon those algorithms by introducing complexities that cannot be readily addressed through traditional means. "Typically, we would go out and do traffic counts for an hour at certain peak times of day and that would determine signal timings for the next year, but not every day looks like that hour, and so we get inefficiency," Gayah said.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

#artificialintelligence

Country: North America > United States (0.51)

Industry:

Transportation > Infrastructure & Services (0.88)
Transportation > Ground > Road (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback