AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters

Peng, Yanghua, Bao, Yixin, Chen, Yangrui, Wu, Chuan, Meng, Chen, Lin, Wei

arXiv.org Machine Learning, Sep-13-2019

More and more companies have deployed machine learning (ML) clusters, where deep learning (DL) models are trained to provide various AI-driven services. Efficient resource scheduling is essential for maximal utilization of expensive DL clusters. Existing cluster schedulers either are agnostic to ML workload characteristics, or use scheduling heuristics based on operators' understanding of a particular ML framework and workload, which are less efficient or not general enough. In this paper, we show that DL techniques can be adopted to design a generic and efficient scheduler. DL2 is a DL-driven scheduler for DL clusters, targeting global training job expedition by dynamically resizing resources allocated to jobs. DL2 advocates a joint supervised learning and reinforcement learning approach: a neural network is warmed up via offline supervised learning based on job traces produced by the existing cluster scheduler; then the neural network is plugged into the live DL cluster, fine-tuned by reinforcement learning carried out throughout the training progress of the DL jobs, and used for deciding job resource allocation in an online fashion. By applying past decisions made by the existing cluster scheduler in the preparatory supervised learning phase, our approach enables a smooth transition from the existing scheduler, and yields a high-quality scheduler that minimizes average training completion time. We implement DL2 on Kubernetes and enable dynamic resource scaling in DL jobs on MXNet. Extensive evaluation shows that DL2 outperforms the fairness scheduler (i.e., DRF) by 44.1% and the expert heuristic scheduler (i.e., Optimus) by 17.5% in terms of average job completion time.

machine learning, reinforcement learning, scheduler, (17 more...)

arXiv.org Machine Learning

1909.0604

Genre: Research Report (0.82)

Industry:

Education > Educational Setting (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards an Adaptive Robot for Sports and Rehabilitation Coaching

Ross, Martin K., Broz, Frank, Baillie, Lynne

arXiv.org Artificial Intelligence, Sep-13-2019

The work presented in this paper aims to explore how, and to what extent, an adaptive robotic coach has the potential to provide extra motivation to adhere to long-term rehabilitation and help fill the coaching gap which occurs during repetitive solo practice in high performance sport. Adapting the behavior of a social robot to a specific user, using reinforcement learning (RL), could be a way of increasing adherence to an exercise routine in both domains. The requirements gathering phase is underway and is presented in this paper along with the rationale of using RL in this context.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1909.08052

Country: North America > United States > New York (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Consumer Health (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Flight Controller Synthesis Via Deep Reinforcement Learning

Koch, William

arXiv.org Artificial Intelligence, Sep-13-2019

Traditional control methods are inadequate in many deployment settings involving control of Cyber-Physical Systems (CPS). In such settings, CPS controllers must operate and respond to unpredictable interactions, conditions, or failure modes. Dealing with such unpredictability requires the use of executive and cognitive control functions that allow for planning and reasoning. Motivated by the sport of drone racing, this dissertation addresses these concerns for state-of-the-art flight control by investigating the use of deep neural networks to bring essential elements of higher-level cognition to the construction of low-level flight controllers. This thesis reports on the development and release of an open source, full solution stack for building neuro-flight controllers. This stack consists of the methodology for constructing a multicopter digital twin for synthesizing the flight controller unique to a specific aircraft, a tuning framework for implementing training environments (GymFC), and a firmware for the world's first neural network supported flight controller (Neuroflight). GymFC's novel approach fuses together the digital twinning paradigm for flight control training to provide seamless transfer to hardware. Additionally, this thesis examines alternative reward system functions as well as changes to the software environment to bridge the gap between the simulation and real-world deployment environments. Work summarized in this thesis demonstrates that reinforcement learning can be leveraged to train neural network controllers capable not only of maintaining stable flight but also of performing precision aerobatic maneuvers in real-world settings. As such, this work provides a foundation for developing the next generation of flight control systems.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1909.06493

Country: North America > United States > Massachusetts (0.27)

Genre:

Summary/Review (1.00)
Research Report > New Finding (0.92)

Industry:

Transportation > Air (1.00)
Government > Military > Air Force (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Petri Net Machines for Human-Agent Interaction

Dondrup, Christian, Papaioannou, Ioannis, Lemon, Oliver

arXiv.org Artificial Intelligence, Sep-13-2019

Smart speakers and robots are becoming ever more prevalent in our daily lives. These agents are able to execute a wide range of tasks and actions and, therefore, need systems to control their execution. Current state-of-the-art approaches such as (deep) reinforcement learning, however, require vast amounts of training data, which is often hard to come by when interacting with humans. To overcome this issue, most systems still rely on Finite State Machines. We introduce Petri Net Machines, which present a formal definition for state machines based on Petri Nets that are able to execute concurrent actions reliably, execute and interleave several plans at the same time, and provide an easy-to-use modelling language. We show their workings based on the example of Human-Robot Interaction in a shopping mall.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1909.06174

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

DataWorkshop Club Conf 2019 Machine Learning Conference Online

#artificialintelligence, Sep-12-2019, 20:54:19 GMT

Recent years have seen a rising interest in developing AI algorithms for real-world big-data domains ranging from autonomous cars to personalized assistants. At the core of these algorithms are architectures that combine deep neural networks, for approximating the underlying multidimensional state-spaces, with reinforcement learning, for controlling agents that learn to operate in said state-spaces towards achieving a given objective. The talk will first outline notable past and future efforts in deep reinforcement learning as well as identify fundamental problems that this technology has been struggling to overcome. Towards mitigating these problems (and opening up an alternative path to general artificial intelligence), I will then summarize a brain computing model of intelligence, rooted in the latest findings in neuroscience. The talk will conclude with an overview of the recent research efforts in the field of multi-agent systems, to provide the future teams of humans and agents with the necessary tools that allow them to safely co-exist.

artificial intelligence, machine learning conference online, reinforcement learning, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

UC Berkeley Looks into AI Deep Learning and Reinforcement Learning for the Naval Info Warfare Center – Signal Magazine – IAM Network

#artificialintelligence, Sep-12-2019, 15:53:44 GMT

california, deep learning, reinforcement learning, (2 more...)

#artificialintelligence

Country: North America > United States > California > Alameda County > Berkeley (0.65)

Industry: Education > Educational Setting > Higher Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Stock Prediction Bot using Deep Reinforcement Learning (LSTM) 91-7307399944 for query

#artificialintelligence, Sep-12-2019, 12:26:13 GMT

Direct at 91-7307399944. WhatsApp at 91-7307399944. If you liked the video, you can also promote us. Your small contribution will encourage us to keep contributing to Educational AI. Please make a small donation via PayPal at queryajayjatav@gmail.com.

artificial intelligence, deep learning, machine learning, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Reinforcement Learning for Portfolio Management

Filos, Angelos

arXiv.org Machine Learning, Sep-12-2019

Traditionally, mathematical formulations of dynamical systems in the context of Signal Processing and Control Theory have been a lynchpin of today's Financial Engineering. More recently, advances in sequential decision making, mainly through the concept of Reinforcement Learning, have been instrumental in the development of multistage stochastic optimization, a key component in sequential portfolio optimization (asset allocation) strategies. In this thesis, we develop a comprehensive account of the expressive power, modelling efficiency, and performance advantages of so-called trading agents (i.e., Deep Soft Recurrent Q-Network (DSRQN) and Mixture of Score Machines (MSM)), based on both traditional system identification (model-based approach) as well as on context-independent agents (model-free approach). The analysis provides conclusive support for the ability of model-free reinforcement learning methods to act as universal trading agents, which are not only capable of reducing the computational and memory complexity (owing to their linear scaling with the size of the universe), but also serve as generalizing strategies across assets and markets, regardless of the trading universe on which they have been trained. The relatively low volume of daily returns in financial market data is addressed via data augmentation (a generative approach) and a choice of pre-training strategies, both of which are validated against current state-of-the-art models. For rigour, a risk-sensitive framework which includes transaction costs is considered, and its performance advantages are demonstrated in a variety of scenarios, from synthetic time-series (sinusoidal, sawtooth and chirp waves), simulated market series (surrogate data based), through to real market data (S&P 500 and EURO STOXX 50).
The analysis and simulations confirm the superiority of universal model-free reinforcement learning agents over current portfolio management models in asset allocation strategies, with an achieved performance advantage of as much as 9.2% in annualized cumulative returns and 13.4% in annualized Sharpe Ratio.
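
The annualized Sharpe Ratio quoted above can be illustrated with a minimal sketch. It uses the common textbook definition (mean excess return divided by the sample standard deviation, scaled by the square root of the number of periods per year). The function name, the default of 252 trading days, and the sample returns are assumptions made for illustration, not details taken from the thesis.

```python
import math
import statistics


def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns (textbook definition).

    Assumes at least two returns and a non-zero sample standard deviation.
    """
    # Excess return of each period over the (assumed constant) risk-free rate.
    excess = [r - risk_free_daily for r in daily_returns]
    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    # Scale the per-period ratio to a yearly figure.
    return mean_excess / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    returns = [0.01, -0.01, 0.02, 0.0]
    print(annualized_sharpe(returns))
```

Because the ratio divides by volatility, two strategies with the same cumulative return can differ substantially in Sharpe Ratio, which is why the abstract reports both figures.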

agent, deep learning, upstream oil & gas, (24 more...)

arXiv.org Machine Learning

1909.09571

Country:

North America > United States (0.27)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia (0.14)

Genre:

Research Report (0.69)
Instructional Material (0.45)

Industry:

Banking & Finance > Trading (1.00)
Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes

Kallus, Nathan, Uehara, Masatoshi

arXiv.org Machine Learning, Sep-12-2019

Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian, time-invariant, and ergodic structure in efficient OPE. We first derive the efficiency limits for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar. But, in ergodic time-invariant Markov decision processes, our bounds show that truly-off-policy evaluation is feasible, even with just one dependent trajectory, and provide the limits of how well we could hope to do. We develop a new estimator based on Double Reinforcement Learning (DRL) that leverages this structure for OPE. Our DRL estimator simultaneously uses estimated stationary density ratios and $q$-functions; it remains efficient when both are estimated at slow, nonparametric rates and remains consistent when either is estimated consistently. We investigate these properties and the performance benefits of leveraging the problem structure for more efficient OPE.
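
The "curse of horizon" mentioned above can be made concrete with a small sketch. It is not the paper's DRL estimator. It only computes the variance of the cumulative importance weight used by plain trajectory-wise importance sampling, under the simplifying assumption that both policies are state-independent so the per-step ratios are independent and identically distributed. The function names and the two example policies are hypothetical.

```python
def ratio_moments(behavior, target):
    """Mean and second moment of the one-step ratio target/behavior,
    with the action drawn from the behavior policy."""
    mean = sum(b * (t / b) for b, t in zip(behavior, target))
    second = sum(b * (t / b) ** 2 for b, t in zip(behavior, target))
    return mean, second


def cumulative_weight_variance(behavior, target, horizon):
    """Variance of the product of `horizon` independent one-step ratios.

    For independent factors, E[W^2] = second**horizon and
    E[W]^2 = mean**(2 * horizon), so the variance is their difference.
    """
    mean, second = ratio_moments(behavior, target)
    return second ** horizon - mean ** (2 * horizon)


if __name__ == "__main__":
    behavior = [0.5, 0.5]  # hypothetical behavior policy over two actions
    target = [0.9, 0.1]    # hypothetical target policy
    for horizon in (1, 10, 50):
        print(horizon, cumulative_weight_variance(behavior, target, horizon))
```

Whenever the two policies differ, the second moment of the one-step ratio exceeds 1, so the variance grows exponentially with the horizon. This is the difficulty that the stationary density ratios in the abstract are meant to avoid.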

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1909.0585

Country: North America > United States > New York (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies

Cowan, Wesley, Katehakis, Michael N., Pirutinsky, Daniel

arXiv.org Artificial Intelligence, Sep-12-2019

In this paper we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art of the area and we compare the performance of the classic UCB policy of Burnetas and Katehakis (1997) with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and a method based on posterior sampling (MDP-PS).
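
For readers unfamiliar with upper-confidence-bound (UCB) policies, the idea can be sketched in the much simpler multi-armed bandit setting. The code below is the well-known UCB1 index rule for Bernoulli bandits. It is not the MDP policy compared in the paper, which handles unknown transition probabilities. The arm probabilities, the function name, and the seed are hypothetical values chosen for illustration.

```python
import math
import random


def ucb1(arm_probs, rounds, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # number of pulls per arm
    totals = [0.0] * n_arms  # summed reward per arm
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialize its estimate
        else:
            # Index = empirical mean + exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    # Hypothetical success probabilities for three arms.
    print(ucb1([0.2, 0.5, 0.8], rounds=5000))
```

The exploration bonus makes rarely tried arms look optimistic, so every arm keeps being sampled occasionally while most pulls concentrate on the arm with the highest empirical mean.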

artificial intelligence, machine learning, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

1909.06019

Country: North America > United States (0.68)

Genre:

Overview (0.89)
Research Report (0.83)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
