Goto

Collaborating Authors

 Reinforcement Learning


Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

arXiv.org Machine Learning

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs only use on-policy data for dual updates, which results in sample inefficiency and slow convergence. In this paper, we propose a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in primal space with on-policy likelihood ratio gradient. Experimental results on a simulated robot locomotion task show that APDO achieves better sample efficiency and faster convergence than state-of-the-art approaches for CMDPs.


Improving Mild Cognitive Impairment Prediction via Reinforcement Learning and Dialogue Simulation

arXiv.org Machine Learning

Mild cognitive impairment (MCI) is a prodromal phase in the progression from normal aging to dementia, especially Alzheimers disease. Even though there is mild cognitive decline in MCI patients, they have normal overall cognition and thus is challenging to distinguish from normal aging. Using transcribed data obtained from recorded conversational interactions between participants and trained interviewers, and applying supervised learning models to these data, a recent clinical trial has shown a promising result in differentiating MCI from normal aging. However, the substantial amount of interactions with medical staff can still incur significant medical care expenses in practice. In this paper, we propose a novel reinforcement learning (RL) framework to train an efficient dialogue agent on existing transcripts from clinical trials. Specifically, the agent is trained to sketch disease-specific lexical probability distribution, and thus to converse in a way that maximizes the diagnosis accuracy and minimizes the number of conversation turns. We evaluate the performance of the proposed reinforcement learning framework on the MCI diagnosis from a real clinical trial. The results show that while using only a few turns of conversation, our framework can significantly outperform state-of-the-art supervised learning approaches.


Sim-To-Real Optimization Of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

arXiv.org Machine Learning

Mobile network that millions of people use every day is one of the most complex systems in real world. Optimization of mobile network to meet exploding customer demand and reduce CAPEX/OPEX poses greater challenges than in prior works. Learning to solve complex problems in real world to benefit everyone and make the world better has long been ultimate goal of AI. However, it still remains an unsolved problem for deep reinforcement learning (DRL), given imperfect information in real world, huge state/action space, lots of data needed for training, associated time/cost, multi-agent interactions, potential negative impact to real world, etc. To bridge this reality gap, we proposed a DRL framework to direct transfer optimal policy learned from multi-tasks in source domain to unseen similar tasks in target domain without any further training in both domains. First, we distilled temporal-spatial relationships between cells and mobile users to scalable 3D image-like tensor to best characterize partially observed mobile network. Second, inspired by AlphaGo, we used a novel self-play mechanism to empower DRL agent to gradually improve its intelligence by competing for best record on multiple tasks. Third, a decentralized DRL method is proposed to coordinate multi-agents to compete and cooperate as a team to maximize global reward and minimize potential negative impact. Using 7693 unseen test tasks over 160 unseen simulated mobile networks and 6 field trials over 4 commercial mobile networks in real world, we demonstrated the capability of our approach to direct transfer the learning from one simulator to another simulator, and from simulation to real world. This is the first time that a DRL agent successfully transfers its learning directly from simulation to very complex real world problems with incomplete and imperfect information, huge state/action space and multi-agent interactions.


A Beginner's Guide to Deep Reinforcement Learning (for Java and Scala) - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

#artificialintelligence

While neural networks are responsible for recent breakthroughs in problems like computer vision, machine translation and time series prediction โ€“ they can also combine with reinforcement learning algorithms to create something astounding like AlphaGo. Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start from a blank slate, and under the right conditions they achieve superhuman performance. Like a child incentivized by spankings and candy, these algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones โ€“ this is reinforcement. Reinforcement algorithms that incorporate deep learning can beat world champions at the game of Go as well as human experts playing numerous Atari video games.


Resurgence of AI During 1983-2010

#artificialintelligence

Every decade seems to have its technological buzzwords: we had personal computers in 1980s; Internet and worldwide web in 1990s; smart phones and social media in 2000s; and Artificial Intelligence (AI) and Machine Learning in this decade. The 1950-82 era saw a new field of Artificial Intelligence (AI) being born, lot of pioneering research being done, massive hype being created, and AI going into hibernation when this hype did not materialize, and the research funding dried up [56]. During 1983 and 2010, research funding ebbed and flowed, and research in AI continued to gather steam although "some computer scientists and software engineers would avoid the term artificial intelligence for fear of being viewed as wild-eyed dreamers" [43]. During 1980s and 90s, researchers realized that many AI solutions could be improved by using techniques from mathematics and economics such as game theory, stochastic modeling, classical numerical methods, operations research and optimization. Better mathematical descriptions were developed for deep neural networks as well as evolutionary and genetic algorithms, which matured during this period.


This Week in AI, February 15th, 2018 โ€“ Udacity Inc โ€“ Medium

#artificialintelligence

Alex Irpan, a software engineer at Google, wrote an excellent article on the current difficulties of getting deep reinforcement learning to work. For example, even after weeks of optimizing hyperparameters and explotation-exploration rates, these models are still highly sensitive to initial conditions. A 30% failure rate is seen as "working." Irpan makes the argument that most attempts with deep RL fail but no one talks about it publicly, we only see the few cases where the problems are simplified enough to be feasible. This is still a new field - the breakthrough Atari DQN paper was published only 3 years ago - so there is plenty of room for more research and advancement.


[P]I wrote a tutorial about Inverse Reinforcement Learning and three basic algorithms. More to follow. โ€ข r/MachineLearning

@machinelearnbot

This idea is really interesting. Sadly I don't have nearly enough linear algebra experience to understand the details though. Would IRL still be feasible if the state was not explicit? It seems like this technique depends on prior knowledge of the state machine, but from what I understand about deep reinforcement learning, the state may be very complex, and the value function could actually be a deep neural network.


Reactive Reinforcement Learning in Asynchronous Environments

arXiv.org Artificial Intelligence

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation performed by the agent. In an asynchronous environment, minimizing reaction time---the time it takes for an agent to react to an observation---also minimizes the time in which the state of the environment may change following observation. In many environments, the reaction time of an agent directly impacts task performance by permitting the environment to transition into either an undesirable terminal state or a state where performing the chosen action is inappropriate. We propose a class of reactive reinforcement learning algorithms that address this problem of asynchronous environments by immediately acting after observing new state information. We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL algorithm reduces the reaction time of the agent by approximately the duration of the algorithm's learning update. This new class of reactive algorithms may facilitate safer control and faster decision making without any change to standard learning guarantees.


Online Machine Learning in Big Data Streams

arXiv.org Machine Learning

The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: In the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields, online algorithms, online learning, and distributed data processing are hugely dominant in current research and development with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.


Variance-Reduced Stochastic Learning under Random Reshuffling

arXiv.org Machine Learning

Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA, Finito, and SAG, have been proposed to minimize empirical risks with linear convergence properties to the exact minimizer. The existing convergence results assume uniform data sampling with replacement. However, it has been observed in related works that random reshuffling can deliver superior performance over uniform sampling and, yet, no formal proofs or guarantees of exact convergence exist for variance-reduced algorithms under random reshuffling. This paper makes two contributions. First, it resolves this open issue and provides the first theoretical guarantee of linear convergence under random reshuffling for SAGA; the argument is also adaptable to other variance-reduced algorithms. Second, under random reshuffling, the paper proposes a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements compared to SAGA and with balanced gradient computations compared to SVRG. AVRG is also shown analytically to converge linearly.