AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Liang, Qingkai, Que, Fanyu, Modiano, Eytan

arXiv.org Machine Learning, Feb-18-2018

The Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying constraints on the long-term cost. A canonical approach to solving CMDPs is the primal-dual method, which updates parameters in the primal and dual spaces in turn. Existing methods for CMDPs use only on-policy data for dual updates, which results in sample inefficiency and slow convergence. In this paper, we propose a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in primal space with an on-policy likelihood ratio gradient. Experimental results on a simulated robot locomotion task show that APDO achieves better sample efficiency and faster convergence than state-of-the-art approaches for CMDPs.
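To make the primal-dual idea concrete, here is a minimal sketch of a generic Lagrangian primal-dual loop on a hypothetical one-state CMDP with two actions: the policy parameter takes gradient-ascent steps on the Lagrangian, and the multiplier lambda grows while the expected cost exceeds the limit and shrinks (never below 0) otherwise. It is not APDO: it uses exact expected gradients instead of sampled on-policy likelihood ratio estimates and has no off-policy dual estimate. The rewards, costs, cost limit, step sizes and the small entropy bonus `tau` (added only to damp oscillation) are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def primal_dual_cmdp(rewards=(1.0, 0.2), costs=(1.0, 0.0), cost_limit=0.5,
                     tau=0.1, lr_theta=0.5, lr_lambda=0.05, steps=50_000):
    """Lagrangian primal-dual on a hypothetical one-state, two-action CMDP.

    The policy picks action 0 with probability p = sigmoid(theta).
    Lagrangian: L = J_r(p) - lam * (J_c(p) - cost_limit) + tau * H(p).
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(theta)
        j_c = p * costs[0] + (1.0 - p) * costs[1]  # expected cost
        # dL/dp = reward slope - lam * cost slope + tau * dH/dp.
        # For p = sigmoid(theta), dH/dp = log((1 - p) / p) = -theta exactly.
        dl_dp = (rewards[0] - rewards[1]) - lam * (costs[0] - costs[1]) - tau * theta
        # Primal step: gradient ascent in theta (chain rule: dp/dtheta = p(1-p)).
        theta += lr_theta * dl_dp * p * (1.0 - p)
        # Dual step: raise lam while the cost limit is exceeded, keep lam >= 0.
        lam = max(0.0, lam + lr_lambda * (j_c - cost_limit))
    return sigmoid(theta), lam


p, lam = primal_dual_cmdp()
print(f"P(action 0) = {p:.3f}, lambda = {lam:.3f}")
```

With these numbers the constrained optimum spends exactly the cost budget (probability 0.5 on the costly action) with multiplier 0.8, and the iterates should settle there; with a loose limit such as `cost_limit=1.0` the constraint never binds, lambda stays at 0, and the policy favors the high-reward action.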

constraint, machine learning, reinforcement learning

1802.0648

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Improving Mild Cognitive Impairment Prediction via Reinforcement Learning and Dialogue Simulation

Tang, Fengyi, Lin, Kaixiang, Uchendu, Ikechukwu, Dodge, Hiroko H., Zhou, Jiayu

arXiv.org Machine Learning, Feb-18-2018

Mild cognitive impairment (MCI) is a prodromal phase in the progression from normal aging to dementia, especially Alzheimer's disease. Even though MCI patients show mild cognitive decline, their overall cognition is normal, so MCI is challenging to distinguish from normal aging. Using transcripts of recorded conversational interactions between participants and trained interviewers, and applying supervised learning models to these data, a recent clinical trial has shown promising results in differentiating MCI from normal aging. However, the substantial amount of interaction with medical staff can still incur significant medical care expenses in practice. In this paper, we propose a novel reinforcement learning (RL) framework to train an efficient dialogue agent on existing transcripts from clinical trials. Specifically, the agent is trained to sketch disease-specific lexical probability distributions, and thus to converse in a way that maximizes diagnosis accuracy while minimizing the number of conversation turns. We evaluate the performance of the proposed RL framework on MCI diagnosis from a real clinical trial. The results show that, while using only a few turns of conversation, our framework can significantly outperform state-of-the-art supervised learning approaches.

machine learning, reinforcement learning, rl-agent

1802.06428

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Personal > Interview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Sim-To-Real Optimization Of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

Tan, Yongxi, Yang, Jin, Chen, Xin, Song, Qitao, Chen, Yunjun, Ye, Zhangxiang, Su, Zhenqiang

arXiv.org Machine Learning, Feb-18-2018

The mobile network, which millions of people use every day, is one of the most complex systems in the real world. Optimizing a mobile network to meet exploding customer demand and to reduce CAPEX/OPEX poses greater challenges than those addressed in prior work. Learning to solve complex real-world problems to benefit everyone and make the world better has long been an ultimate goal of AI. However, it remains an unsolved problem for deep reinforcement learning (DRL), given imperfect information in the real world, huge state/action spaces, the large amount of data needed for training, the associated time and cost, multi-agent interactions, potential negative impact on the real world, etc. To bridge this reality gap, we propose a DRL framework that directly transfers the optimal policy learned from multiple tasks in a source domain to unseen, similar tasks in a target domain, without any further training in either domain. First, we distill the temporal-spatial relationships between cells and mobile users into a scalable, 3D image-like tensor that best characterizes the partially observed mobile network. Second, inspired by AlphaGo, we use a novel self-play mechanism that lets the DRL agent gradually improve its intelligence by competing for the best record on multiple tasks. Third, we propose a decentralized DRL method that coordinates multiple agents to compete and cooperate as a team, maximizing the global reward and minimizing potential negative impact. Using 7693 unseen test tasks over 160 unseen simulated mobile networks, and 6 field trials over 4 commercial mobile networks in the real world, we demonstrate the capability of our approach to directly transfer learning from one simulator to another, and from simulation to the real world. This is the first time that a DRL agent has successfully transferred its learning directly from simulation to a very complex real-world problem with incomplete and imperfect information, a huge state/action space, and multi-agent interactions.

artificial intelligence, machine learning, reinforcement learning

1802.06416

Genre: Research Report (0.54)

Industry:

Telecommunications (0.91)
Information Technology (0.88)
Leisure & Entertainment > Games > Go (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Beginner's Guide to Deep Reinforcement Learning (for Java and Scala) - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

#artificialintelligence, Feb-17-2018, 20:54:33 GMT

Neural networks are responsible for recent breakthroughs in problems like computer vision, machine translation and time series prediction, and they can also be combined with reinforcement learning algorithms to create something as astounding as AlphaGo. Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start from a blank slate, and under the right conditions they achieve superhuman performance. Like a child incentivized by spankings and candy, these algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones – this is reinforcement. Reinforcement algorithms that incorporate deep learning can beat world champions at the game of Go as well as human experts at numerous Atari video games.
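The reward-driven, many-step loop described above can be illustrated with tabular Q-learning, one classic reinforcement learning algorithm. This is a minimal sketch in Python, not the deep-network, Java/Scala setting the guide covers, and it assumes a hypothetical five-cell corridor: the agent starts at the left end, can step left or right, and earns +1 only when it reaches the right end. The discount factor `gamma` makes a reward reached in fewer moves worth more, and `epsilon` controls how often the agent explores instead of exploiting what it has learned.

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    Actions: 0 = left, 1 = right. Reaching the last cell ends the episode
    with reward +1; every other step gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy: explore at random (also used to break ties),
            # otherwise exploit the action with the higher estimated value.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Terminal transitions have no future value to bootstrap from.
            future = 0.0 if s_next == goal else max(q[s_next])
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = s_next
    return q


q = q_learning_corridor()
greedy = ["right" if q_s[1] > q_s[0] else "left" for q_s in q[:-1]]
print(greedy)
```

After training, the value of moving right from the cell next to the goal approaches 1, and each step further away is worth a factor of `gamma` less, so the greedy policy heads right everywhere. Deep RL replaces the table `q` with a neural network so the same idea scales to inputs like game screens.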

artificial intelligence, machine learning, reinforcement learning

Industry:

Leisure & Entertainment > Games > Go (0.55)
Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Resurgence of AI During 1983-2010

#artificialintelligence, Feb-17-2018, 07:51:34 GMT

Every decade seems to have its technological buzzwords: we had personal computers in the 1980s; the Internet and the World Wide Web in the 1990s; smartphones and social media in the 2000s; and Artificial Intelligence (AI) and Machine Learning in this decade. The 1950-82 era saw the new field of Artificial Intelligence (AI) being born, a lot of pioneering research being done, massive hype being created, and AI going into hibernation when this hype did not materialize and research funding dried up [56]. Between 1983 and 2010, research funding ebbed and flowed, and research in AI continued to gather steam, although "some computer scientists and software engineers would avoid the term artificial intelligence for fear of being viewed as wild-eyed dreamers" [43]. During the 1980s and 90s, researchers realized that many AI solutions could be improved by using techniques from mathematics and economics such as game theory, stochastic modeling, classical numerical methods, operations research and optimization. Better mathematical descriptions were developed for deep neural networks, as well as for evolutionary and genetic algorithms, which matured during this period.

artificial intelligence, machine learning, reinforcement learning

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)

Industry:

Leisure & Entertainment > Games > Chess (1.00)
Information Technology (0.97)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)

This Week in AI, February 15th, 2018 – Udacity Inc – Medium

#artificialintelligence, Feb-17-2018, 05:32:07 GMT

Alex Irpan, a software engineer at Google, wrote an excellent article on the current difficulties of getting deep reinforcement learning to work. For example, even after weeks of optimizing hyperparameters and exploration-exploitation rates, these models are still highly sensitive to initial conditions. A 30% failure rate is seen as "working." Irpan argues that most attempts at deep RL fail but no one talks about it publicly; we only see the few cases where the problems are simplified enough to be feasible. This is still a new field (the breakthrough Atari DQN paper was published only 3 years ago), so there is plenty of room for more research and advancement.

artificial intelligence, machine learning, reinforcement learning

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)
Information Technology > Services (0.38)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)

[P] I wrote a tutorial about Inverse Reinforcement Learning and three basic algorithms. More to follow. • r/MachineLearning

@machinelearnbot, Feb-17-2018, 01:17:39 GMT

This idea is really interesting. Sadly I don't have nearly enough linear algebra experience to understand the details though. Would IRL still be feasible if the state was not explicit? It seems like this technique depends on prior knowledge of the state machine, but from what I understand about deep reinforcement learning, the state may be very complex, and the value function could actually be a deep neural network.

artificial intelligence, inverse reinforcement learning, machine learning

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reactive Reinforcement Learning in Asynchronous Environments

Travnik, Jaden B., Mathewson, Kory W., Sutton, Richard S., Pilarski, Patrick M.

arXiv.org Artificial Intelligence, Feb-16-2018

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation performed by the agent. In an asynchronous environment, minimizing reaction time (the time it takes for an agent to react to an observation) also minimizes the time in which the state of the environment may change following observation. In many environments, the reaction time of an agent directly impacts task performance by permitting the environment to transition into either an undesirable terminal state or a state where performing the chosen action is inappropriate. We propose a class of reactive reinforcement learning algorithms that address this problem of asynchronous environments by immediately acting after observing new state information. We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL algorithm reduces the reaction time of the agent by approximately the duration of the algorithm's learning update. This new class of reactive algorithms may facilitate safer control and faster decision making without any change to standard learning guarantees.
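The difference between the two orderings can be sketched with a simulated clock. In this minimal sketch, assuming (as the abstract's description suggests) that the conventional agent finishes its SARSA update before emitting the next action while the reactive agent emits the action first and updates afterwards, both agents learn identical Q-values, because SARSA chooses the next action before updating either way, but their reaction times differ by the cost of one update. The ring dynamics, reward, and the `SELECT_COST`/`UPDATE_COST` time units are hypothetical, not taken from the paper, and the sketch does not model the environment changing during the delay.

```python
import random
from collections import defaultdict

SELECT_COST = 1  # hypothetical time units to choose an action
UPDATE_COST = 5  # hypothetical time units for one learning update


def epsilon_greedy(q, state, n_actions, eps, rng):
    if rng.random() < eps:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q[(state, a)])


def run_sarsa(reactive, steps=200, n_states=4, alpha=0.1, gamma=0.9,
              eps=0.1, seed=0):
    """Return (Q-table, mean reaction time) on hypothetical ring dynamics."""
    rng = random.Random(seed)
    q = defaultdict(float)
    clock, reaction_times = 0, []
    s = 0
    a = epsilon_greedy(q, s, 2, eps, rng)
    for _ in range(steps):
        s_next = (s + a) % n_states  # toy dynamics: 0 = stay, 1 = advance
        r = 1.0 if s_next == 0 else 0.0
        observed_at = clock  # the observation (s_next, r) arrives
        a_next = epsilon_greedy(q, s_next, 2, eps, rng)
        clock += SELECT_COST
        if reactive:
            reaction_times.append(clock - observed_at)  # act immediately
        # SARSA update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
        q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])
        clock += UPDATE_COST
        if not reactive:
            reaction_times.append(clock - observed_at)  # act only after updating
        s, a = s_next, a_next
    return dict(q), sum(reaction_times) / len(reaction_times)


q_conv, rt_conv = run_sarsa(reactive=False)
q_reac, rt_reac = run_sarsa(reactive=True)
print(q_conv == q_reac, rt_conv, rt_reac)
```

Because only the position of the action relative to the update changes, the learning rule itself is untouched, which matches the abstract's point that reactive variants keep the standard learning guarantees.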

artificial intelligence, machine learning, reinforcement learning

doi: 10.3389/frobt.2018.00079

1802.06139

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Online Machine Learning in Big Data Streams

Benczúr, András A., Kocsis, Levente, Pálovics, Róbert

arXiv.org Machine Learning, Feb-16-2018

The area of online machine learning in big data streams covers algorithms that (1) are distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: in the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is reference material, not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related subfields (online algorithms, online learning, and distributed data processing) are very active in current research and development, with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.
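As a small illustration of the second requirement (no stored history), the sketch below fits a linear regression model with online stochastic gradient descent: each example from the stream is used for exactly one update and then discarded, so memory stays constant however long the stream runs. The synthetic noise-free stream (y = 2x + 1), the learning rate and the stream length are hypothetical choices, and the sketch covers only the single-machine case, not the distributed architectures the article surveys.

```python
import random
from typing import Iterable, Iterator, Tuple


def synthetic_stream(n: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Hypothetical stream of (x, y) pairs with y = 2x + 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0


def online_sgd(stream: Iterable[Tuple[float, float]], lr: float = 0.1):
    """Single pass of SGD on squared loss; only (w, b) are kept in memory."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y  # prediction error on the newest example
        w -= lr * err * x      # gradient of 0.5 * err**2 with respect to w
        b -= lr * err          # gradient of 0.5 * err**2 with respect to b
    return w, b


w, b = online_sgd(synthetic_stream(5000))
print(f"w = {w:.3f}, b = {b:.3f}")
```

Because the stream is consumed through a generator, no past example can be revisited, which is exactly the restriction the abstract describes: an early poor estimate can only be corrected by later data.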

data mining, machine learning, reinforcement learning

1802.05872

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Online (0.82)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Variance-Reduced Stochastic Learning under Random Reshuffling

Ying, Bicheng, Yuan, Kun, Sayed, Ali H.

arXiv.org Machine Learning, Feb-16-2018

Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA, Finito, and SAG, have been proposed to minimize empirical risks with linear convergence properties to the exact minimizer. The existing convergence results assume uniform data sampling with replacement. However, it has been observed in related works that random reshuffling can deliver superior performance over uniform sampling and, yet, no formal proofs or guarantees of exact convergence exist for variance-reduced algorithms under random reshuffling. This paper makes two contributions. First, it resolves this open issue and provides the first theoretical guarantee of linear convergence under random reshuffling for SAGA; the argument is also adaptable to other variance-reduced algorithms. Second, under random reshuffling, the paper proposes a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements compared to SAGA and with balanced gradient computations compared to SVRG. AVRG is also shown analytically to converge linearly.
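A minimal sketch of SAGA under random reshuffling may help: each epoch visits every sample exactly once in a freshly shuffled order (sampling without replacement), and each step corrects the new per-sample gradient with the stored gradient for that sample and the average of the stored table. The one-dimensional least-squares data, the step size and the epoch count are hypothetical choices, and the sketch does not implement the paper's AVRG algorithm.

```python
import random


def saga_reshuffled(a, b, lr=0.02, epochs=1000, seed=0):
    """SAGA with random reshuffling on f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)**2."""
    rng = random.Random(seed)
    n = len(a)

    def grad(i, x):
        return a[i] * (a[i] * x - b[i])  # gradient of the i-th loss term

    x = 0.0
    table = [grad(i, x) for i in range(n)]  # last gradient seen for each sample
    table_avg = sum(table) / n
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # new permutation each epoch: no replacement
        for i in order:
            g = grad(i, x)
            x -= lr * (g - table[i] + table_avg)  # variance-reduced step
            table_avg += (g - table[i]) / n       # keep the table average current
            table[i] = g
    return x


a, b = [1.0, 2.0, 3.0, 1.5], [2.0, 1.0, 4.0, 0.0]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
print(saga_reshuffled(a, b), x_star)
```

The stored table costs one gradient per sample, which is the storage requirement the abstract contrasts with AVRG's constant storage; unlike plain SGD, the corrected step vanishes at the exact minimizer, so a constant step size still converges to it.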

artificial intelligence, machine learning, reinforcement learning

1708.01383

Country:

Europe (0.67)
North America > Canada (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)
