AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
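
The trial-and-error idea in the quotation can be illustrated with a minimal sketch. It assumes a hypothetical two-action problem with fixed reward probabilities and an epsilon-greedy learner; the function name, parameters, and probabilities are illustrative and do not come from the book.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Estimate the mean reward of each action purely by trying actions."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)  # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(len(values)), key=values.__getitem__)
        # The learner only sees the reward, never the "correct" action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values

# Hypothetical reward probabilities: action 1 pays off more often.
estimates = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
```

The learner is never told that the second action is better; it finds out from the rewards it collects.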

Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning

Zhang, Ruiyi, Yu, Tong, Shen, Yilin, Jin, Hongxia, Chen, Changyou

Neural Information Processing Systems, Mar-19-2020, 03:01:54 GMT

Text-based interactive recommendation provides richer user preferences and has demonstrated advantages over traditional interactive recommender systems. However, recommendations can easily violate preferences of users from their past natural-language feedback, since the recommender needs to explore new items for further improvement. To alleviate this issue, we propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time. Specifically, we leverage a discriminator to detect recommendations violating user historical preference, which is incorporated into the standard RL objective of maximizing expected cumulative future rewards. Our proposed framework is general and is further extended to the task of constrained text generation.
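
One simple way to picture a constraint folded into the RL objective is a penalized return. The sketch below is an assumption for illustration, not the paper's formulation: it treats the discriminator output as a per-step violation probability and subtracts it from the reward with a hypothetical weight `lam`.

```python
def penalized_return(rewards, violation_probs, lam=1.0, gamma=0.9):
    """Discounted return where each step's reward is reduced by a penalty.

    rewards:         per-step rewards from the environment
    violation_probs: per-step discriminator scores in [0, 1] (assumed form)
    lam:             penalty weight (hypothetical parameter)
    gamma:           discount factor
    """
    total = 0.0
    for t, (r, p) in enumerate(zip(rewards, violation_probs)):
        total += (gamma ** t) * (r - lam * p)
    return total

# Same rewards, with and without a violation flagged at the first step.
clean = penalized_return([1.0, 1.0], [0.0, 0.0], lam=1.0, gamma=0.5)
flagged = penalized_return([1.0, 1.0], [1.0, 0.0], lam=1.0, gamma=0.5)
```

With these inputs the flagged trajectory scores lower, so a policy trained on this signal is pushed away from recommendations the discriminator marks as violations.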

constraint-augmented reinforcement learning, text-based interactive recommendation, user preference

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Ainsworth, Samuel, Barnes, Matt, Srinivasa, Siddhartha

Neural Information Processing Systems, Mar-19-2020, 03:01:43 GMT

In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
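
The e-stop idea can be sketched as an environment wrapper that ends the episode as soon as the agent leaves a chosen set of states. Everything below is a toy assumption: `ChainEnv`, `EStopWrapper`, and the allowed set are hypothetical and are not taken from the paper.

```python
class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 on a line, goal at the right end."""

    def __init__(self, n=11):
        self.n = n
        self.state = n // 2

    def reset(self):
        self.state = self.n // 2
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); positions are clipped to the chain.
        self.state = min(self.n - 1, max(0, self.state + action))
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class EStopWrapper:
    """Ends the episode with zero reward when the state leaves `allowed_states`."""

    def __init__(self, env, allowed_states):
        self.env = env
        self.allowed = set(allowed_states)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        state, reward, done = self.env.step(action)
        if state not in self.allowed:
            return state, 0.0, True  # emergency stop: force a reset
        return state, reward, done


# Only states 4..10 are considered relevant to reaching the goal from state 5.
env = EStopWrapper(ChainEnv(11), allowed_states=range(4, 11))
```

In this toy setup the agent never spends exploration on states 0 to 2, which is the kind of saving the abstract describes.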

emergency stop mechanism

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Planning with Goal-Conditioned Policies

Nasiriany, Soroush, Pong, Vitchyr, Lin, Steven, Levine, Sergey

Neural Information Processing Systems, Mar-19-2020, 02:47:03 GMT

Planning methods can solve temporally extended sequential decision making problems by composing simple behaviors. However, planning requires suitable abstractions for the states and transitions, which typically need to be designed by hand. In contrast, reinforcement learning (RL) can acquire behaviors from low-level inputs directly, but struggles with temporally extended tasks. Can we utilize reinforcement learning to automatically form the abstractions needed for planning, thus obtaining the best of both approaches? We show that goal-conditioned policies learned with RL can be incorporated into planning, such that a planner can focus on which states to reach, rather than how those states are reached.

abstraction, goal-conditioned policy, latent variable model, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Learning Compositional Neural Programs with Recursive Tree Search and Planning

Pierrot, Thomas, Ligner, Guillaume, Reed, Scott E., Sigaud, Olivier, Perrin, Nicolas, Laterre, Alexandre, Kas, David, Beguir, Karim, de Freitas, Nando

Neural Information Processing Systems, Mar-19-2020, 02:46:10 GMT

We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. This specification enables us to overcome the need for strong supervision in the form of execution traces and consequently train NPI models effectively with reinforcement learning.
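
The sparse reward described in the abstract (1 if the execution satisfies the specification, 0 otherwise) can be sketched for a hypothetical sorting task. The function names and the choice of sorting as the example are assumptions for illustration only.

```python
def sorting_spec(inputs, output):
    """Specification for a hypothetical sorting program: the output must be
    the input values in non-decreasing order."""
    return list(output) == sorted(inputs)

def sparse_reward(spec, inputs, output):
    """Return 1.0 when the execution result satisfies the spec, else 0.0."""
    return 1.0 if spec(inputs, output) else 0.0

reward_ok = sparse_reward(sorting_spec, [3, 1, 2], [1, 2, 3])
reward_bad = sparse_reward(sorting_spec, [3, 1, 2], [1, 3, 2])
```

No intermediate steps are scored, so the learner gets no execution trace to imitate; it only learns whether the final result was acceptable.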

learning compositional neural program, recursive tree search and planning, specification, (4 more...)

Neural Information Processing Systems

Country: Asia > Vietnam > Hanoi > Hanoi (0.09)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.80)

A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Yang, Runzhe, Sun, Xingyuan, Narasimhan, Karthik

Neural Information Processing Systems, Mar-19-2020, 02:46:00 GMT

We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples.
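
To see why the best policy depends on the preference, consider a minimal sketch of linear scalarization. The policy names and vector returns below are made-up numbers, and picking from a fixed table stands in for the learned representation; this is not the paper's generalized Bellman update.

```python
def scalarize(vector_return, preference):
    """Linear preference: weighted sum of the per-objective returns."""
    return sum(w * v for w, v in zip(preference, vector_return))

def best_policy(policy_returns, preference):
    """Pick the policy whose scalarized return is highest under `preference`."""
    return max(policy_returns,
               key=lambda name: scalarize(policy_returns[name], preference))

# Hypothetical expected returns on two competing objectives.
policy_returns = {
    "fast": (10.0, 2.0),
    "balanced": (7.0, 7.0),
    "safe": (1.0, 10.0),
}

choice_speed = best_policy(policy_returns, (0.9, 0.1))
choice_even = best_policy(policy_returns, (0.5, 0.5))
choice_safety = best_policy(policy_returns, (0.1, 0.9))
```

Each preference vector selects a different policy here, which is why a single scalar reward chosen in advance cannot cover all three cases.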

generalized algorithm, multi-objective reinforcement learning, reinforcement learning and policy adaptation, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Russo, Daniel

Neural Information Processing Systems, Mar-19-2020, 02:32:47 GMT

This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning. These randomized value functions are generated by injecting random noise into the training data, making the approach compatible with many popular methods for estimating parameterized value functions. By providing a worst-case regret bound for tabular finite-horizon Markov decision processes, we show that planning with respect to these randomized value functions can induce provably efficient exploration.
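
A loose sketch of the idea, for a tabular finite-horizon problem, is to run backward induction on the empirical model while adding Gaussian noise to each estimate. The noise scale `sigma / sqrt(count + 1)`, the function name, and the data layout are assumptions made for illustration; they are not the exact algorithm or constants analyzed in the paper.

```python
import random

def randomized_q(transitions, n_states, n_actions, horizon, sigma=1.0, seed=0):
    """Backward induction on empirical estimates with injected Gaussian noise.

    transitions: list of observed (state, action, reward, next_state) tuples.
    Returns the noisy Q-values at the first time step.
    """
    rng = random.Random(seed)
    n = [[0] * n_actions for _ in range(n_states)]
    r_sum = [[0.0] * n_actions for _ in range(n_states)]
    nxt = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, r, s2 in transitions:
        n[s][a] += 1
        r_sum[s][a] += r
        nxt[s][a][s2] += 1

    v_next = [0.0] * n_states  # value after the final step is zero
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(horizon):
        q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                count = n[s][a]
                mean_r, exp_next = 0.0, 0.0
                if count:
                    mean_r = r_sum[s][a] / count
                    exp_next = sum(c * v_next[s2]
                                   for s2, c in enumerate(nxt[s][a])) / count
                # Less data for (s, a) means larger noise, hence more exploration.
                noise = rng.gauss(0.0, sigma / (count + 1) ** 0.5)
                q[s][a] = mean_r + exp_next + noise
        v_next = [max(row) for row in q]
    return q

data = [(0, 0, 1.0, 0)]
q_plain = randomized_q(data, n_states=1, n_actions=1, horizon=3, sigma=0.0)
q_noisy = randomized_q(data, n_states=1, n_actions=1, horizon=3, sigma=1.0)
```

Acting greedily with respect to a freshly sampled `q_noisy` in each episode is what produces exploration in this style of method; with `sigma=0.0` the function reduces to ordinary value iteration on the empirical model.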

exploration, randomized value function, worst-case regret bound

Neural Information Processing Systems

Genre: Research Report (0.75)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

When to use parametric models in reinforcement learning?

van Hasselt, Hado P., Hessel, Matteo, Aslanides, John

Neural Information Processing Systems, Mar-19-2020, 02:32:18 GMT

We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive with or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free.
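
The hypothesis in the last sentence can be made concrete in the simplest case. The sketch below assumes a deterministic tabular problem and a model that merely memorizes observed transitions; under those assumptions, model-generated updates from observed state-action pairs coincide with replayed ones. All names and the tiny dataset are hypothetical.

```python
def q_update(q, s, a, r, s2, alpha=0.5, gamma=0.9):
    """One model-free Q-learning update on a single transition."""
    target = r + gamma * max(q[s2])
    q[s][a] += alpha * (target - q[s][a])

def train_from_replay(buffer, n_states, n_actions, sweeps=50):
    """Reuse stored transitions directly."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s2 in buffer:
            q_update(q, s, a, r, s2)
    return q

def train_from_model(buffer, n_states, n_actions, sweeps=50):
    """Query a learned model at observed state-action pairs instead."""
    # Assumed model: a lookup table fitted to the observed transitions.
    model = {(s, a): (r, s2) for s, a, r, s2 in buffer}
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, _, _ in buffer:
            r, s2 = model[(s, a)]  # fictional transition from the model
            q_update(q, s, a, r, s2)
    return q

# Two observed steps: 0 -> 1 (reward 0), 1 -> 2 (reward 1); state 2 is terminal.
buffer = [(0, 0, 0.0, 1), (1, 0, 1.0, 2)]
q_replay = train_from_replay(buffer, n_states=3, n_actions=1)
q_model = train_from_model(buffer, n_states=3, n_actions=1)
```

Both routines spend extra computation on the same data and reach the same values here. Any difference in practice would have to come from model error or from querying the model at states and actions that were never observed.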

machine learning, parametric model, reinforcement learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Correlation Priors for Reinforcement Learning

Alt, Bastian, Šošić, Adrian, Koeppl, Heinz

Neural Information Processing Systems, Mar-19-2020, 02:31:25 GMT

Many decision-making problems naturally exhibit pronounced structures inherited from the characteristics of the underlying environment. In a Markov decision process model, for example, two distinct states can have inherently related semantics or encode resembling physical state configurations. This often implies locally correlated transition dynamics among the states. In order to complete a certain task in such environments, the operating agent usually needs to execute a series of temporally and spatially correlated actions. Though there exists a variety of approaches to capture these correlations in continuous state-action domains, a principled solution for discrete environments is missing.

reinforcement learning

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.45)

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

van Seijen, Harm, Fatemi, Mehdi, Tavakoli, Arash

Neural Information Processing Systems, Mar-19-2020, 02:31:18 GMT

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods.
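
A toy calculation shows how action-gaps can differ in size across states and why a logarithmic space evens them out. It assumes a hypothetical chain task with a single reward of 1 at the goal, where the best action moves toward the goal and the alternative wastes one step; this is an illustration of the stated hypothesis, not the paper's method.

```python
import math

def action_gaps(gamma, distances):
    """Action-gaps in regular and logarithmic space for a toy chain task.

    At distance d from the goal, the best action is worth gamma**d and the
    alternative (one wasted step) is worth gamma**(d + 1).
    """
    gaps, log_gaps = [], []
    for d in distances:
        q_best = gamma ** d
        q_other = gamma ** (d + 1)
        gaps.append(q_best - q_other)
        log_gaps.append(math.log(q_best) - math.log(q_other))
    return gaps, log_gaps

gaps, log_gaps = action_gaps(gamma=0.5, distances=[1, 5, 10])
spread = gaps[0] / gaps[-1]
```

With a discount factor of 0.5, the regular gaps differ by a factor of 512 between the nearest and farthest state, while every logarithmic gap equals `-log(gamma)`.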

enable lower discount factor, logarithmic mapping, reinforcement learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning with Convex Constraints

Miryoosefi, Sobhan, Brantley, Kianté, Daumé III, Hal, Dudík, Miro, Schapire, Robert E.

Neural Information Processing Systems, Mar-19-2020, 02:30:57 GMT

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity).
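
The kind of constraint described, an expected measurement vector required to lie in a convex set, can be sketched with the simplest convex set, an axis-aligned box. The helper names, the box, and the sample measurements are assumptions for illustration; the paper's algorithmic scheme is not reproduced here.

```python
def mean_vector(vectors):
    """Average a list of equal-length measurement vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def project_onto_box(z, lower, upper):
    """Euclidean projection of z onto the box [lower, upper] (a convex set)."""
    return [min(max(zi, lo), hi) for zi, lo, hi in zip(z, lower, upper)]

def distance_to_box(z, lower, upper):
    """Euclidean distance from z to the box; zero means the constraint holds."""
    p = project_onto_box(z, lower, upper)
    return sum((zi - pi) ** 2 for zi, pi in zip(z, p)) ** 0.5

# Hypothetical per-trajectory measurements: (reward, use of an unsafe action).
measurements = [(1.0, 0.0), (0.0, 1.0)]
expected = mean_vector(measurements)

# Constraint: unsafe-action usage must average at most 0.25.
violation = distance_to_box(expected, lower=(0.0, 0.0), upper=(1.0, 0.25))
satisfied = distance_to_box(expected, lower=(0.0, 0.0), upper=(1.0, 1.0))
```

A distance of zero means the policy's expected measurements satisfy the constraint, and a positive distance quantifies how far the policy is from feasibility.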

constraint, convex constraint, reinforcement learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
