Constructing an Optimal Behavior Basis for the Option Keyboard

Jun-10-2026, 03:05:55 GMT–Neural Information Processing Systems

Multi-task reinforcement learning aims to quickly identify solutions for new tasks with minimal or no additional interaction with the environment. Generalized Policy Improvement (GPI) addresses this by combining a set of base policies to produce a new one that is at least as good--though not necessarily optimal--as any individual base policy. Optimality can be ensured, particularly in the linear-reward case, via techniques that compute a Convex Coverage Set (CCS). However, these are computationally expensive and do not scale to complex domains. The Option Keyboard (OK) improves upon GPI by producing policies that are at least as good--and often better.

artificial intelligence, base policy, machine learning, (5 more...)

Neural Information Processing Systems

Jun-10-2026, 03:05:55 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.39)