Universal Option Models

Mar-13-2024, 11:03:21 GMT–Neural Information Processing Systems

We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be efficiently computed. We introduce a new model for an option that is independent of any reward function, called the universal option model (UOM). We prove that the UOM of an option can construct a traditional option model given a reward function, and also supports efficient computation of the option-conditional return. We extend the UOM to linear function approximation, and we show the UOM gives the TD solution of option returns and the value function of a policy over options. We provide a stochastic approximation algorithm for incrementally learning UOMs from data and prove its consistency. We demonstrate our method in two domains. The first domain is a real-time strategy game, where the controller must select the best game unit to accomplish a dynamically-specified task. The second domain is article recommendation, where each user query defines a new reward function and an article's relevance is the expected return from following a policy that follows the citations between articles. Our experiments show that UOMs are substantially more efficient than previously known methods for evaluating option returns and policies over options.
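The reward-independence claim can be illustrated in a small tabular setting. The sketch below is not the paper's learning algorithm, which is incremental and uses linear function approximation. It rests on assumptions the abstract does not state: rewards depend only on the current state, the option's policy induces a fixed transition matrix `P`, and the option terminates with probability `beta[t]` on arriving in state `t`. The names `occupancy` and `option_return` are hypothetical. The discounted state occupancy is computed once, and any later reward vector is then turned into an option return with a single matrix-vector product.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def occupancy(P, beta, gamma, iters=200):
    """Discounted state occupancy of an option (assumed tabular form).

    U = I + M + M^2 + ..., where M[s][t] = gamma * P[s][t] * (1 - beta[t])
    is the discounted probability of moving from s to t without terminating.
    Computed by iterating U <- I + M U; no reward function is involved.
    """
    n = len(P)
    M = [[gamma * P[s][t] * (1.0 - beta[t]) for t in range(n)] for s in range(n)]
    eye = [[1.0 if s == t else 0.0 for t in range(n)] for s in range(n)]
    U = eye
    for _ in range(iters):
        MU = matmul(M, U)
        U = [[eye[s][t] + MU[s][t] for t in range(n)] for s in range(n)]
    return U


def option_return(U, r):
    """Expected discounted reward accumulated by the option from each start state."""
    return [sum(U[s][t] * r[t] for t in range(len(r))) for s in range(len(U))]


# Toy chain 0 -> 1 -> 2; the option always terminates on reaching state 2.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
beta = [0.0, 0.0, 1.0]
U = occupancy(P, beta, gamma=0.9)

# Two reward functions specified after U was computed.
returns_a = option_return(U, [1.0, 2.0, 5.0])  # approximately [2.8, 2.0, 5.0]
returns_b = option_return(U, [0.0, 1.0, 0.0])  # approximately [0.9, 1.0, 0.0]
```

In this toy setting the costly step (`occupancy`) runs once per option, while each new reward function costs only one pass over `U`, which is the kind of saving the abstract describes for rewards that arrive at query time.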

reward function, UOM, value function

Country:
- North America
  - United States > Massachusetts
    - Hampshire County > Amherst (0.04)
  - Canada > Alberta
    - Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - India > Karnataka
    - Bengaluru (0.04)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (0.69)