State Abstraction in MAXQ Hierarchical Reinforcement Learning
Neural Information Processing Systems
Dec-31-2000
For example, in the Options framework [1, 2], the programmer defines a set of macro-actions ("options") and provides a policy for each. Learning algorithms (such as semi-Markov Q-learning) can then treat these temporally abstract actions as if they were primitives and learn a policy for selecting among them. Closely related is the HAM framework, in which the programmer constructs a hierarchy of finite-state controllers [3]. Each controller can include non-deterministic states (where the programmer was not sure which action to perform). The HAMQ learning algorithm can then be applied to learn a policy for making choices in the non-deterministic states.
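To make the Options idea concrete, here is a minimal sketch of semi-Markov Q-learning over hand-written options. The toy corridor environment, the two options ("go_left", "go_right"), and all constants are illustrative assumptions, not from the paper; the point is the update rule, which discounts the bootstrapped value by gamma^k, where k is the number of primitive steps the option ran, so options can be selected among as if they were primitive actions.

```python
# Sketch of SMDP Q-learning over options (toy environment; names are
# hypothetical, chosen for illustration only).
import random
from collections import defaultdict

N_STATES = 10          # 1-D corridor, cells 0..9; reward on reaching cell 9
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

def step(s, a):
    """Primitive dynamics: a in {-1, +1}; reward 1.0 on reaching the goal."""
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

# Each option: (policy over primitive actions, termination predicate).
OPTIONS = {
    "go_left":  (lambda s: -1, lambda s: s == 0),
    "go_right": (lambda s: +1, lambda s: s == N_STATES - 1),
}

def run_option(s, name):
    """Execute an option until it terminates; return (s', R, k, done),
    where R is the discounted reward accumulated over k primitive steps."""
    policy, beta = OPTIONS[name]
    R, k = 0.0, 0
    while True:
        s, r, done = step(s, policy(s))
        R += (GAMMA ** k) * r
        k += 1
        if done or beta(s):
            return s, R, k, done

Q = defaultdict(float)
for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy choice among options, treated as if primitive.
        if random.random() < EPS:
            o = random.choice(list(OPTIONS))
        else:
            o = max(OPTIONS, key=lambda name: Q[(s, name)])
        s2, R, k, done = run_option(s, o)
        # SMDP Q-learning target: accumulated reward plus value of the
        # next state, discounted by the option's duration k.
        target = R if done else R + (GAMMA ** k) * max(Q[(s2, n)] for n in OPTIONS)
        Q[(s, o)] += ALPHA * (target - Q[(s, o)])
        s = s2
```

After training, the greedy option at every non-goal state is "go_right", illustrating how the agent learns a policy over the given options rather than over primitive actions.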