State Abstraction in MAXQ Hierarchical Reinforcement Learning

Dietterich, Thomas G.

Neural Information Processing Systems 

For example, in the Options framework [1, 2], the programmer defines a set of macro actions ("options") and provides a policy for each. Learning algorithms (such as semi-Markov Q-learning) can then treat these temporally abstract actions as if they were primitives and learn a policy for selecting among them. Closely related is the HAM framework, in which the programmer constructs a hierarchy of finite-state controllers [3]. Each controller can include non-deterministic states (where the programmer was not sure what action to perform). The HAMQ learning algorithm can then be applied to learn a policy for making choices in the non-deterministic states.
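As a minimal sketch of how a temporally abstract action can be treated like a primitive, the Python fragment below implements the semi-Markov Q-learning backup over options. The `Option` class, the `env.step` interface, and the function names are illustrative assumptions for this sketch, not part of the Options or HAM frameworks themselves.

```python
from collections import defaultdict

class Option:
    """Hypothetical macro action: a fixed policy plus a termination test."""
    def __init__(self, policy, is_terminal):
        self.policy = policy            # state -> primitive action
        self.is_terminal = is_terminal  # state -> bool

def run_option(env, s, option, gamma=0.9):
    """Execute the option's policy until it terminates.

    Returns the discounted reward G accumulated during execution,
    the number of primitive steps k taken, and the resulting state.
    Assumes an env.step(a) -> (next_state, reward, done) interface.
    """
    G, k = 0.0, 0
    while not option.is_terminal(s):
        a = option.policy(s)
        s, r, done = env.step(a)
        G += (gamma ** k) * r
        k += 1
        if done:
            break
    return G, k, s

def smdp_q_update(Q, s, o, G, k, s_next, options, alpha=0.1, gamma=0.9):
    """One semi-Markov Q-learning backup:

        Q(s, o) += alpha * (G + gamma**k * max_o' Q(s', o') - Q(s, o))
    """
    best_next = max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += alpha * (G + gamma ** k * best_next - Q[(s, o)])

# Q-values over (state, option) pairs, initialized to zero.
Q = defaultdict(float)
```

The `gamma ** k` factor is the key detail: it discounts the value of the resulting state by the option's actual duration, which is what allows a k-step macro action to be backed up exactly like a one-step primitive action in the standard Q-learning update.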
