Active Learning of Hierarchical Policies from State-Action Trajectories
Hamidi, Mandana (Oregon State University) | Tadepalli, Prasad (School of Electrical Engineering and Computer Science) | Goetschalckx, Robby (Oregon State University) | Fern, Alan (Oregon State University)
While most work on trajectory mining is applied to predict movements of mobile users, in this paper we consider a more general problem of building behavior models of users from their state-action trajectories. We assume that the user behavior can be compactly modeled as a Probabilistic State-Dependent Grammar (PSDG) which represents a hierarchical policy. The key problem is that while the states and actions of the user are directly observed, his intentional structure is not. We propose to learn the user’s policy from a set of selected trajectories and intention queries at selected states in the trajectory. Our main contributions are an algorithm for learning hierarchical policies from state-action trajectories, and principled heuristics for selecting suitable trajectories and intention queries. Experiments in multiple domains show that our approach is effective and more sample-efficient than learning non-hierarchical policies.
Mar-1-2015