Learning from Humans as an I-POMDP
Woodward, Mark P., Wood, Robert J.
–arXiv.org Artificial Intelligence
The interactive partially observable Markov decision process (I-POMDP) is a recently developed framework which extends the POMDP to the multi-agent setting by including agent models in the state space. This paper argues for formulating the problem of an agent learning interactively from a human teacher as an I-POMDP, where the agent \emph{programming} to be learned is captured by random variables in the agent's state space, all \emph{signals} from the human teacher are treated as observed random variables, and the human teacher, modeled as a distinct agent, is explicitly represented in the agent's state space. The main benefits of this approach are: i. a principled action selection mechanism, ii. a principled belief update mechanism, iii. support for the most common teacher \emph{signals}, and iv. the anticipated production of complex beneficial interactions. The proposed formulation, its benefits, and several open questions are presented.
arXiv.org Artificial Intelligence
Apr-1-2012
- Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Genre:
- Research Report (0.84)
- Industry:
- Education (0.92)