Reviews: Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs

Neural Information Processing Systems 

The paper describes a sampling method for learning agent behaviors in interactive POMDPs (I-POMDPs). In general, I-POMDPs are a multi-agent POMDP model which, in addition to a belief about the environment state, the belief space includes nested recursive beliefs about the other agents' models. I-POMDP solutions, including the one proposed in the paper, largely approximate using a finite depth with either intentional models of others (e.g., their nested beliefs, state transitions, optimality criterion, etc.) or subintentional models of others (e.g., essentially "summaries of behavior" such as fictitious play). The proposed approach uses samples of the other agent at a particular depth to compute its values and policy. Related work on an interactive particle filter assumed the full frame was known (b, S, A, Omega, T, R, OC).