Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

Neural Information Processing Systems

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.