Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games

Sougata Chaudhuri, Ambuj Tewari

Neural Information Processing Systems 

Partial monitoring games are repeated games in which the learner receives feedback that might differ from the adversary's move or even from the reward gained by the learner. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed [1], in which the learner's action space can be exponentially large and the adversary samples its moves from a bounded, continuous space according to a fixed distribution.