Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games
Sougata Chaudhuri, Ambuj Tewari
Neural Information Processing Systems
Partial monitoring games are repeated games in which the learner receives feedback that may differ from the adversary's move or even from the reward the learner gains. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed [1], in which the learner's action space can be exponentially large and the adversary samples its moves from a bounded, continuous space according to a fixed distribution.
Jun-2-2025, 05:59:29 GMT