statements and
–Neural Information Processing Systems
Consider a two-player Markov game in which both players affect the transition. We will show, in effect, that the problem of best-responding to a correlated policy σ is equivalent to best-responding to the marginal policy of σ for the opponent. The proof follows from the equivalence of the two MDPs.

Before that, given a (possibly correlated) joint policy σ, we define a nonlinear program, (PBR), whose optimal solutions are the best-response policies of each agent k to σ_{-k} and the values for each state s and timestep h:

A.2 Proof of Theorem 3.2

The best-response program. First, we state the following lemma, which will prove useful for several of our arguments.

Lemma A.1 (Best-response LP).
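As an illustration of the equivalence described above, the following is a minimal sketch, not the paper's (PBR) program: it swaps in plain backward induction on the MDP obtained by fixing the opponent's marginal policy. The function names `opponent_marginal` and `best_response`, the nested-list layout of `sigma`, `P`, and `r`, and the finite horizon with zero terminal value are all assumptions made for this example.

```python
def opponent_marginal(sigma):
    """Marginalize a joint policy over the deviating player's own action.

    Assumed layout: sigma[h][s][a][b] is the probability of joint action (a, b)
    at timestep h and state s. Returns mu with mu[h][s][b].
    """
    return [
        [
            [sum(row[b] for row in sigma_hs) for b in range(len(sigma_hs[0]))]
            for sigma_hs in sigma_h
        ]
        for sigma_h in sigma
    ]


def best_response(P, r, mu):
    """Backward induction in the MDP induced by a fixed opponent policy mu.

    Assumed layout: P[s][a][b][s2] is a transition probability, r[s][a][b] is
    the deviating player's reward, mu[h][s][b] is the opponent's policy.
    Returns (V, pi): V[h][s] is the best-response value and pi[h][s] is a
    greedy (deterministic) best-response action.
    """
    H, S = len(mu), len(P)
    A, B = len(P[0]), len(P[0][0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] is the zero terminal value
    pi = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        for s in range(S):
            # Expected return of each own action, averaging over the opponent.
            q = [
                sum(
                    mu[h][s][b]
                    * (r[s][a][b]
                       + sum(P[s][a][b][s2] * V[h + 1][s2] for s2 in range(S)))
                    for b in range(B)
                )
                for a in range(A)
            ]
            best = max(range(A), key=q.__getitem__)
            pi[h][s], V[h][s] = best, q[best]
    return V[:H], pi
```

Calling `best_response(P, r, opponent_marginal(sigma))` uses only the opponent's marginal of σ, which is the sense in which a deviating player faces a single-agent MDP here.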
Apr-29-2026, 14:25:40 GMT