A Bellman Equations for Markov Games

Neural Information Processing Systems 

To state the regret guarantee, we also define ι = log(p/AK) for any p ∈ (0, 1]. Now we can upper bound the regret by

Lemma 17. Following Algorithm 9, with probability 1 − 3p, for any θ