Supplementary Material Learning to Play Sequential Games versus Unknown Opponents Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause (NeurIPS 2020)
Our goal is to bound the learner's cumulative regret, where [...] are the actions chosen by the learner and [...].

In case we have k(·, ·) ≤ L for some L > 0, then the result holds for L [...].

The learner selects actions according to the standard MW (multiplicative weights) update algorithm. Following the same proof steps as in the proof of Theorem 1, we can show that, with probability at least 1 − δ, the learner's regret can be bounded as R(T) ≤ [...]. The corollary's statement then follows by observing that [...].

As discussed in Section 3.3, in a repeated Stackelberg game the decision [...]. Before bounding the leader's regret, recall that the algorithm resulting from Corollary 3 consists of [...].

In this section, we describe the experimental setup of Section 4.1. [...] D(y). (18)

Figure 3: Obtained rewards when the rangers know the poachers' model (OPT) and when they use the proposed algorithm to update their patrol strategy online ([...]).

[...] u(x, y) to maximize their own utility function. For the poachers' utility we use [...]. GP-UCB either converges to suboptimal solutions or displays a slower learning curve. In the case of more than one best response, ties are broken in an arbitrary but consistent manner.
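The fragments above refer to the standard multiplicative weights (MW) update. As a point of reference, here is a minimal sketch of one MW (Hedge) step; the function name, loss range, and learning rate are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def mw_update(weights, losses, eta):
    """One multiplicative-weights (Hedge) step.

    weights: current (unnormalized) weights over the action set
    losses:  observed losses per action, assumed to lie in [0, 1]
    eta:     learning-rate parameter
    """
    new_w = weights * np.exp(-eta * losses)
    return new_w / new_w.sum()  # normalize to a probability distribution

# Example: actions with lower loss receive higher probability.
p = mw_update(np.ones(3), np.array([0.0, 1.0, 0.5]), eta=1.0)
```

The standard O(sqrt(T log K)) regret guarantee for MW over K actions is obtained by tuning eta as a function of the horizon T.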
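GP-UCB is mentioned as a baseline. A minimal sketch of the GP-UCB selection rule under a squared-exponential kernel follows; the helper names and hyperparameters (lengthscale, noise, beta) are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def gp_posterior(X_train, y_train, X_query, lengthscale=1.0, noise=1e-3):
    """Posterior mean/std of a zero-mean GP with a squared-exponential kernel."""
    def k(A, B):
        d = A[:, None] - B[None, :]
        return np.exp(-0.5 * (d / lengthscale) ** 2)

    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = k(X_query, X_train)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_train
    # diag(Ks Kinv Ks^T) gives the reduction in prior variance (prior var = 1).
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
    return mu, np.sqrt(np.maximum(var, 0.0))

def gp_ucb_choice(X_train, y_train, candidates, beta=2.0):
    """Pick the candidate maximizing the upper confidence bound mu + beta * sigma."""
    mu, sigma = gp_posterior(X_train, y_train, candidates)
    return candidates[np.argmax(mu + beta * sigma)]
```

For instance, after a single observation y = 1 at x = 0, the rule prefers a far-away, high-uncertainty candidate, which illustrates the exploration behavior driven by the beta * sigma term.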
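The last fragment states that ties among best responses are broken in an arbitrary but consistent manner. One common way to realize such a rule (an illustrative sketch, not the paper's implementation) is to always return the first maximizer:

```python
import numpy as np

def best_response(utility_row):
    # np.argmax returns the FIRST index attaining the maximum, so repeated
    # calls on actions with equal utility always select the same one:
    # an arbitrary but consistent tie-breaking rule.
    return int(np.argmax(utility_row))
```

Consistency matters here because the learner models the opponent's response; a tie-breaking rule that varied between rounds would make the best-response map ill-defined.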