Agents
Supplementary Material Contextual Games: Multi-Agent Learning with Side Information Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour (NeurIPS 2020)
The theoretical guarantees obtained in Section 3 rely on the following two main lemmas. 's be the distributions computed using the MW rule: p Similarly to Appendix A.1, we let As it was done in proof of Theorems 1 and Appendix A.1, Also, we now explicitly consider the adaptiveness of the adversary. The second equality follows by the law of total expectation. Consider a contextual game and assume contexts are sampled i.i.d. Hoeffding's inequality [21] shows that for any > 0 P E In this section we describe the experimental setup of the contextual traffic routing game of Section 5.
Appendix A Lower Bound In this section, we establish a lower bound on the expected regret of any algorithm for our multi-agent
Our goal in this section is twofold. To avoid excessive repetition of notation and proof arguments, we purposefully leave this section not self-contained and only outline these adjustments needed. We refer an interested reader to the work of Auer et al. We leave it to future work to optimize the dependence on N and K . In our MA-MAB problem, an algorithm is allowed to "pull" a distribution over the arms In the proof of their Lemma A.1, in the explanation of their Equation (30), they cite the assumption that given the rewards observed in the first Finally, in the proof of their Theorem A.2, they again consider the probability Thus, we have the following lower bound.
Appendices for No-regret Learning in Price Competitions under Consumer Reference Effects A Expanded Literature Review
There are also very recent works that address the dynamic pricing problem with consumer reference effects under uncertain demand. Nevertheless, these two lines of works are oblivious to consumer reference effects. In contrast to these two papers, our work studies price competitions over an infinite time horizon where reference prices adjust over time, and provides theoretical guarantees for the convergence of pricing strategies under the partial information setting. In their setting, the subgradient for each bidder's objective is a function of all bidders' decisions as well as its budget rate (i.e. total fixed budget divided by a given time horizon), which can be B.1 Proof of Theorem 3.1 (i) By first order conditions, we know that arg max We now follow a similar proof to that of Tarski's fixed point theorem: consider the set Note that convergence is monotonic because U () is nondecreasing. This implies that under Assumption 1, the interior SNE is unique.