Overview
Choice Bandits Supplementary Material

A Organization
Appendix B provides additional discussion of related work. Appendix C proves our regret lower bound (Theorem 1). Appendix D proves a concentration inequality for pairwise estimates, and Appendix E then proves our regret upper bound (Theorem 2). Appendix F gives additional details about our experimental setup, and Appendix G reports experimental results for an alternate notion of regret. Appendix H contains technical lemmas used in the proof of the upper bound in Theorem 2.

There has been some recent interest in bandit settings where more than two arms are played at once, although no previous work considers choice models at the level of generality we do. We review related work here and provide a summary in Table 1. Moreover, we study a much more general class of choice models than the MNL model studied by them.