Choice Bandits: Supplementary Material

Neural Information Processing Systems

A Organization

Appendix B provides additional discussion of related work. Appendix C proves our regret lower bound (Theorem 1). Appendix D proves a concentration inequality for pairwise estimates, and Appendix E proves our regret upper bound (Theorem 2). Appendix F gives additional details about our experimental setup, and Appendix G reports experimental results for an alternate notion of regret. Appendix H contains technical lemmas used in the proof of the upper bound in Theorem 2.

There has been some recent interest in bandit settings where more than two arms are played at once, although no previous work considers choice models at the level of generality we do. We review related work here and provide a summary in Table 1. Moreover, we study a much more general class of choice models than the MNL model studied by them.
