Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Wang, Yining, Chen, Xi, Zhou, Yuan
–Neural Information Processing Systems
In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(sqrt(T log log T), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
Neural Information Processing Systems
Dec-31-2018