Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

Wang, Yining, Chen, Xi, Zhou, Yuan

Neural Information Processing Systems 

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(sqrt(T log log T), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found