assortment
Dynamic pricing and assortment under a contextual MNL demand
We consider dynamic multi-product pricing and assortment problems under an unknown demand over T periods, where in each period, the seller decides on the price for each product or the assortment of products to offer to a customer who chooses according to an unknown Multinomial Logit Model (MNL). Such problems arise in many applications, including online retail and advertising. We propose a randomized dynamic pricing policy based on a variant of the Online Newton Step algorithm (ONS) that achieves a O(d T log(T))regret guarantee under an adversarial arrival model. We also present a new optimistic algorithm for the adversarial MNL contextual bandits problem, which achieves a better dependency than the state-of-the-art algorithms in a problem-dependent constant ฮบ2 (potentially exponentially small). Our regret upper bound scales as O(d ฮบ2T +log(T)/ฮบ2), which gives a stronger bound than the existing O(d T/ฮบ2)guarantees.
Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Yining Wang, Xi Chen, Yuan Zhou
In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of Op? T log log T q, which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities
Hwang, Taehyun, Kim, Dahngoon, Oh, Min-hwan
We study the multinomial logit (MNL) contextual bandit problem for sequential assortment selection. Although most existing research assumes utility functions to be linear in item features, this linearity assumption restricts the modeling of intricate interactions between items and user preferences. A recent work (Zhang & Luo, 2024) has investigated general utility function classes, yet its method faces fundamental trade-offs between computational tractability and statistical efficiency. To address this limitation, we propose a computationally efficient algorithm for MNL contextual bandits leveraging the upper confidence bound principle, specifically designed for non-linear parametric utility functions, including those modeled by neural networks. Under a realizability assumption and a mild geometric condition on the utility function class, our algorithm achieves a regret bound of $\tilde{O}(\sqrt{T})$, where $T$ denotes the total number of rounds. Our result establishes that sharp $\tilde{O}(\sqrt{T})$-regret is attainable even with neural network-based utilities, without relying on strong assumptions such as neural tangent kernel approximations. To the best of our knowledge, our proposed method is the first computationally tractable algorithm for MNL contextual bandits with non-linear utilities that provably attains $\tilde{O}(\sqrt{T})$ regret. Comprehensive numerical experiments validate the effectiveness of our approach, showing robust performance not only in realizable settings but also in scenarios with model misspecification.
From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems
Li, Guokai, Gao, Pin, Jasin, Stefanus, Wang, Zizhuo
Assortment optimization seeks to select a subset of substitutable products, subject to constraints, to maximize expected revenue. The problem is NP-hard due to its combinatorial and nonlinear nature and arises frequently in industries such as e-commerce, where platforms must solve thousands of such problems each minute. We propose a graph convolutional network (GCN) framework to efficiently solve constrained assortment optimization problems. Our approach constructs a graph representation of the problem, trains a GCN to learn the mapping from problem parameters to optimal assortments, and develops three inference policies based on the GCN's output. Owing to the GCN's ability to generalize across instance sizes, patterns learned from small-scale samples can be transferred to large-scale problems. Numerical experiments show that a GCN trained on instances with 20 products achieves over 85% of the optimal revenue on problems with up to 2,000 products within seconds, outperforming existing heuristics in both accuracy and efficiency. We further extend the framework to settings with an unknown choice model using transaction data and demonstrate similar performance and scalability.