mnl model
Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification
We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear bandits, MNL bandits have a combinatorial action space, which makes classical optimal design approaches and naive optimization over all subsets computationally intractable. We propose a computationally efficient optimal design framework for MNL models that achieves both statistical efficiency and scalability through two complementary approaches: (i) an exact or certified-approximate reformulation of the design oracle as a $0$-$1$ mixed-integer linear program (MILP) with solver-certified early stopping, and (ii) a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, we establish near G-optimality guarantees and characterize the induced statistical-computational trade-offs. As an application, we develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, and prove an instance-dependent sample complexity of $\tilde{O}\big(\frac{d \log N}{Δ^2}\big)$, where $d$ is the feature dimension, $N$ is the number of arms, and $Δ$ is the minimum revenue gap.
UCB-based Algorithms for Multinomial Logistic Regression Bandits
Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logistic bandits that are used in problems with binary rewards: for instance, when the learner aims to maximize the profit over a user that can select one of two possible outcomes (e.g., 'click' vs'no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., 'click', 'show me later', 'never show again', 'no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of K+1 2possible outcomes (+1 stands for the'not click' outcome): we assume that for a learner's action xt, the user selects one of K +1 2outcomes, say outcome i, with a MNL probabilistic model with corresponding unknown parameter θ i. Each outcome i is also associated with a revenue parameter ρi and the goal is to maximize the expected revenue. For this problem, we present MNL-UCB, an upper confidence bound (UCB)-based algorithm, that achieves regret O(dK T) with small dependency on problemdependent constants that can otherwise be arbitrarily large and lead to loose regret bounds. We present numerical simulations that corroborate our theoretical results.
Beyond Softmax: A New Perspective on Gradient Bandits
We establish a link between a class of discrete choice models and the theory of online learning and multi-armed bandits. Our contributions are: (i) sublinear regret bounds for a broad algorithmic family, encompassing Exp3 as a special case; (ii) a new class of adversarial bandit algorithms derived from generalized nested logit models \citep{wen:2001}; and (iii) \textcolor{black}{we introduce a novel class of generalized gradient bandit algorithms that extends beyond the widely used softmax formulation. By relaxing the restrictive independence assumptions inherent in softmax, our framework accommodates correlated learning dynamics across actions, thereby broadening the applicability of gradient bandit methods.} Overall, the proposed algorithms combine flexible model specification with computational efficiency via closed-form sampling probabilities. Numerical experiments in stochastic bandit settings demonstrate their practical effectiveness.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
"NIPS Neural Information Processing Systems 8-11th December 2014, Montreal, Canada",,, "Paper ID:","406" "Title:","Learning Mixed Multinomial Logit Model from Ordinal Data" Current Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: This paper extends the classic MultiNomial Logit (MNL) choice model to a general family of choice models named Mixed MNL, which can be seen as a parametric class of distributions over permutations (e.g., permutations of items according to user preference). The main contributions of the paper are (1) to identify sufficient conditions under which a mixed MNL can be learnt, and (2) to propose a two-phase algorithm to learn the proposed mixed MNL models in an efficient manner. Part of the interesting theoretical results shows that the model with r components can be learnt with sample size being polynomially in n (number of items of interest) and r (number of components). Quality: The problem choice modeling studied in this paper is a fundamental and critical problem to the social choice community, and the proposed model and algorithm for this problem are certainly of interest to the machine learning community.
PASTA: A Unified Framework for Offline Assortment Learning
Dong, Juncheng, Mo, Weibin, Qi, Zhengling, Shi, Cong, Fang, Ethan X., Tarokh, Vahid
We study a broad class of assortment optimization problems in an offline and data-driven setting. In such problems, a firm lacks prior knowledge of the underlying choice model, and aims to determine an optimal assortment based on historical customer choice data. The combinatorial nature of assortment optimization often results in insufficient data coverage, posing a significant challenge in designing provably effective solutions. To address this, we introduce a novel Pessimistic Assortment Optimization (PASTA) framework that leverages the principle of pessimism to achieve optimal expected revenue under general choice models. Notably, PASTA requires only that the offline data distribution contains an optimal assortment, rather than providing the full coverage of all feasible assortments. Theoretically, we establish the first finite-sample regret bounds for offline assortment optimization across several widely used choice models, including the multinomial logit and nested logit models. Additionally, we derive a minimax regret lower bound, proving that PASTA is minimax optimal in terms of sample and model complexity. Numerical experiments further demonstrate that our method outperforms existing baseline approaches.