
MNL Model




Choice Bandits

Neural Information Processing Systems

There has been much interest in recent years in the problem of dueling bandits, where on each round the learner plays a pair of arms and receives as feedback the outcome of a relative pairwise comparison between them.
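To make the feedback protocol concrete, here is a minimal sketch of a dueling-bandit interaction loop. The preference matrix P (arm i beats arm j with probability P[i, j]) and the uniform-random pair selection are illustrative placeholders, not any particular algorithm from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical preference matrix: arm i beats arm j with probability P[i, j].
P = np.array([[0.5, 0.7, 0.8],
              [0.3, 0.5, 0.6],
              [0.2, 0.4, 0.5]])

n_arms, horizon = len(P), 1000
wins = np.zeros((n_arms, n_arms))    # wins[i, j]: times arm i beat arm j
plays = np.ones((n_arms, n_arms))    # pair comparison counts (init 1 avoids 0/0)

for _ in range(horizon):
    # Placeholder exploration rule: compare a uniformly random pair of arms.
    i, j = rng.choice(n_arms, size=2, replace=False)
    winner, loser = (i, j) if rng.random() < P[i, j] else (j, i)
    wins[winner, loser] += 1
    plays[i, j] += 1
    plays[j, i] += 1

print(np.round(wins / plays, 2))     # empirical pairwise win rates
```

A dueling-bandit algorithm replaces the uniform pair selection with a rule that concentrates comparisons on promising arms.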



Pairwise Choice Markov Chains

Stephen Ragain, Johan Ugander

Neural Information Processing Systems

As datasets capturing human choices grow in richness and scale--particularly in online domains--there is an increasing need for choice models that escape traditional choice-theoretic axioms such as regularity, stochastic transitivity, and Luce's choice axiom. In this work we introduce the Pairwise Choice Markov Chain (PCMC) model of discrete choice, an inferentially tractable model that does not assume any of the above axioms while still satisfying the foundational axiom of uniform expansion, a considerably weaker assumption than Luce's choice axiom. We show that the PCMC model significantly outperforms both the Multinomial Logit (MNL) model and a mixed MNL (MMNL) model in prediction tasks on both synthetic and empirical datasets known to exhibit violations of Luce's axiom. Our analysis also synthesizes several recent observations connecting the Multinomial Logit model and Markov chains; the PCMC model retains the Multinomial Logit model as a special case.
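A small sketch of the model's defining computation: PCMC choice probabilities are the stationary distribution of a continuous-time Markov chain whose transition rates are restricted to the choice set. The rate matrix Q below and the least-squares solve are illustrative choices, not the paper's inference procedure:

```python
import numpy as np

def pcmc_choice_probs(Q, S):
    """Choice probabilities for choice set S: the stationary distribution
    of the continuous-time Markov chain with rates Q restricted to S."""
    Q_S = Q[np.ix_(S, S)].astype(float)
    np.fill_diagonal(Q_S, 0.0)
    off_sums = Q_S.sum(axis=1)
    np.fill_diagonal(Q_S, -off_sums)          # rows of a rate matrix sum to zero
    # Solve pi @ Q_S = 0 subject to pi summing to 1.
    k = len(S)
    A = np.vstack([Q_S.T, np.ones(k)])
    b = np.zeros(k + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical pairwise rates among three alternatives; choose from the full set.
Q = np.array([[0.0, 1.0, 2.0],
              [3.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
print(pcmc_choice_probs(Q, [0, 1, 2]))
```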


Beyond Softmax: A New Perspective on Gradient Bandits

Melo, Emerson, Müller, David

arXiv.org Artificial Intelligence

We establish a link between a class of discrete choice models and the theory of online learning and multi-armed bandits. Our contributions are: (i) sublinear regret bounds for a broad algorithmic family, encompassing Exp3 as a special case; (ii) a new class of adversarial bandit algorithms derived from generalized nested logit models (Wen and Koppelman, 2001); and (iii) a novel class of generalized gradient bandit algorithms that extends beyond the widely used softmax formulation. By relaxing the restrictive independence assumptions inherent in softmax, our framework accommodates correlated learning dynamics across actions, thereby broadening the applicability of gradient bandit methods. Overall, the proposed algorithms combine flexible model specification with computational efficiency via closed-form sampling probabilities. Numerical experiments in stochastic bandit settings demonstrate their practical effectiveness.
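For context, here is a compact sketch of the standard softmax gradient bandit that this line of work generalizes. The reward means, step size, and running-average baseline are illustrative choices, not the paper's generalized algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(h):
    z = np.exp(h - h.max())
    return z / z.sum()

means = np.array([0.2, 0.5, 0.8])   # hypothetical true mean rewards
h = np.zeros(len(means))            # action preferences
alpha, baseline = 0.1, 0.0

for t in range(1, 2001):
    pi = softmax(h)
    a = rng.choice(len(means), p=pi)
    r = means[a] + rng.normal(0, 0.1)
    baseline += (r - baseline) / t          # running average reward
    # Preference update: stochastic gradient ascent on expected reward.
    grad = -pi * (r - baseline)
    grad[a] += (r - baseline)
    h += alpha * grad

print(softmax(h))   # the policy concentrates on the best arm
```

The paper's point of departure is the softmax map itself: replacing it with probabilities derived from richer discrete choice models allows correlated preferences across actions.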


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

"NIPS Neural Information Processing Systems 8-11th December 2014, Montreal, Canada",,, "Paper ID:","406" "Title:","Learning Mixed Multinomial Logit Model from Ordinal Data" Current Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: This paper extends the classic MultiNomial Logit (MNL) choice model to a general family of choice models named Mixed MNL, which can be seen as a parametric class of distributions over permutations (e.g., permutations of items according to user preference). The main contributions of the paper are (1) to identify sufficient conditions under which a mixed MNL can be learnt, and (2) to propose a two-phase algorithm to learn the proposed mixed MNL models in an efficient manner. Part of the interesting theoretical results shows that the model with r components can be learnt with sample size being polynomially in n (number of items of interest) and r (number of components). Quality: The problem choice modeling studied in this paper is a fundamental and critical problem to the social choice community, and the proposed model and algorithm for this problem are certainly of interest to the machine learning community.


PASTA: A Unified Framework for Offline Assortment Learning

Dong, Juncheng, Mo, Weibin, Qi, Zhengling, Shi, Cong, Fang, Ethan X., Tarokh, Vahid

arXiv.org Artificial Intelligence

We study a broad class of assortment optimization problems in an offline and data-driven setting. In such problems, a firm lacks prior knowledge of the underlying choice model, and aims to determine an optimal assortment based on historical customer choice data. The combinatorial nature of assortment optimization often results in insufficient data coverage, posing a significant challenge in designing provably effective solutions. To address this, we introduce a novel Pessimistic Assortment Optimization (PASTA) framework that leverages the principle of pessimism to achieve optimal expected revenue under general choice models. Notably, PASTA requires only that the offline data distribution contains an optimal assortment, rather than full coverage of all feasible assortments. Theoretically, we establish the first finite-sample regret bounds for offline assortment optimization across several widely used choice models, including the multinomial logit and nested logit models. Additionally, we derive a minimax regret lower bound, proving that PASTA is minimax optimal in terms of sample and model complexity. Numerical experiments further demonstrate that our method outperforms existing baseline approaches.
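A toy sketch of the pessimism principle in this setting, under an assumed MNL plug-in estimate: score each candidate assortment by its estimated expected revenue minus an uncertainty penalty, so that poorly covered assortments are not selected on optimistic noise. The estimates, penalty form, and beta below are all hypothetical, not PASTA's actual construction:

```python
import numpy as np
from itertools import combinations

def mnl_revenue(v, prices, S):
    """Expected revenue of assortment S under MNL with attraction
    weights v; the no-purchase option has weight 1."""
    s = list(S)
    return float((v[s] * prices[s]).sum() / (1.0 + v[s].sum()))

# Illustrative estimates from offline data (all values hypothetical).
v_hat = np.array([1.2, 0.8, 0.5])     # estimated attraction weights
se = np.array([0.10, 0.40, 0.05])     # per-item uncertainty
prices = np.array([10.0, 12.0, 8.0])
beta = 1.0                            # pessimism strength

def lcb_revenue(S):
    # Pessimistic value: plug-in revenue minus an uncertainty penalty.
    s = list(S)
    return mnl_revenue(v_hat, prices, S) - beta * np.sqrt((se[s] ** 2).sum())

candidates = [S for k in (1, 2, 3) for S in combinations(range(3), k)]
print(max(candidates, key=lcb_revenue))
```

Note how the penalty steers the choice away from item 1 despite its high price, because its weight is poorly estimated; this is the offline analogue of avoiding under-explored options.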