Contextual Multinomial Logit Bandits with General Value Functions

Feb-18-2024, arXiv.org Artificial Intelligence

Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing and advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this limitation, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. When applied to the linear case, our results are not only the first with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.
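For readers unfamiliar with the setting, the following is a minimal sketch of the standard MNL choice model that such bandits build on, not the paper's algorithms. It assumes a nonnegative value function and a no-purchase option with value fixed to 1; the names `mnl_choice_probabilities`, `expected_reward`, `value_fn`, and `NO_PURCHASE` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence

# Hypothetical sentinel key for the "customer picks nothing" outcome.
NO_PURCHASE = "no_purchase"


def mnl_choice_probabilities(
    context: object,
    assortment: Sequence[Hashable],
    value_fn: Callable[[object, Hashable], float],
) -> Dict[Hashable, float]:
    """Standard MNL choice probabilities for an offered assortment.

    Assumption: value_fn(context, item) returns a nonnegative value and
    the no-purchase option has value 1, so item i is chosen with
    probability v_i / (1 + sum of v_j over the assortment).
    """
    values = {item: float(value_fn(context, item)) for item in assortment}
    if any(v < 0 for v in values.values()):
        raise ValueError("MNL values must be nonnegative")
    denom = 1.0 + sum(values.values())
    probs = {item: v / denom for item, v in values.items()}
    probs[NO_PURCHASE] = 1.0 / denom
    return probs


def expected_reward(
    probs: Dict[Hashable, float], rewards: Dict[Hashable, float]
) -> float:
    """Expected reward of an assortment; no purchase yields zero reward."""
    return sum(p * rewards[item] for item, p in probs.items() if item != NO_PURCHASE)


# Usage: three items that all have value 1 under a toy value function.
probs = mnl_choice_probabilities("ctx", ["a", "b", "c"], lambda x, i: 1.0)
reward = expected_reward(probs, {"a": 1.0, "b": 2.0, "c": 3.0})
```

A linear value function is one special case of `value_fn`; the general-function-class setting described above simply allows `value_fn` to come from a richer class.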

artificial intelligence, contextual bandit, machine learning

Country:
- North America > United States > California (0.14)

Genre:
- Research Report > New Finding (0.48)

Industry:
- Education > Educational Setting (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)