Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities

Jun-15-2026, 00:52:20 GMT–Neural Information Processing Systems

We study the multinomial logit (MNL) contextual bandit problem for sequential assortment selection. Although most existing research assumes utility functions to be linear in item features, this linearity assumption restricts the modeling of intricate interactions between items and user preferences. A recent work [41] has investigated general utility function classes, yet its method faces fundamental tradeoffs between computational tractability and statistical efficiency. To address this limitation, we propose a computationally efficient algorithm for MNL contextual bandits leveraging the upper confidence bound principle, specifically designed for non-linear parametric utility functions, including those modeled by neural networks. Under a realizability assumption and a mild geometric condition on the utility function class, our algorithm achieves a regret bound of eO( T), where T denotes the total number of rounds. Our result establishes that sharp eO( T)-regret is attainable even with neural network-based utilities, without relying on strong assumptions such as neural tangent kernel approximations. To the best of our knowledge, our proposed method is the first computationally tractable algorithm for MNL contextual bandits with non-linear utilities that provably attains eO( T) regret.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Jun-15-2026, 00:52:20 GMT

Conferences PDF

Add feedback

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.87)

Technology:
- Information Technology
  - Data Science (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (0.87)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found