Thompson Sampling for Multi-Objective Linear Contextual Bandit

Jun-18-2026, 18:55:36 GMT–Neural Information Processing Systems

We study the multi-objective linear contextual bandit problem, in which multiple, possibly conflicting, objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret guarantees for this problem. Unlike standard approaches that compute an empirical Pareto front in each round, MOL-TS samples parameters across objectives and efficiently selects an arm from a novel effective Pareto front, which accounts for repeated selections over time. Our analysis shows that MOL-TS achieves a worst-case Pareto regret bound of $\tilde{O}(d^{3/2}\sqrt{T})$, where $d$ is the dimension of the feature vectors and $T$ is the total number of rounds. This matches the best known order for randomized linear bandit algorithms in the single-objective setting. Empirical results confirm the benefits of the proposed approach, demonstrating improved regret minimization and strong multi-objective performance.
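
The abstract does not define the effective Pareto front, so the following is only a minimal sketch of the general idea it builds on. It keeps one ridge-regression posterior per objective, draws one Thompson sample per objective each round, and picks an arm uniformly from the Pareto front of the sampled rewards. It uses the ordinary Pareto front, not the effective one, and assumes the d-dimensional features are shared across objectives. The names `MultiObjectiveLinTS`, `pareto_front` and `pareto_gap`, and the parameters `v` (exploration scale) and `lam` (ridge regularizer), are hypothetical. This is not the authors' MOL-TS. `pareto_gap` computes a common definition of the per-round Pareto suboptimality gap, whose sum over the T rounds is the Pareto regret.

```python
import numpy as np


def pareto_front(values: np.ndarray) -> np.ndarray:
    """Indices of rows of `values` (arms x objectives) not dominated by any other row.

    Row j dominates row i if it is >= in every objective and > in at least one.
    """
    n = values.shape[0]
    front = []
    for i in range(n):
        dominated = any(
            np.all(values[j] >= values[i]) and np.any(values[j] > values[i])
            for j in range(n)
            if j != i
        )
        if not dominated:
            front.append(i)
    return np.array(front, dtype=int)


def pareto_gap(means: np.ndarray, chosen: int) -> float:
    """Smallest eps >= 0 such that means[chosen] + eps (added to every objective)
    is not dominated by any arm. Summing this over rounds gives Pareto regret."""
    diffs = means - means[chosen]                   # (num_arms, m)
    return float(max(0.0, diffs.min(axis=1).max()))


class MultiObjectiveLinTS:
    """Hypothetical sketch, NOT the paper's MOL-TS: one Gaussian posterior per
    objective, a shared Gram matrix, and a uniform pick from the sampled Pareto
    front (the ordinary front, not the paper's effective Pareto front)."""

    def __init__(self, d: int, num_objectives: int, v: float = 1.0,
                 lam: float = 1.0, seed: int = 0):
        self.m = num_objectives
        self.v = v                                  # assumed exploration scale
        self.V = lam * np.eye(d)                    # regularized Gram matrix
        self.b = np.zeros((num_objectives, d))      # row k: sum of r_k * x
        self.rng = np.random.default_rng(seed)

    def select(self, X: np.ndarray) -> int:
        """X: (num_arms, d) context features for this round."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = self.b @ V_inv                  # row k: ridge estimate V^-1 b_k
        # One independent Thompson sample per objective from N(theta_hat_k, v^2 V^-1).
        thetas = np.array([
            self.rng.multivariate_normal(theta_hat[k], self.v ** 2 * V_inv)
            for k in range(self.m)
        ])                                          # (m, d)
        sampled = X @ thetas.T                      # (num_arms, m) sampled rewards
        # A finite set always has at least one non-dominated row, so the front is non-empty.
        return int(self.rng.choice(pareto_front(sampled)))

    def update(self, x: np.ndarray, rewards: np.ndarray) -> None:
        """x: (d,) features of the chosen arm; rewards: (m,) observed reward vector."""
        self.V += np.outer(x, x)
        self.b += np.outer(rewards, x)              # b_k += r_k * x for every objective k
```

In a full run, one would call `select` on each round's context matrix, observe the reward vector, call `update`, and accumulate `pareto_gap` against the true mean rewards to track Pareto regret. Picking uniformly from the sampled front ignores how often each arm has already been chosen. According to the abstract, the effective Pareto front in MOL-TS is designed to account for exactly these repeated selections.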

artificial intelligence, data mining, machine learning
