Pareto Optimal Risk Measure Agnostic Distributional Bandits with Heavy-Tail Rewards

Jun-19-2026, 14:32:19 GMT–Neural Information Processing Systems

This paper addresses the problem of multi-risk-measure-agnostic multi-armed bandits with heavy-tailed rewards. We propose a framework that uses novel deviation inequalities for the 1-Wasserstein distance to construct confidence intervals for Lipschitz risk measures. We introduce the distributional LCB (DistLCB) algorithm and establish its asymptotic optimality by deriving the first lower bounds for risk-measure-aware bandits with explicit sub-optimality gap dependence. We further extend DistLCB to multi-risk objectives, which enables Pareto-optimal solutions that account for multiple aspects of the reward distributions. We also provide a regret analysis with both gap-dependent and gap-independent bounds for the multi-risk setting. Experiments validate the proposed methods in synthetic and real-world applications.
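The idea of turning a 1-Wasserstein deviation bound into a confidence interval for a Lipschitz risk measure can be illustrated with a small sketch. It is not the paper's DistLCB algorithm. It assumes that observations are losses, that the goal is to minimise CVaR (which is (1/alpha)-Lipschitz with respect to the 1-Wasserstein distance when alpha is the tail probability), and it uses a hypothetical placeholder radius in place of the paper's deviation inequality. The names empirical_cvar, placeholder_radius, lcb_index and run_lcb, and the parameters p and scale, are illustrative only.

```python
import math
import random


def empirical_cvar(losses, alpha):
    """Mean of the worst (largest) ceil(alpha * n) losses: a simple CVaR estimate."""
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def placeholder_radius(n, t, p=1.5, scale=1.0):
    """HYPOTHETICAL confidence radius for the 1-Wasserstein distance.

    Stands in for a deviation inequality under a finite p-th moment
    assumption; the form and constants are NOT taken from the paper.
    """
    return scale * (math.log(max(t, 2)) / n) ** (1.0 - 1.0 / p)


def lcb_index(losses, t, alpha, lipschitz):
    """Lower confidence bound: estimate minus Lipschitz constant times radius."""
    return empirical_cvar(losses, alpha) - lipschitz * placeholder_radius(len(losses), t)


def run_lcb(arms, horizon, alpha=0.25, seed=0):
    """Pull each arm once, then always pull the arm with the smallest LCB index."""
    rng = random.Random(seed)
    history = [[arm(rng)] for arm in arms]
    for t in range(len(arms) + 1, horizon + 1):
        indices = [lcb_index(h, t, alpha, 1.0 / alpha) for h in history]
        chosen = min(range(len(arms)), key=indices.__getitem__)
        history[chosen].append(arms[chosen](rng))
    return [len(h) for h in history]


if __name__ == "__main__":
    # Two heavy-tailed (Pareto) loss distributions; the second is shifted upward.
    arms = [lambda r: r.paretovariate(3.0), lambda r: 5.0 + r.paretovariate(3.0)]
    print(run_lcb(arms, horizon=200))
```

Because the interval depends on the risk measure only through its Lipschitz constant, swapping CVaR for another Lipschitz risk measure would change just the estimator and that constant, which is the sense in which such a scheme can be agnostic to the risk measure.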

artificial intelligence, data mining, machine learning

Country:
- Europe (1.00)
- North America > United States (0.93)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Banking & Finance > Trading (0.67)

Technology:
- Information Technology
  - Game Theory (0.85)
  - Data Science > Data Mining
    - Big Data (0.68)
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning > Optimization (0.66)
