Continuous Mean-Covariance Bandits

Neural Information Processing Systems 

Specifically, in CMCB, there is a learner who sequentially chooses weight vectors over given options and observes random feedback according to these decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured with the option covariance. To capture different reward observation scenarios in practice, we consider three feedback settings, i.e., full-information, semi-bandit, and full-bandit feedback. We propose novel algorithms with optimal regrets (within logarithmic factors), and provide matching lower bounds to validate their optimality. The experimental results also demonstrate the superiority of our algorithms.
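The reward-risk trade-off described above can be illustrated with a minimal sketch. The abstract does not give the paper's exact objective, so this assumes a standard mean-variance formulation: a weight vector `w` over `d` options is scored by its expected reward minus a risk term based on the option covariance. The function name, the risk-aversion coefficient `rho`, and the specific form of the objective are all illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def mean_variance_objective(w, mu, sigma, rho=1.0):
    """Score a weight vector w by reward minus covariance-based risk.

    Assumed (illustrative) objective: f(w) = w^T mu - rho * w^T Sigma w,
    where mu is the mean reward vector, Sigma the option covariance matrix,
    and rho > 0 a risk-aversion coefficient.
    """
    reward = w @ mu            # expected reward of the weighted portfolio
    risk = w @ sigma @ w       # variance contributed by option covariance
    return reward - rho * risk

# Usage: two options; the second has a higher mean but higher variance,
# so splitting weight can beat concentrating on either option alone.
mu = np.array([0.5, 0.8])
sigma = np.array([[0.1, 0.0],
                  [0.0, 0.4]])
w = np.array([0.5, 0.5])
score = mean_variance_objective(w, mu, sigma, rho=1.0)
```

Under full-information feedback the learner would observe enough to estimate `mu` and `sigma` directly, while semi-bandit and full-bandit feedback reveal progressively less, which is what drives the different regret guarantees.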
