Continuous Mean-Covariance Bandits
Neural Information Processing Systems
Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options, such as variance. As a result, they cannot be directly applied to important real-world online decision-making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model that explicitly takes option correlation into account. Specifically, in CMCB, a learner sequentially chooses weight vectors over the given options and observes random feedback according to its decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured by the option covariance.
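To make the reward-risk trade-off concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes the objective takes the common mean-covariance form w·θ − ρ·wᵀΣw, where the weight vector `w`, expected rewards `theta`, covariance matrix `sigma`, and risk-aversion parameter `rho` are all illustrative names and values not given in the abstract.

```python
def mean_covariance_value(w, theta, sigma, rho):
    """Assumed objective: expected reward minus rho times the variance of the weighted options."""
    n = len(w)
    expected_reward = sum(w[i] * theta[i] for i in range(n))
    # Quadratic form w^T Sigma w; off-diagonal entries capture option correlation.
    risk = sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))
    return expected_reward - rho * risk


# Hypothetical two-option instance with negatively correlated options.
theta = [0.2, 0.3]
sigma = [[1.0, -0.5],
         [-0.5, 1.0]]
rho = 1.0

# Put all weight on the option with the higher expected reward.
all_in = mean_covariance_value([0.0, 1.0], theta, sigma, rho)
# Split the weight evenly so the negative covariance offsets part of the risk.
split = mean_covariance_value([0.5, 0.5], theta, sigma, rho)

print(all_in, split)
```

Under these assumed numbers the even split scores higher than the all-in choice, even though its expected reward is lower, because the negative covariance reduces the combined risk. A risk measure that looks only at each option's own variance would miss this effect.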
Apr-24-2026, 12:49:54 GMT
- Genre:
- Research Report (0.46)
- Industry:
- Banking & Finance > Trading (0.92)
- Health & Medicine (0.69)
- Technology: