Continuous Mean-Covariance Bandits
Neural Information Processing Systems
In the Continuous Mean-Covariance Bandit (CMCB) problem, a learner sequentially chooses weight vectors over a given set of options and observes random feedback according to its decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured by the covariance of the options.
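To make the reward-risk trade-off concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common mean-covariance form, expected reward minus a risk-aversion parameter times the portfolio variance. The names `mean_covariance_objective`, `theta`, `sigma`, and `rho`, and all numeric values, are hypothetical and not taken from the text above.

```python
import numpy as np


def mean_covariance_objective(w, theta, sigma, rho):
    """Assumed objective: expected reward minus rho times the variance of the weighted options."""
    w = np.asarray(w, dtype=float)
    return float(w @ theta - rho * (w @ sigma @ w))


# Hypothetical instance with three options (values chosen for illustration only).
theta = np.array([0.2, 0.3, 0.5])   # assumed expected reward of each option
sigma = np.diag([0.1, 0.4, 1.0])    # assumed covariance of the options
rho = 1.0                           # assumed risk-aversion weight

uniform = np.full(3, 1.0 / 3.0)     # spread weight evenly across the options
greedy = np.array([0.0, 0.0, 1.0])  # put all weight on the highest-mean option

print(mean_covariance_objective(uniform, theta, sigma, rho))  # about 0.1667
print(mean_covariance_objective(greedy, theta, sigma, rho))   # -0.5
```

Under these assumed values, the diversified weight vector scores higher than the one that concentrates on the highest-mean option, because that option also carries the largest variance.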
Oct-1-2025, 23:43:44 GMT