Continuous Mean-Covariance Bandits

Neural Information Processing Systems 

Specifically, in CMCB, there is a learner who sequentially chooses weight vectors on given options and observes random feedback according to the decisions. The agent's objective is to achieve the best trade-off between reward and risk, measured with option covariance.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found