Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness

Neural Information Processing Systems 

We consider contextual bandit problems with knapsacks (CBwK), in which, at each round, a scalar reward is obtained and vector-valued costs are suffered. The learner aims to maximize the cumulative reward while ensuring that the cumulative costs stay below predetermined cost constraints. We assume that contexts come from a continuous set, that costs can be signed, and that the expected reward and cost functions, while unknown, may be uniformly estimated, which is a typical assumption in the literature.
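To make the interaction protocol concrete, the following is a minimal sketch of a CBwK loop on a toy environment. Everything specific in it is an assumption made for illustration and is not taken from the paper: the names `run_cbwk` and `cautious_policy`, the two-action setup with a "null" action, the reward and cost functions, and the conservative stopping rule. It only illustrates the round structure (context, action, scalar reward, signed vector-valued costs) and the total-cost constraint.

```python
import random


def run_cbwk(policy, horizon, budget, seed=0):
    """Simulate a toy CBwK interaction (all modelling choices are assumptions).

    Action 0 is a null action with zero reward and zero costs; action 1 yields
    reward = context and the signed cost (context - 0.5) in every coordinate.
    Returns the cumulative reward and the cumulative cost vector.
    """
    rng = random.Random(seed)
    total_reward = 0.0
    total_costs = [0.0] * len(budget)
    for _ in range(horizon):
        context = rng.random()  # context from a continuous set, here [0, 1)
        action = policy(context, total_costs, budget)
        if action == 1:
            total_reward += context  # scalar reward
            for j in range(len(budget)):
                total_costs[j] += context - 0.5  # signed cost, can be negative
    return total_reward, total_costs


def cautious_policy(context, total_costs, budget):
    """Hypothetical rule: act only if the worst-case cost (0.5) still fits."""
    fits = all(c + 0.5 <= b for c, b in zip(total_costs, budget))
    return 1 if fits else 0


if __name__ == "__main__":
    budget = [1.0, 2.0]
    reward, costs = run_cbwk(cautious_policy, horizon=200, budget=budget)
    feasible = all(c <= b for c, b in zip(costs, budget))
    print(f"reward={reward:.2f}, costs={costs}, feasible={feasible}")
```

Because costs are signed, rounds with negative cost free up budget, so the cautious rule can resume acting after it has paused. A learner in the setting described above would instead have to estimate the unknown expected reward and cost functions from data.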
