Reward Imputation with Sketching for Contextual Batched Bandits

Neural Information Processing Systems 

Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode, but the rewards of the non-executed actions are unobserved, resulting in partial-information feedback.
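
The feedback structure described above can be illustrated with a small simulation. This is only a sketch of the setting, not the paper's method: the names `run_episode`, `policy`, and `reward_fn`, and the toy reward formula, are hypothetical and chosen for illustration. Each context in an episode gets one executed action, and only that action's reward is revealed once the episode ends.

```python
import random

def run_episode(contexts, policy, reward_fn, num_actions):
    """Simulate one episode and return its partial-information feedback.

    Hypothetical helper: for each context, the reward row holds a value only
    at the executed action; every non-executed action stays None (unobserved).
    """
    # The policy chooses all actions for the batch before any reward is seen.
    actions = [policy(x) for x in contexts]
    feedback = []
    for x, a in zip(contexts, actions):
        row = [None] * num_actions   # rewards of non-executed actions are unobserved
        row[a] = reward_fn(x, a)     # only the executed action's reward is revealed
        feedback.append(row)
    # The whole batch of rewards is returned together, at the end of the episode.
    return actions, feedback

rng = random.Random(0)
num_actions = 3
contexts = [[rng.random() for _ in range(2)] for _ in range(4)]  # batch of 4 contexts

# Assumed toy policy and reward function, for illustration only.
policy = lambda x: rng.randrange(num_actions)
reward_fn = lambda x, a: sum(x) * (a + 1)

actions, feedback = run_episode(contexts, policy, reward_fn, num_actions)
observed = sum(r is not None for row in feedback for r in row)
print(f"observed {observed} of {len(contexts) * num_actions} rewards")
```

With a batch of 4 contexts and 3 actions, only 4 of the 12 possible rewards are observed per episode. The `None` entries are the unobserved rewards that reward imputation, as named in the title, would aim to fill in.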
