9861a7c3972ed5d36dda3826d44bb246-Paper-Conference.pdf

Neural Information Processing Systems 

We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found