Factored DRO: Factored Distributionally Robust Policies for Contextual Bandits

Neural Information Processing Systems 

Prior work that either ignores potential shifts in the context distribution or treats context and reward shifts jointly can yield overly conservative performance, especially under certain forms of reward feedback.
