Multi-Task Learning for Contextual Bandits

Aniket Anand Deshmukh, Urun Dogan, Clay Scott

Neural Information Processing Systems 

The reward for each arm is random according to a fixed distribution, and the agent's goal is to maximize
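The sentence above describes the classical stochastic multi-armed bandit: each arm has a fixed reward distribution the agent does not know. The Python sketch below illustrates only that setting. It assumes Bernoulli rewards and uses the standard UCB1 rule, and both choices are illustrative assumptions not taken from this listing. It is not the multi-task contextual method of the paper. The function name `ucb1` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Tuple


def ucb1(means: List[float], horizon: int, seed: int = 0) -> Tuple[List[int], float]:
    """Illustrative UCB1 on Bernoulli arms with fixed means (assumed setting).

    The agent never sees `means` directly; it only observes sampled rewards.
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # how many times each arm has been pulled
    sums = [0.0] * k    # cumulative observed reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each estimate is defined
        else:
            # Optimism under uncertainty: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        # Reward is drawn from the chosen arm's fixed Bernoulli distribution.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.8], horizon=2000, seed=1)
    print("pulls per arm:", counts, "total reward:", total)
```

With a long enough horizon, most pulls concentrate on the arm with the highest mean, which is how the agent pursues the goal of maximizing total reward. A contextual bandit differs in that the expected reward of each arm also depends on side information observed at each step.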
