Multi-Task Learning for Contextual Bandits
Aniket Anand Deshmukh, Urun Dogan, Clay Scott
–Neural Information Processing Systems
The reward for each arm is random according to a fixed distribution, and the agent's goal is to maximize
Neural Information Processing Systems
Nov-21-2025, 12:23:56 GMT
- Country:
- Europe > United Kingdom (0.04)
- North America > United States
- California > Los Angeles County
- Long Beach (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.14)
- California > Los Angeles County
- Genre:
- Research Report (0.46)
- Technology: