Reviews: Multi-Task Learning for Contextual Bandits
Neural Information Processing Systems
The paper is about contextual bandits with $N$ arms. In each round $t$ the learner observes a context $x_{ti}$ for each arm $i$, chooses an arm to pull, and receives the reward $r_{ti}$. The question is what structure to impose on the rewards. The authors note that $\mathbb{E}[r_{ti}] = \langle x_{ti}, \theta \rangle$ is a common choice, as is $\mathbb{E}[r_{ti}] = \langle x_{ti}, \theta_i \rangle$. The former allows faster learning but has less capacity, while the latter has more capacity but learns more slowly. The natural question addressed in this paper concerns the middle ground between these two models, and a kernelized formulation covers both extremes and the middle ground simultaneously. The main idea is to augment the context space so that the learner observes $(z_{ti}, x_{ti})$, where $z_{ti}$ lies in some other space $Z$. A kernel can then be defined on this augmented space that measures similarity between contexts and thereby determines the degree of sharing between the arms.
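To make the sharing mechanism concrete, here is a minimal sketch, not the paper's exact construction. It assumes that the augmented context $z$ is simply the arm index, that the kernel on $(z, x)$ is a product $k_Z(i, j)\,\langle x, x' \rangle$, and that rewards are estimated by kernel ridge regression. The names `task_kernel`, `augmented_kernel`, `predict_reward`, and the similarity parameter `mu` are hypothetical. With `mu = 1` the arm index is ignored and the model reduces to a single shared $\theta$. With `mu = 0` the kernel matrix is block-diagonal, which gives independent per-arm estimates $\theta_i$. Intermediate values interpolate between the two.

```python
import numpy as np


def task_kernel(i, j, mu):
    # Hypothetical arm-similarity kernel k_Z, assuming z = arm index.
    # The matrix (1 - mu) * I + mu * 11^T is PSD for 0 <= mu <= 1.
    # mu = 1: all arms are one task; mu = 0: arms are unrelated.
    return 1.0 if i == j else mu


def augmented_kernel(a, b, mu):
    # Product kernel on augmented contexts (z, x): k_Z(z, z') * <x, x'>.
    # A product of PSD kernels is PSD (Schur product theorem).
    (i, x), (j, y) = a, b
    return task_kernel(i, j, mu) * float(np.dot(x, y))


def predict_reward(history, rewards, query, mu, lam=1.0):
    # Kernel ridge regression estimate of E[r] at query = (arm, context),
    # fitted on past (arm, context) pairs and their observed rewards.
    n = len(history)
    K = np.array([[augmented_kernel(a, b, mu) for b in history] for a in history])
    k_q = np.array([augmented_kernel(query, h, mu) for h in history])
    alpha = np.linalg.solve(K + lam * np.eye(n), np.asarray(rewards, dtype=float))
    return float(k_q @ alpha)


if __name__ == "__main__":
    # One observation from arm 0; query arm 1 with the same context.
    hist = [(0, np.array([1.0, 0.0]))]
    for mu in (1.0, 0.5, 0.0):
        print(mu, predict_reward(hist, [2.0], (1, np.array([1.0, 0.0])), mu))
```

In this toy run, arm 1's estimate falls from 1.0 (full sharing) through 0.5 to 0.0 (no sharing), because it can borrow only as much of arm 0's observation as `mu` allows. The ridge penalty `lam = 1` shrinks the single observed reward of 2.0 to 1.0 even for arm 0 itself. In a bandit algorithm such as a kernel UCB variant, an estimate of this kind would be combined with a confidence width before choosing an arm.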
Oct-8-2024, 06:43:19 GMT