Meta Learning in Bandits within Shared Affine Subspaces

Steven Bilaj, Sofien Dhouib, Setareh Maghsudi

arXiv.org Machine Learning 

We study the problem of meta-learning several contextual stochastic bandit tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected regret over the encountered bandits. We propose and theoretically analyze two strategies that solve the problem: one based on the principle of optimism in the face of uncertainty and the other via Thompson sampling. Our framework is generic and includes previously proposed approaches as ...

In the applications mentioned above, the tasks often relate to each other despite being different. For instance, subgroups of patients have comparable features. As another example, holidays or discount periods promote similar interests in the products of an e-commerce website. That observation motivates us to look beyond a single task and uncover a relation between different tasks in order to accelerate learning on newly encountered ones. That problem, referred to as meta-learning or learning-to-learn (LTL), has mainly appeared in the offline learning literature so far (Hutter et al., 2019). Nevertheless, an emergent body of literature combines LTL and multi-armed bandits (MAB) to accelerate learning and reduce the average regret per task (Cella et al., 2020; Cella and Pontil, 2021; Bilaj et al., 2023).
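
To make the idea of task parameters concentrating around a low-dimensional affine subspace more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each task is summarized by a parameter vector, keeps a running mean and scatter matrix of those vectors, and takes the top-k eigenvectors as a basis. This running-covariance eigendecomposition is a simple stand-in for online principal component analysis. The class name `RunningAffineSubspace` and its methods are hypothetical names chosen for illustration.

```python
import numpy as np


class RunningAffineSubspace:
    """Hypothetical helper: tracks an affine subspace (mean + top-k directions)
    fitted to the task parameter vectors seen so far."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        # Sum of outer products of centered vectors (unnormalized covariance).
        self.scatter = np.zeros((dim, dim))

    def update(self, theta):
        """Welford-style update of the mean and scatter with one task vector."""
        theta = np.asarray(theta, dtype=float)
        self.n += 1
        delta = theta - self.mean
        self.mean = self.mean + delta / self.n
        self.scatter = self.scatter + np.outer(delta, theta - self.mean)

    def basis(self, k):
        """Orthonormal basis of the k leading principal directions."""
        # eigh returns eigenvalues in ascending order, so take the last k columns.
        _, vecs = np.linalg.eigh(self.scatter)
        return vecs[:, -k:]

    def project(self, theta, k):
        """Orthogonal projection of theta onto the estimated affine subspace."""
        U = self.basis(k)
        theta = np.asarray(theta, dtype=float)
        return self.mean + U @ (U.T @ (theta - self.mean))


if __name__ == "__main__":
    # Synthetic tasks (an assumption for the demo): 5-dimensional parameters
    # lying near a 2-dimensional affine subspace, plus small noise.
    rng = np.random.default_rng(0)
    offset = rng.normal(size=5)
    directions = np.linalg.qr(rng.normal(size=(5, 2)))[0]
    est = RunningAffineSubspace(dim=5)
    for _ in range(200):
        est.update(offset + directions @ rng.normal(size=2)
                   + 0.01 * rng.normal(size=5))
    theta_new = offset + directions @ rng.normal(size=2)
    residual = np.linalg.norm(theta_new - est.project(theta_new, k=2))
    print("distance of a new task from the estimated subspace:", residual)
```

In a bandit setting, a projection of this kind could serve as a prior or regularization center for a new task's parameter estimate. How the estimate enters the optimism-based or Thompson sampling strategy is specific to the paper and is not reproduced here.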
