Optimal Thresholding Linear Bandit

Rivera, Eduardo Ochoa, Tewari, Ambuj

arXiv.org Artificial Intelligence 

The Thresholding Bandit Problem (TBP) Andrea Locatelli (2016); Kano et al. (2019) represents a specific combinatorial The study by Kano et al. (2019) emphasizes that in certain contexts, such as personalized recommendations, pursuing In this scenario, the arms' mean rewards follow a linear model with unknown parameters. We prove an instance-specific lower bound for the expected sample complexity of any correct algorithm.