Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari
Sequential decision-making, including bandit problems and reinforcement learning, has been one of the most active areas of research in machine learning. It formalizes the idea of selecting actions based on current knowledge to optimize some long-term reward over sequentially collected data. At the same time, the abundance of personalized information allows the learner to incorporate this contextual information into its decisions, a setup that is mathematically formalized as contextual bandits. Moreover, in the big-data era, the personal information used as context is often very large, which can be modeled by viewing contexts as high-dimensional vectors. Applications of such models include internet marketing and treatment assignment in personalized medicine, among many others. A particularly interesting special case of the contextual bandit problem is the linear contextual bandit problem, where the expected reward is a linear function of the features (Abe et al., 2003; ...).
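As a brief illustration of the setting named in the title (the notation below is an assumption for exposition, not taken verbatim from the abstract), the high-dimensional sparse linear contextual bandit posits that the expected reward of arm a at round t is linear in its d-dimensional context, with a parameter vector that has only a few nonzero coordinates:

\[
\mathbb{E}\left[ r_{t} \mid x_{t,a} \right] = x_{t,a}^{\top} \beta^{*},
\qquad \beta^{*} \in \mathbb{R}^{d},
\qquad \lVert \beta^{*} \rVert_{0} = s_{0} \ll d,
\]

where \(x_{t,a} \in \mathbb{R}^{d}\) is the context (feature vector) of arm \(a\) at round \(t\) and \(s_{0}\) denotes the sparsity level of the unknown parameter \(\beta^{*}\).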
arXiv.org Artificial Intelligence
Jan-28-2023