Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari
Sequential decision-making, including bandit problems and reinforcement learning, has been one of the most active areas of research in machine learning. It formalizes the idea of selecting actions based on current knowledge to optimize some long-term reward over sequentially collected data. At the same time, the abundance of personalized information allows the learner to incorporate this contextual information into its decisions, a setup that is mathematically formalized as contextual bandits. Moreover, in the big-data era, the personal information used as context is often very large, which can be modeled by viewing contexts as high-dimensional vectors. Applications of such models include internet marketing and treatment assignment in personalized medicine, among many others. A particularly interesting special case of the contextual bandit problem is the linear contextual bandit problem, where the expected reward is a linear function of the features (Abe et al., 2003; ...).
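As a brief illustration of the setting named in the title (the notation below is an assumption for exposition, not taken verbatim from the abstract), the high-dimensional sparse linear contextual bandit posits that the expected reward of arm a at round t is linear in its d-dimensional context, with a parameter vector that has only a few nonzero coordinates:

\[
\mathbb{E}\left[ r_{t} \mid x_{t,a} \right] = x_{t,a}^{\top} \beta^{*},
\qquad \beta^{*} \in \mathbb{R}^{d},
\qquad \lVert \beta^{*} \rVert_{0} = s_{0} \ll d,
\]

where \(x_{t,a} \in \mathbb{R}^{d}\) is the context (feature vector) of arm \(a\) at round \(t\) and \(s_{0}\) denotes the sparsity level of the unknown parameter \(\beta^{*}\).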
arXiv.org Artificial Intelligence
Jan-28-2023