Online learning in bandits with predicted context

Yongyi Guo, Ziping Xu, Susan Murphy

arXiv.org Machine Learning 

Contextual bandits (Auer, 2002; Langford and Zhang, 2007) represent a classical sequential decision-making problem in which an agent aims to maximize cumulative reward based on context information. At each round t, the agent observes a context and must choose one of K available actions based on both the current context and previous observations. Once the agent selects an action, she observes the associated reward, which is then used to refine future decision-making. Contextual bandits are a prototypical reinforcement learning problem in which a balance between exploring new actions and exploiting previously acquired information is necessary to achieve optimal long-term rewards. They have numerous real-world applications, including personalized recommendation systems (Li et al., 2010; Bouneffouf et al., 2012), healthcare (Yom-Tov et al., 2017; Liao et al., 2020), and online education (Liu et al., 2014; Shaikh et al., 2019). Despite the extensive literature on contextual bandits, in many real-world applications the agent never observes the context exactly.
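The interaction protocol described above can be made concrete with a short simulation. The following is a minimal sketch, assuming a simple epsilon-greedy strategy with per-action least-squares reward estimates; the strategy, variable names, and numerical settings are illustrative assumptions, not the method studied in this paper.

```python
# A minimal sketch of the contextual bandit loop: observe context, pick one of
# K actions, observe reward, update the model. Epsilon-greedy with per-action
# ridge-regularized least squares is an assumed example strategy.
import numpy as np

rng = np.random.default_rng(0)
d, K, T, eps = 5, 3, 2000, 0.1          # context dim, actions, rounds, exploration rate
theta_true = rng.normal(size=(K, d))    # unknown per-action parameters (simulation only)

# Sufficient statistics for least squares, one set per action.
A = np.stack([np.eye(d) for _ in range(K)])   # Gram matrices (ridge-initialized)
b = np.zeros((K, d))                          # response vectors

cum_reward = 0.0
for t in range(T):
    x = rng.normal(size=d)                    # agent observes a context
    if rng.random() < eps:
        a = int(rng.integers(K))              # explore: random action
    else:
        theta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(K)])
        a = int(np.argmax(theta_hat @ x))     # exploit: best estimated reward
    r = theta_true[a] @ x + rng.normal(scale=0.1)  # observe noisy reward
    A[a] += np.outer(x, x)                    # refine estimates with new observation
    b[a] += r * x
    cum_reward += r

print(f"average reward over {T} rounds: {cum_reward / T:.3f}")
```

Note that the sketch assumes the context x is observed exactly at each round; the setting motivating this paper is precisely the one where only a prediction of x is available.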
