Partially Observable Contextual Bandits with Linear Payoffs

Zeng, Sihan, Bhatt, Sujay, Koppel, Alec, Ganesh, Sumitra

Sep-17-2024–arXiv.org Machine Learning

We study contextual bandits where the context is not fully observable, a setting that significantly departs from the classic literature. Consider the problem of making trading decisions where the arms correspond to different algorithmic trading strategies and the reward is the monetary gain. The reward is a function of the evolving market condition (context), with potential influences from various exogenous factors such as Twitter feeds, secondary market behaviour, and local trends, of which only a small subset can be directly observed. Large institutional investors may spend additional resources on tracking other relevant features that reveal information on the true underlying context. The goal is to quickly identify and play the best strategy, that is, the one that maximizes the cumulative gain from trading. Motivated by problems of this nature, we introduce and study a partially observable linear contextual bandit framework, where a decision maker interacts with an environment over T rounds.
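To make the interaction over T rounds concrete, here is a minimal simulation sketch. It is not the paper's model or algorithm, because the abstract does not specify either. Everything below is an assumption chosen for illustration: the latent context follows an AR(1) process, the learner sees only the first k of d coordinates with added noise, payoffs are linear in the full latent context, and the learner is plain epsilon-greedy with one SGD regression step per round. The names simulate, theta, theta_hat, eps, and lr are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def simulate(T=500, d=4, k=2, n_arms=3, eps=0.1, lr=0.05, seed=0):
    """Run one episode; return (total_reward, total_regret, arms_played)."""
    rng = random.Random(seed)
    # Hypothetical true arm parameters in R^d, unknown to the learner.
    theta = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_arms)]
    # The learner's estimates act only on the k observed coordinates.
    theta_hat = [[0.0] * k for _ in range(n_arms)]
    x = [rng.gauss(0, 1) for _ in range(d)]  # latent context
    total_reward, total_regret, arms = 0.0, 0.0, []
    for _ in range(T):
        # Assumption: the latent context evolves as a stable AR(1) process.
        x = [0.9 * xi + rng.gauss(0, 0.3) for xi in x]
        # Assumption: only the first k coordinates are seen, with noise.
        obs = [x[i] + rng.gauss(0, 0.1) for i in range(k)]
        # Epsilon-greedy choice based on the current estimates.
        if rng.random() < eps:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: dot(theta_hat[a], obs))
        # Linear payoff in the full latent context, plus reward noise.
        means = [dot(th, x) for th in theta]
        reward = means[arm] + rng.gauss(0, 0.1)
        # One SGD step on the squared prediction error of the played arm.
        err = reward - dot(theta_hat[arm], obs)
        theta_hat[arm] = [w + lr * err * o for w, o in zip(theta_hat[arm], obs)]
        total_reward += reward
        # Per-round regret against the best arm for the true context.
        total_regret += max(means) - means[arm]
        arms.append(arm)
    return total_reward, total_regret, arms


if __name__ == "__main__":
    reward, regret, _ = simulate()
    print("cumulative reward:", reward)
    print("cumulative regret:", regret)
```

Because the learner in this sketch never sees coordinates k through d-1, its linear estimates cannot capture the part of the payoff driven by the hidden coordinates. That gap is the kind of difficulty the partially observable setting is meant to address.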

algorithm, bandit, latent context

Country:
- North America > United States (0.04)

Genre:
- Research Report (0.40)

Industry:
- Banking & Finance > Trading (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning (0.68)
  - Data Science > Data Mining
    - Big Data (0.48)
