Partially Observable Contextual Bandits with Linear Payoffs
Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh
We study contextual bandits where the context is not fully observable, a setting that significantly departs from the classic literature. Consider the problem of making trading decisions where the arms correspond to different algorithmic trading strategies and the reward is the monetary gain. The reward is a function of the evolving market condition (context) with potential influences from various exogenous factors like Twitter feeds, secondary market behaviour, local trends, etc., of which only a small subset can be directly observed. Large institutional investors may spend additional resources on tracking other relevant features that reveal information on the true underlying context. The goal is to quickly identify and play the best strategy that maximizes the cumulative gain from trading. Motivated by problems of this nature, we introduce and study a partially observable linear contextual bandit framework, where a decision maker interacts with an environment over T rounds.
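To make the interaction protocol concrete, the following is a minimal simulation sketch of a linear bandit in which the learner sees only part of the context. It is an illustration under stated assumptions, not the authors' model or algorithm: the abstract does not specify the observation model or the learning rule, so the truncated-context observation, the epsilon-greedy policy with an online least-squares update, and every name and parameter (`simulate`, `d_obs`, `eps`, `lr`) are hypothetical choices made here.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def simulate(T=2000, d=4, d_obs=2, n_arms=3, eps=0.1, lr=0.05, seed=0):
    """Run one episode of a hypothetical partially observed linear bandit.

    Assumptions (not from the paper): the learner observes only the first
    d_obs coordinates of a d-dimensional Gaussian context, and the reward of
    arm a is theta[a] . context plus Gaussian noise.
    """
    rng = random.Random(seed)
    # Hypothetical true arm parameters, unknown to the learner.
    theta = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_arms)]
    # Learner's per-arm weights act only on the observed coordinates.
    w = [[0.0] * d_obs for _ in range(n_arms)]
    total_reward, total_regret = 0.0, 0.0

    for _ in range(T):
        context = [rng.gauss(0, 1) for _ in range(d)]  # full context (hidden)
        observed = context[:d_obs]                     # what the learner sees

        # Epsilon-greedy choice based on estimates from observed features.
        if rng.random() < eps:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: dot(w[a], observed))

        reward = dot(theta[arm], context) + rng.gauss(0, 0.1)

        # Online least-squares (LMS) step for the arm that was played.
        err = reward - dot(w[arm], observed)
        w[arm] = [wi + lr * err * xi for wi, xi in zip(w[arm], observed)]

        # Regret is measured against an oracle that sees the full context.
        best = max(dot(theta[a], context) for a in range(n_arms))
        total_reward += reward
        total_regret += best - dot(theta[arm], context)

    return total_reward, total_regret


if __name__ == "__main__":
    reward, regret = simulate()
    print(f"cumulative reward: {reward:.2f}, cumulative regret: {regret:.2f}")
```

Because this naive learner ignores the unobserved coordinates, its regret against the full-context oracle generally keeps growing with T; this gap is the kind of difficulty that motivates treating partial observability explicitly.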
Sep-17-2024
- Country:
- North America > United States (0.04)
- Genre:
- Research Report (0.40)
- Industry:
- Banking & Finance > Trading (1.00)
- Technology: