A Discussion on Hyperparameter Tuning
–Neural Information Processing Systems
Contextual bandits are a class of online learning problems that can be viewed as a simple reinforcement learning problem without state transitions. For a complete treatment of contextual bandit problems, we refer the reader to Chapter 4 of [Bubeck et al., 2012]. Here we include the main idea for completeness. In a contextual bandit problem, the agent needs to find the best action given some observed context (i.e., the optimal policy, in reinforcement learning terms). Formally, we define S as the context set and K as the number of actions.
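The setup above can be made concrete with a small simulation. The following is a minimal sketch, not the method of the paper: it assumes a finite context set, Bernoulli rewards, and a simple epsilon-greedy learner, none of which are stated in the text. The function name `run_contextual_bandit` and the table `reward_probs` are hypothetical and introduced only for illustration.

```python
import random


def run_contextual_bandit(reward_probs, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a toy contextual bandit (illustrative only).

    reward_probs[s][a] is an assumed Bernoulli reward probability for
    action a in context s. len(reward_probs) plays the role of |S| and
    len(reward_probs[0]) the role of K.
    """
    rng = random.Random(seed)
    num_contexts = len(reward_probs)
    num_actions = len(reward_probs[0])

    # Per-(context, action) pull counts and running mean rewards.
    counts = [[0] * num_actions for _ in range(num_contexts)]
    means = [[0.0] * num_actions for _ in range(num_contexts)]

    for _ in range(rounds):
        # The context is drawn independently of past actions:
        # there is no state transition, unlike general RL.
        s = rng.randrange(num_contexts)

        if rng.random() < epsilon:
            a = rng.randrange(num_actions)  # explore
        else:
            a = max(range(num_actions), key=lambda k: means[s][k])  # exploit

        # Only the reward of the chosen action is observed (bandit feedback).
        r = 1.0 if rng.random() < reward_probs[s][a] else 0.0

        # Incremental update of the running mean.
        counts[s][a] += 1
        means[s][a] += (r - means[s][a]) / counts[s][a]

    # Greedy policy: estimated best action for each context.
    return [
        max(range(num_actions), key=lambda k: means[s][k])
        for s in range(num_contexts)
    ]


# Two contexts, three actions; each context has a different best action.
reward_probs = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.1]]
policy = run_contextual_bandit(reward_probs)
print(policy)
```

The returned list maps each context to an action, which is the tabular analogue of a policy. With reward probabilities this well separated, the learner should recover a different best action for each context, which is what distinguishes the contextual setting from a plain K-armed bandit.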
Aug-13-2025, 16:23:36 GMT