Reward Imputation with Sketching for Contextual Batched Bandits
Neural Information Processing Systems
Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode, but the rewards of the non-executed actions are unobserved, resulting in partial-information feedback.
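The feedback structure described above can be sketched in a few lines. This is a minimal illustration of the setting only, not the paper's method: the names `run_episode`, `policy`, and `reward_fn`, the two-action toy policy, and the Gaussian contexts are all hypothetical choices made for the example.

```python
import random

def run_episode(rng, policy, reward_fn, n_actions, batch_size, dim):
    """Simulate one episode of a contextual batched bandit (illustrative only).

    The policy stays fixed for the whole episode; rewards are collected
    and returned as one batch at the end.
    """
    # Hypothetical context distribution: i.i.d. standard Gaussian features.
    contexts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch_size)]
    actions = [policy(x) for x in contexts]

    observed = []
    for x, a in zip(contexts, actions):
        # Partial-information feedback: only the executed action's reward
        # is recorded; every non-executed action stays unobserved (None).
        row = [None] * n_actions
        row[a] = reward_fn(x, a, rng)
        observed.append(row)
    return contexts, actions, observed

# Toy policy and reward function, assumed purely for illustration.
def policy(x):
    return 0 if x[0] < 0 else 1

def reward_fn(x, a, rng):
    return x[a % len(x)] + rng.gauss(0.0, 0.1)

rng = random.Random(0)
contexts, actions, observed = run_episode(
    rng, policy, reward_fn, n_actions=3, batch_size=5, dim=4
)
```

Each row of `observed` holds exactly one reward, at the index of the executed action. The `None` entries are the unobserved rewards that a reward-imputation approach would try to fill in.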
Feb-17-2026, 03:33:16 GMT
- Country:
- Asia
- China
- Beijing > Beijing (0.04)
- Guangdong Province > Shenzhen (0.04)
- Middle East > Jordan (0.04)
- Industry:
- Health & Medicine (0.47)
- Technology: