Variational Bayesian Reinforcement Learning with Regret Bounds
Neural Information Processing Systems
In reinforcement learning, the Q-values summarize the expected future rewards that the agent will attain.
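As a rough illustration of what a Q-value summarizes, the sketch below computes discounted Q-values for a tiny deterministic MDP using standard tabular Q-iteration. This is not the variational method the title refers to; the MDP, the names `TRANSITIONS`, `GAMMA`, and `q_iteration`, and the discount factor are all assumptions made up for this example.

```python
# Hypothetical toy MDP: TRANSITIONS[state][action] = (next_state, reward).
# State 2 is terminal (it has no entry). All numbers are illustrative.
TRANSITIONS = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 1.0)},
}
GAMMA = 0.9  # assumed discount factor


def q_iteration(transitions, gamma, sweeps=200):
    """Repeatedly apply the Bellman optimality backup to a Q-table."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(sweeps):
        for s, acts in transitions.items():
            for a, (s_next, reward) in acts.items():
                # Terminal states are absent from the table, so their value is 0.
                v_next = max(q[s_next].values()) if s_next in q else 0.0
                q[s][a] = reward + gamma * v_next
    return q


Q = q_iteration(TRANSITIONS, GAMMA)
for state, action_values in Q.items():
    print(state, action_values)
```

Each entry `Q[s][a]` is the discounted sum of rewards obtained by taking action `a` in state `s` and acting greedily afterwards. In a stochastic environment the same backup would average over next states, which is where the "expected" in the sentence above comes from.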
Aug-18-2025, 15:15:17 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)