A Provably Efficient Sample Collection Strategy for Reinforcement Learning
–Neural Information Processing Systems
One of the challenges in online reinforcement learning (RL) is that the agent must trade off exploring the environment against exploiting the samples it has collected to optimize its behavior. Each objective, whether regret, sample complexity, state-space coverage, or model estimation, calls for a different exploration-exploitation trade-off.
Feb-19-2026, 01:27:23 GMT