Constrainedepisodicreinforcementlearningin concave-convexandknapsacksettings
–Neural Information Processing Systems
Our approach relies on the principle ofoptimism under uncertaintyto efficiently explore. Our learning algorithms optimizetheiractions withrespect toamodel based ontheempirical statistics, while optimistically overestimating rewards and underestimating the resource consumption (i.e., overestimating the distance from the constraint).
Neural Information Processing Systems
Feb-10-2026, 02:57:31 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America
- Canada > British Columbia
- United States > Illinois
- Cook County > Chicago (0.04)
- Asia > Middle East
- Technology: