Constrained episodic reinforcement learning in concave-convex and knapsack settings
–Neural Information Processing Systems
Our approach relies on the principle of optimism under uncertainty to efficiently explore. Our learning algorithms optimize their actions with respect to a model based on the empirical statistics, while optimistically overestimating rewards and underestimating the resource consumption (i.e., overestimating the distance from the constraint).
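As a rough illustration of the optimistic estimates described above, the following minimal sketch inflates an empirical mean reward and deflates an empirical mean resource consumption by the same confidence bonus. It is not the paper's algorithm: the function name `optimistic_estimates`, the Hoeffding-style bonus, the confidence parameter `delta`, and the assumption that rewards and consumptions lie in [0, 1] are all illustrative choices.

```python
import math


def optimistic_estimates(reward_sum, cost_sum, count, delta=0.05):
    """Return (optimistic reward, optimistic cost) for one state-action pair.

    Assumptions (not from the source): rewards and resource consumptions
    lie in [0, 1], and the bonus is a Hoeffding-style confidence radius.
    """
    n = max(count, 1)  # treat an unvisited pair as a single pseudo-visit
    bonus = math.sqrt(math.log(2.0 / delta) / (2.0 * n))

    mean_reward = reward_sum / n
    mean_cost = cost_sum / n

    # Optimism: overestimate the reward, underestimate the consumption,
    # then clip both back to the assumed [0, 1] range.
    reward_ucb = min(1.0, mean_reward + bonus)
    cost_lcb = max(0.0, mean_cost - bonus)
    return reward_ucb, cost_lcb


if __name__ == "__main__":
    # The bonus shrinks as the visit count grows, so the optimistic
    # estimates move toward the empirical means.
    print(optimistic_estimates(50.0, 30.0, 100))
    print(optimistic_estimates(5000.0, 3000.0, 10000))
```

Under these assumptions, a planner that optimizes against the returned pair is drawn toward rarely visited state-action pairs, because they look both more rewarding and cheaper in resources than the data alone would suggest.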
Feb-10-2026, 02:57:31 GMT
- Country:
  - North America
    - United States > Illinois > Cook County > Chicago (0.04)
    - Canada > British Columbia
  - Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
  - Asia > Middle East > Jordan (0.04)
- Technology: