Constrainedepisodicreinforcementlearningin concave-convexandknapsacksettings

Neural Information Processing Systems 

Our approach relies on the principle ofoptimism under uncertaintyto efficiently explore. Our learning algorithms optimizetheiractions withrespect toamodel based ontheempirical statistics, while optimistically overestimating rewards and underestimating the resource consumption (i.e., overestimating the distance from the constraint).

Similar Docs  Excel Report  more

TitleSimilaritySource
None found