Reinforcement Learning
Constrainedepisodicreinforcementlearningin concave-convexandknapsacksettings
Our approach relies on the principle ofoptimism under uncertaintyto efficiently explore. Our learning algorithms optimizetheiractions withrespect toamodel based ontheempirical statistics, while optimistically overestimating rewards and underestimating the resource consumption (i.e., overestimating the distance from the constraint).