Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time

Neural Information Processing Systems 

Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding.
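The sentence above names the three ingredients without showing how they interact. What follows is a minimal sketch of one way they could fit together. It is not the authors' algorithm and carries none of their guarantees. It rests on several assumptions of ours:

- The model is a finite-horizon tabular MDP with a single expected-cost constraint.
- The "value demand" is read as the expected-cost budget attached to the current state.
- The "action-space" step is a knapsack-style inner DP that splits the remaining demand across successor states.
- The rounding snaps demands to a grid of width `delta`.

All names (`constrained_values`, `best_value`, `units`, `max_demand`) are hypothetical.

```python
import math

NEG_INF = float("-inf")


def units(x: float, delta: float) -> int:
    """Round a nonnegative cost UP to whole grid units of width delta (conservative)."""
    return max(0, math.ceil(x / delta - 1e-9))


def constrained_values(P, R, C, horizon, delta, max_demand):
    """Backward DP over states augmented with a rounded cost demand (assumed reading).

    P[s][a] is a list of (next_state, prob); R[s][a] and C[s][a] are reward and cost.
    V[h][s][k] is the best expected reward from step h in state s for a deterministic
    policy whose expected future cost, with every term rounded up to the grid, is at
    most k * delta. Infeasible entries hold -inf.
    """
    S = len(P)
    M = math.ceil(max_demand / delta - 1e-9)  # demands live on the grid 0..M
    V = [[[0.0] * (M + 1) for _ in range(S)] for _ in range(horizon + 1)]
    for h in range(horizon - 1, -1, -1):
        nxt = V[h + 1]
        for s in range(S):
            for k in range(M + 1):
                best = NEG_INF
                for a, transitions in enumerate(P[s]):
                    remaining = k - units(C[s][a], delta)
                    if remaining < 0:
                        continue  # the immediate cost alone exceeds the demand
                    # Inner DP over successors: f[u] is the best probability-weighted
                    # future value when u grid units of demand have been handed out.
                    f = {0: 0.0}
                    for s_next, p in transitions:
                        g = {}
                        for u, val in f.items():
                            for b in range(M + 1):  # demand assigned to s_next
                                if nxt[s_next][b] == NEG_INF:
                                    continue
                                nu = u + units(p * b, 1.0)  # p * b is already in grid units
                                if nu > remaining:
                                    break  # rounded cost is nondecreasing in b
                                cand = val + p * nxt[s_next][b]
                                if cand > g.get(nu, NEG_INF):
                                    g[nu] = cand
                        f = g
                    if f:
                        best = max(best, R[s][a] + max(f.values()))
                V[h][s][k] = best
    return V


def best_value(V, s0, budget, delta):
    """Look up the start-state value, rounding the budget DOWN to the grid."""
    k = min(math.floor(budget / delta + 1e-9), len(V[0][s0]) - 1)
    return V[0][s0][k]


# Toy model (hypothetical): state 0 branches 50/50 to states 1 and 2. Each of those
# offers a free action (reward 0) or a costly one (reward 10, cost 2). State 3 absorbs.
EXAMPLE_P = [
    [[(1, 0.5), (2, 0.5)]],
    [[(3, 1.0)], [(3, 1.0)]],
    [[(3, 1.0)], [(3, 1.0)]],
    [[(3, 1.0)]],
]
EXAMPLE_R = [[0.0], [0.0, 10.0], [0.0, 10.0], [0.0]]
EXAMPLE_C = [[0.0], [0.0, 2.0], [0.0, 2.0], [0.0]]

V = constrained_values(EXAMPLE_P, EXAMPLE_R, EXAMPLE_C, horizon=2, delta=1.0, max_demand=4.0)
print(best_value(V, 0, 1.0, 1.0))  # 5.0
```

With an expected-cost budget of 1, the inner DP gives a demand of 2 to one successor and 0 to the other. The policy therefore takes the costly action in only one branch, staying deterministic while meeting the constraint in expectation. Rounding every term up keeps the returned value achievable but may miss the true optimum. The running time grows with the grid size `max_demand / delta`, which is the trade-off a finer or coarser `delta` controls.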
