decision hinges on the actually sampled action instead of the expected behavior of the actor. This post-acting switching scheme let the overall policy make more

Neural Information Processing Systems 

T AAC adds a second-stage binary policy to choose between the previous action and a new action output by an actor.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found