
Neural Information Processing Systems 

We thank the reviewers for their constructive feedback and hope to clarify and address their concerns in this response.

... UVFAs (universal value function approximators) may help with more complex settings. We will add this explanation to the paper.

Note that Assumption 1 does not require binary rewards in terminal states (see also the discussion after Assumption 1).

... "stay", such that a goal position becomes terminal only if the agent chooses to stay in it.
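To illustrate the "stay" mechanism, here is a minimal sketch under stated assumptions: a 1-D corridor with hypothetical actions `LEFT`, `RIGHT`, and `STAY`, where reaching or passing through the goal cell does not end the episode; only choosing `STAY` while on the goal does. The class name, action encoding, and the `terminal_reward` value are illustrative assumptions, not the environment used in the paper. The terminal reward is a free parameter, consistent with the point that terminal rewards need not be binary.

```python
from dataclasses import dataclass

# Hypothetical action encoding (assumption, not taken from the paper).
LEFT, RIGHT, STAY = 0, 1, 2


@dataclass
class StayToTerminateCorridor:
    """Corridor of `size` cells whose goal becomes terminal only on STAY.

    All names and the reward scheme here are illustrative assumptions.
    """
    size: int
    goal: int
    terminal_reward: float = 1.0  # assumed value; need not be binary
    pos: int = 0

    def reset(self, start: int = 0) -> int:
        """Place the agent at `start` and return the observation (its cell)."""
        self.pos = start
        return self.pos

    def step(self, action: int) -> tuple[int, float, bool]:
        """Return (position, reward, terminated) after taking `action`."""
        if action == STAY:
            # Staying on the goal is the only way to end the episode.
            if self.pos == self.goal:
                return self.pos, self.terminal_reward, True
            return self.pos, 0.0, False
        delta = -1 if action == LEFT else 1
        # Clip movement to the corridor bounds.
        self.pos = min(max(self.pos + delta, 0), self.size - 1)
        # Landing on or passing through the goal does not terminate.
        return self.pos, 0.0, False
```

For example, with `env = StayToTerminateCorridor(size=5, goal=2)`, moving right onto cell 2 returns `terminated=False`; the agent may keep moving past the goal, and the episode ends only when it returns to cell 2 and takes `STAY`. Making termination an explicit choice means the goal cell is not automatically absorbing.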
