5631e6ee59a4175cd06c305840562ff3-Paper.pdf

Neural Information Processing Systems 

Ateachtimestepoftheepisode,thelearnerobserves the current state of the environment, chooses one of theK available actions, and earns a reward. Consequently, the state of the environment changes according to the transition function of the underlying MDP, as a function of the previous state and the action taken by the learner.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found