On Reward-Free Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems 

During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy.
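
The two-phase protocol described above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a small tabular MDP (equivalent to linear function approximation with one-hot features), uniformly random exploration, and planning by value iteration on an empirical transition model. The names `explore`, `plan`, the 3-state chain, and the reward table are all hypothetical and chosen only for illustration.

```python
import numpy as np


def explore(P, horizon, episodes, rng):
    """Exploration phase: collect (s, a, s') tuples; no reward is observed.

    Assumption: a uniformly random policy, not a tuned exploration strategy.
    """
    n_states, n_actions, _ = P.shape
    data = []
    for _ in range(episodes):
        s = 0  # every episode starts in state 0
        for _ in range(horizon):
            a = int(rng.integers(n_actions))
            s_next = int(rng.choice(n_states, p=P[s, a]))
            data.append((s, a, s_next))
            s = s_next
    return data


def plan(data, reward, n_states, n_actions, horizon):
    """Planning phase: the reward is revealed only now.

    Fits an empirical transition model from the stored samples and runs
    finite-horizon value iteration on it.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in data:
        counts[s, a, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs keep an all-zero row in the estimated model.
    P_hat = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        Q = reward + P_hat @ V  # shape (n_states, n_actions)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V


# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
n_states, n_actions, horizon = 3, 2, 4
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, n_states - 1)] = 1.0

rng = np.random.default_rng(0)
data = explore(P, horizon, episodes=200, rng=rng)  # reward-free samples

# The reward function arrives only after exploration has finished.
reward = np.zeros((n_states, n_actions))
reward[2, :] = 1.0
policy, V = plan(data, reward, n_states, n_actions, horizon)
```

Because the stored samples do not depend on any reward, the same `data` could be passed to `plan` again with a different `reward` table without further interaction with the environment.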
