Goto

Collaborating Authors

 Reinforcement Learning





OnReward-FreeReinforcementLearningwith LinearFunctionApproximation

Neural Information Processing Systems

During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to computeanear-optimalpolicy.