On Reward-Free Reinforcement Learning with Linear Function Approximation
–Neural Information Processing Systems
During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy.
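The two-phase protocol described above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a tiny tabular chain MDP rather than linear function approximation, uses uniformly random exploration, and plans with value iteration on an empirical model. The names `step`, `explore`, `plan`, and `goal_reward`, and all constants, are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy setting: a 4-state chain with 2 actions and horizon 6.
N_STATES, N_ACTIONS, HORIZON = 4, 2, 6


def step(state, action):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    if action == 1:
        return min(state + 1, N_STATES - 1)
    return max(state - 1, 0)


def explore(num_episodes, seed=0):
    """Exploration phase: collect (s, a, s') tuples. No reward is observed."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_episodes):
        s = 0
        for _ in range(HORIZON):
            a = rng.randrange(N_ACTIONS)  # uniform random exploration (assumption)
            s_next = step(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data


def plan(data, reward):
    """Planning phase: the reward is revealed only now.

    Fits an empirical transition model from the stored samples and runs
    finite-horizon value iteration backwards from the last step.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in data:
        counts[(s, a)][s_next] += 1

    value = [0.0] * N_STATES
    policy = []  # policy[h][s] is the greedy action at step h in state s
    for _ in range(HORIZON):
        q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                total = sum(counts[(s, a)].values())
                if total == 0:
                    continue  # unvisited pair: its Q-value is left at 0
                exp_next = sum(n * value[s2] for s2, n in counts[(s, a)].items()) / total
                q[s][a] = reward(s, a) + exp_next
        greedy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
        value = [q[s][greedy[s]] for s in range(N_STATES)]
        policy.insert(0, greedy)
    return policy, value


def goal_reward(state, action):
    """Example reward, supplied only after exploration has finished."""
    return 1.0 if state == N_STATES - 1 else 0.0


samples = explore(num_episodes=200)
policy, value = plan(samples, goal_reward)
print(policy[0], value[0])
```

Because `explore` never calls a reward function, the same `samples` can be reused with any reward passed to `plan` afterwards, which is the point of the reward-free setting.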
Feb-10-2026, 11:12:37 GMT