Adaptive Exploration for Data-Efficient General Value Function Evaluations
Neural Information Processing Systems
General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF is the expected return of a given policy with respect to its own reward signal. Existing methods that rely on fixed behavior policies or pre-collected data are often data-inefficient when learning multiple GVFs in parallel with off-policy methods. To address this, we introduce a method that adaptively learns a single behavior policy for efficiently collecting data to evaluate multiple GVFs in parallel.
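To make the data-efficiency point concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a hypothetical one-step problem with two actions and two GVFs, each defined by its own target policy and reward, and computes in closed form the mean and variance of the importance-sampled estimate under different behavior policies. All names and numbers (`gvfs`, `uniform`, `follow_first`, `mixed`) are illustrative assumptions.

```python
def is_mean_and_variance(target, behavior, reward):
    """Exact mean and variance of the one-step estimate rho(a) * r(a), with a ~ behavior."""
    mean = 0.0
    second_moment = 0.0
    for pi_a, mu_a, r_a in zip(target, behavior, reward):
        rho = pi_a / mu_a  # importance ratio; requires mu_a > 0
        mean += mu_a * rho * r_a
        second_moment += mu_a * (rho * r_a) ** 2
    return mean, second_moment - mean ** 2


# Two hypothetical GVFs over the same two actions:
# (target policy over actions, reward for each action).
gvfs = [
    ([0.9, 0.1], [1.0, 0.0]),
    ([0.1, 0.9], [0.0, 2.0]),
]


def total_variance(behavior):
    """Sum of the estimator variances across all GVFs for one shared behavior policy."""
    return sum(is_mean_and_variance(pi, behavior, r)[1] for pi, r in gvfs)


uniform = [0.5, 0.5]            # fixed, uninformed behavior policy
follow_first = [0.9, 0.1]       # simply follows the first GVF's target policy
mixed = [1.0 / 3.0, 2.0 / 3.0]  # weights actions by their contribution to both GVFs

for name, mu in [("uniform", uniform), ("follow_first", follow_first), ("mixed", mixed)]:
    print(name, round(total_variance(mu), 4))
```

Every behavior policy with full support gives an unbiased estimate of each GVF, but the summed variance differs widely between them, which is why the choice of a single shared behavior policy matters for how much data is needed.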
Mar-21-2026, 05:39:20 GMT