Adaptive Exploration for Data-Efficient General Value Function Evaluations
Neural Information Processing Systems
General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique reward. Existing methods relying on fixed behavior policies or pre-collected data often face data efficiency issues when learning multiple GVFs in parallel using off-policy methods. To address this, we introduce GVFExplorer, which adaptively learns a single behavior policy that efficiently collects data for evaluating multiple GVFs in parallel.
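To make the definition concrete: a GVF pairs a target policy with its own reward signal (often called a cumulant) and a discount factor, and its value is the expected discounted sum of that signal under that policy. The sketch below evaluates two such GVFs exactly on a hypothetical two-state MDP using textbook iterative policy evaluation. The MDP, the policies, the cumulants, and the function name `evaluate_gvf` are all assumptions made for illustration. It assumes the transition model is known and does not implement GVFExplorer, which instead learns from data gathered by a single behavior policy.

```python
# Illustrative only: a toy MDP and two hypothetical GVFs, not the paper's setup.

def evaluate_gvf(transitions, policy, cumulant, gamma, sweeps=500):
    """Iterative policy evaluation for one GVF with a known model.

    transitions[s][a] -> list of (probability, next_state) pairs
    policy[s][a]      -> probability of taking action a in state s
    cumulant[s][a]    -> GVF-specific reward for taking a in s
    """
    n_states = len(transitions)
    values = [0.0] * n_states
    for _ in range(sweeps):
        new_values = []
        for s in range(n_states):
            v = 0.0
            for a, pi_sa in enumerate(policy[s]):
                # Expected value of the successor state under the model.
                expected_next = sum(p * values[s2] for p, s2 in transitions[s][a])
                v += pi_sa * (cumulant[s][a] + gamma * expected_next)
            new_values.append(v)
        values = new_values
    return values


# Two states, two actions: action 0 stays put, action 1 switches state.
TRANSITIONS = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]

# GVF 1: "always stay" policy, cumulant 1 for every step spent in state 0.
gvf_stay = evaluate_gvf(
    TRANSITIONS,
    policy=[[1.0, 0.0], [1.0, 0.0]],
    cumulant=[[1.0, 1.0], [0.0, 0.0]],
    gamma=0.5,
)

# GVF 2: "always switch" policy, cumulant 1 whenever the switch action is taken.
gvf_switch = evaluate_gvf(
    TRANSITIONS,
    policy=[[0.0, 1.0], [0.0, 1.0]],
    cumulant=[[0.0, 1.0], [0.0, 1.0]],
    gamma=0.5,
)

print("stay GVF values:  ", gvf_stay)
print("switch GVF values:", gvf_switch)
```

The two GVFs share one environment but differ in both policy and cumulant, which is why data collected under any single fixed policy may serve one of them well and the other poorly.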
Mar-22-2025, 20:56:59 GMT