Local policy search with Bayesian optimization

Dec-24-2025, 17:32:22 GMT–Neural Information Processing Systems

Reinforcement learning (RL) aims to find an optimal policy by interacting with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning about and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high-variance estimates and are therefore sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples in order to reason about which subsequent samples are informative.
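The contrast drawn in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It is a toy comparison under stated assumptions: a hypothetical quadratic objective stands in for the policy return, the surrogate is a Gaussian process with an RBF kernel and a fixed lengthscale, and all function names are illustrative. It computes one gradient estimate from random perturbations and one from the gradient of the GP posterior mean fitted to past samples.

```python
import numpy as np

def objective(theta):
    # Hypothetical stand-in for the policy return: concave quadratic, maximum at 0.
    return -np.sum(theta ** 2, axis=-1)

def perturbation_gradient(theta, n_samples, sigma, rng):
    # Gradient estimate from random Gaussian perturbations around theta.
    eps = rng.standard_normal((n_samples, theta.size))
    diffs = objective(theta + sigma * eps) - objective(theta)
    return (diffs[:, None] * eps).mean(axis=0) / sigma

def rbf(a, b, lengthscale):
    # RBF kernel matrix between the rows of a and the rows of b.
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def gp_mean_gradient(theta, X, y, lengthscale=1.0, jitter=1e-6):
    # Gradient of the GP posterior mean at theta (assumed constant prior mean).
    mu = y.mean()
    K = rbf(X, X, lengthscale) + jitter * np.eye(len(X))
    alpha = np.linalg.solve(K, y - mu)
    k_star = rbf(theta[None, :], X, lengthscale)[0]
    # d/dtheta k(theta, x_i) = -(theta - x_i) / l^2 * k(theta, x_i)
    dk = -(theta[None, :] - X) / lengthscale ** 2 * k_star[:, None]
    return dk.T @ alpha

rng = np.random.default_rng(0)
theta = np.array([1.0, 1.0])
true_grad = -2.0 * theta                        # analytic gradient of the toy objective

X = theta + 0.3 * rng.standard_normal((10, 2))  # past samples near theta
y = objective(X)

g_gp = gp_mean_gradient(theta, X, y)
g_rand = perturbation_gradient(theta, 10, 0.3, rng)
print("true:", true_grad, "surrogate:", g_gp, "perturbation:", g_rand)
```

In this sketch the surrogate only reuses samples that were drawn at random. Choosing where to sample next from the surrogate's uncertainty, which is the active part the abstract refers to, is left out.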

bayesian optimization, local policy search, name change

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Search (0.40)
  - Machine Learning (0.40)