Black-Box Policy Search with Probabilistic Programs
Jan-Willem van de Meent, Brooks Paige, David Tolpin, Frank Wood
arXiv.org Artificial Intelligence
In this work, we explore how probabilistic programs can be used to represent policies in sequential decision problems. In this formulation, a probabilistic program is a black-box stochastic simulator for both the problem domain and the agent. We relate classic policy gradient techniques to recently introduced black-box variational methods which generalize to probabilistic program inference. We present case studies in the Canadian traveler problem, Rock Sample, and a benchmark for optimal diagnosis inspired by Guess Who. Each study illustrates how programs can efficiently represent policies using moderate numbers of parameters.
Aug-4-2016
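
The abstract treats the program as a black-box stochastic simulator and relates policy search to classic policy gradient techniques. The following is a minimal sketch of that idea, not the authors' implementation: a score-function (REINFORCE-style) gradient estimate that uses only sampled rewards and the gradient of the log-probability of the sampled decisions. The toy one-step domain, the single logit parameter `theta`, and the names `simulate` and `policy_search` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(theta, rng):
    """Hypothetical black-box simulator (assumption, not from the paper).

    Runs one episode of a toy one-step problem and returns the reward
    together with d/dtheta log p(action | theta).
    """
    p = sigmoid(theta)
    action = 1 if rng.random() < p else 0
    # For a Bernoulli policy with logit theta, the score is (action - p).
    grad_log_prob = action - p
    # Toy domain: action 1 pays 1.0 on average, action 0 pays 0.0.
    reward = rng.gauss(1.0 if action == 1 else 0.0, 0.5)
    return reward, grad_log_prob


def policy_search(steps=200, batch=50, lr=0.5, seed=0):
    """Stochastic gradient ascent on expected reward using sampled episodes only."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        samples = [simulate(theta, rng) for _ in range(batch)]
        # Batch-mean baseline to reduce variance (introduces a small bias
        # because it is computed from the same samples).
        baseline = sum(r for r, _ in samples) / batch
        grad = sum((r - baseline) * g for r, g in samples) / batch
        theta += lr * grad
    return theta


if __name__ == "__main__":
    theta = policy_search()
    print("P(action = 1) after training:", sigmoid(theta))
```

The update never inspects the simulator's internals: it needs only rewards and score terms from sampled runs, which is what allows the same estimator to be applied to a policy written as an arbitrary program.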