BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Lipton, Zachary C., Li, Xiujun, Gao, Jianfeng, Li, Lihong, Ahmed, Faisal, Deng, Li

Nov-23-2017–arXiv.org Machine Learning

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as $\epsilon$-greedy, Boltzmann exploration, bootstrapping, and intrinsic-reward-based methods. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
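
The two ideas named in the abstract, Thompson sampling from a weight posterior and spiking the replay buffer, can be illustrated with a small sketch. This is not the paper's implementation: it assumes a toy linear Q-function in place of a deep network, a factorized Gaussian posterior parameterized as mean `mu` and `softplus(rho)` standard deviation, and it omits the variational training step entirely. The names `GaussianLinearQ`, `thompson_action`, and `spike_replay_buffer` are hypothetical and exist only for this example.

```python
import math
import random
from collections import deque


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the standard deviation positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class GaussianLinearQ:
    """Toy linear Q-function with an independent Gaussian posterior per weight.

    Assumption for illustration: one weight vector per action, no hidden layers.
    """

    def __init__(self, n_features, n_actions, rho_init=-3.0):
        self.mu = [[0.0] * n_features for _ in range(n_actions)]
        self.rho = [[rho_init] * n_features for _ in range(n_actions)]

    def sample_weights(self, rng):
        # Reparameterized draw: w = mu + softplus(rho) * eps, with eps ~ N(0, 1).
        return [
            [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu_a, rho_a)]
            for mu_a, rho_a in zip(self.mu, self.rho)
        ]

    def thompson_action(self, state, rng):
        # Draw one Monte Carlo sample of the weights, then act greedily under it.
        weights = self.sample_weights(rng)
        q_values = [sum(w * s for w, s in zip(w_a, state)) for w_a in weights]
        return max(range(len(q_values)), key=q_values.__getitem__)


def spike_replay_buffer(buffer, successful_episodes):
    # Pre-fill the buffer with transitions from a few successful episodes
    # before any learning starts.
    for episode in successful_episodes:
        buffer.extend(episode)
    return buffer


# Usage with made-up numbers: action 1 has a much higher posterior mean.
rng = random.Random(0)
q_net = GaussianLinearQ(n_features=3, n_actions=2)
q_net.mu[1] = [5.0, 5.0, 5.0]
action = q_net.thompson_action([1.0, 1.0, 1.0], rng)

# Transitions are (state, action, reward, next_state, done) tuples.
successful_episode = [
    ([1.0, 0.0, 0.0], 0, 0.0, [0.0, 1.0, 0.0], False),
    ([0.0, 1.0, 0.0], 1, 0.0, [0.0, 0.0, 1.0], False),
    ([0.0, 0.0, 1.0], 1, 1.0, [0.0, 0.0, 0.0], True),
]
buffer = spike_replay_buffer(deque(maxlen=100), [successful_episode])
```

Because each call to `thompson_action` draws fresh weights, actions whose value estimates are still uncertain get chosen some of the time, while actions with confidently low values are rarely tried. The spiked buffer gives the learner at least a few rewarded transitions to bootstrap from when rewards are sparse.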

Country:
- North America > United States
  - Washington > King County (0.28)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.14)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Media > Film (0.68)
- Leisure & Entertainment > Games (0.46)
- Energy > Oil & Gas
  - Upstream (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.46)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.46)
