DeepMind papers at ICML 2017 (part one) DeepMind

Aug-7-2017, 17:10:05 GMT–@machinelearnbot

We consider the problem of provably optimal exploration in reinforcement learning for finite-horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of order $(HSAT)^{1/2}$ (up to a logarithmic factor), where H is the time horizon, S the number of states, A the number of actions and T the number of time-steps. This result improves over the best previously known bound of $HS(AT)^{1/2}$, achieved by the UCRL2 algorithm of [Jaksch, Ortner, Auer, 2010]. The key significance of our new results is that for large T, the sample complexity of our algorithm matches the optimal lower bound of $\Omega((HSAT)^{1/2})$. Our analysis contains two key insights.
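To make "an optimistic modification to value iteration" concrete, here is a minimal sketch of backward induction with an exploration bonus on a tabular finite-horizon MDP. It is an illustration under stated assumptions, not the algorithm analysed in the paper: the Hoeffding-style bonus `H * sqrt(log(S*A*H/delta) / n)`, the input layout (`counts`, `rewards`), the parameter `delta`, and the assumption that mean rewards are known and lie in [0, 1] are all choices made for this example.

```python
import math


def optimistic_value_iteration(counts, rewards, H, delta=0.1):
    """Backward induction with an exploration bonus (illustrative sketch).

    counts[s][a][s2]: number of observed transitions s --a--> s2
                      (hypothetical input layout).
    rewards[s][a]:    mean reward in [0, 1], assumed known for simplicity.
    Returns Q[h][s][a], optimistic action values for steps h = 0..H-1.
    """
    S = len(counts)
    A = len(counts[0])
    # V[H][s] = 0: nothing can be collected after the final step.
    V = [[0.0] * S for _ in range(H + 1)]
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    log_term = math.log(S * A * H / delta)

    for h in range(H - 1, -1, -1):
        max_return = float(H - h)  # rewards are in [0, 1]
        for s in range(S):
            for a in range(A):
                n = sum(counts[s][a])
                if n == 0:
                    # Never tried: assume the best possible outcome.
                    Q[h][s][a] = max_return
                    continue
                # Expected next-step value under the empirical transitions.
                expected_next = sum(
                    (c / n) * V[h + 1][s2]
                    for s2, c in enumerate(counts[s][a])
                )
                # Assumed Hoeffding-style bonus; shrinks as n grows.
                bonus = H * math.sqrt(log_term / n)
                Q[h][s][a] = min(
                    max_return, rewards[s][a] + expected_next + bonus
                )
            V[h][s] = max(Q[h][s])
    return Q


if __name__ == "__main__":
    # Toy 2-state, 2-action example with made-up counts.
    counts = [[[10, 0], [0, 0]], [[0, 5], [5, 5]]]
    rewards = [[0.0, 0.0], [1.0, 0.5]]
    Q = optimistic_value_iteration(counts, rewards, H=3)
    print(Q[0])
```

Acting greedily with respect to `Q` steers the agent toward state-action pairs that are either valuable or rarely visited. As counts grow the bonus vanishes and the values approach ordinary value iteration on the empirical model.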




