ISL: Optimal Policy Learning With Optimal Exploration-Exploitation Trade-Off

Sep-13-2019 – arXiv.org Artificial Intelligence

Traditionally, off-policy learning algorithms (such as Q-learning) and exploration schemes have been derived separately. Often, the exploration-exploitation dilemma is addressed through heuristics. In this article, we show that both the learning equations and the exploration-exploitation strategy can be derived in tandem as the solution to a unique and well-posed optimization problem whose minimization leads to the optimal value function. We present a new algorithm following this idea. The algorithm is of the gradient type (and therefore has good convergence properties even when used in conjunction with function approximators such as neural networks); it is off-policy; and it specifies both the update equations and the strategy to address the exploration-exploitation dilemma. To the best of our knowledge, this is the first algorithm that has these properties.
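
For context, the conventional setup that the abstract contrasts against can be sketched as follows. This is not the ISL algorithm, whose equations are not given here; it is ordinary tabular Q-learning paired with an epsilon-greedy heuristic, shown only to illustrate how the learning equation and the exploration scheme are usually specified as two independent pieces. The toy chain environment, the function names, and the hyperparameter values are all illustrative assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Exploration heuristic, chosen independently of the learning rule:
    take a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so the untrained agent does not favour one action.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Off-policy learning equation: bootstrap from the greedy value of s_next,
    regardless of which policy actually selected action a."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def run_chain(n_states=5, episodes=500, epsilon=0.2, seed=0):
    """Hypothetical toy chain (an assumption for illustration): action 1 moves
    right, action 0 moves left, and reaching the right end yields reward 1."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            q_learning_update(q, s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    for state, row in enumerate(run_chain()):
        print(state, [round(v, 3) for v in row])
```

Note that `epsilon_greedy` could be swapped for any other exploration rule without touching `q_learning_update`; that separation is the design choice the abstract proposes to replace with a single optimization problem.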
