Learning to Explore and Exploit in POMDPs

Cai, Chenghui, Liao, Xuejun, Carin, Lawrence

Feb-15-2020, 01:12:46 GMT – Neural Information Processing Systems

A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration and exploitation in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem.
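The explore/exploit decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a small discrete POMDP with known transition and observation tables, a standard Bayes belief update, and a hypothetical second policy (`explore_prob`) that gives the probability of exploring at the current belief. The names `belief_update`, `dual_policy_step`, `exploit_policy`, `explore_prob`, and `oracle` are illustrative and do not come from the paper.

```python
import random


def belief_update(belief, action, observation, T, O):
    """Standard Bayes filter for a discrete POMDP.

    T[a][s][s2] is P(s2 | s, a); O[a][s2][o] is P(o | s2, a).
    Returns the normalized posterior belief over next states.
    """
    n = len(belief)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


def dual_policy_step(belief, exploit_policy, explore_prob, oracle, rng):
    """Choose an action with two policies (illustrative assumption).

    explore_prob(belief) is a hypothetical exploration policy returning the
    probability of exploring. Exploration here means querying an oracle,
    which the caller is assumed to charge a cost for; otherwise the primary
    (exploitation) policy acts. Returns (action, mode).
    """
    if rng.random() < explore_prob(belief):
        return oracle(belief), "explore"
    return exploit_policy(belief), "exploit"


# Usage: a two-state problem with a single information-gathering action 0.
T = [[[1.0, 0.0], [0.0, 1.0]]]      # action 0 leaves the state unchanged
O = [[[0.85, 0.15], [0.15, 0.85]]]  # observation matches the state 85% of the time
posterior = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)

# Hypothetical exploration rule: explore only while the belief is uncertain.
uncertain = lambda b: 1.0 if max(b) < 0.8 else 0.0
action, mode = dual_policy_step(
    posterior,
    exploit_policy=lambda b: b.index(max(b)),
    explore_prob=uncertain,
    oracle=lambda b: 0,
    rng=random.Random(0),
)
```

In this sketch the exploration rule is hand-written; in the method the abstract describes, the balance between exploring and exploiting is itself learned jointly with the agent's behavior.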

artificial intelligence, explore and exploit, upstream oil & gas

Industry:
- Energy > Oil & Gas > Upstream (0.73)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (0.30)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.40)