Adapting the Exploration Rate for Value-of-Information-Based Reinforcement Learning

Dec-30-2022, arXiv.org Artificial Intelligence

In this paper, we consider the problem of adjusting the exploration rate when using value-of-information-based exploration. We do this by converting the value-of-information optimization into the problem of finding equilibria of a flow as the exploration rate changes. We then develop an efficient path-following scheme that converges to these equilibria and hence uncovers optimal action-selection policies. Under this scheme, the exploration rate is adapted automatically according to the agent's experiences. Global convergence is theoretically assured. We first evaluate our exploration-rate adaptation on the Nintendo GameBoy games Centipede and Millipede. We demonstrate aspects of the search process, such as the fact that it yields a hierarchy of state abstractions. We also show that our approach returns better policies in fewer episodes than conventional search strategies that rely on heuristic, annealing-based exploration-rate adjustments. We then illustrate that these trends hold for deep, value-of-information-based agents that learn to play ten simple games and over forty more complicated games for the Nintendo GameBoy system. Performance either near or well above the level of human play is observed.
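For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of a heuristic, annealing-based exploration-rate adjustment paired with soft-max (Boltzmann) action selection. It is not the paper's path-following scheme. The function names, the geometric decay schedule, and the default parameter values (`tau_start`, `tau_end`, `decay`) are illustrative assumptions that do not come from the paper.

```python
import math
import random


def softmax_policy(q_values, tau):
    """Boltzmann action probabilities for one state at exploration rate tau > 0.

    Larger tau gives a more uniform (exploratory) policy; smaller tau
    concentrates probability on the highest-valued action.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_tau(episode, tau_start=1.0, tau_end=0.05, decay=0.99):
    """Hypothetical geometric annealing schedule (an assumed heuristic).

    The exploration rate depends only on the episode index, not on what
    the agent has experienced, which is the limitation the abstract
    contrasts against.
    """
    return max(tau_end, tau_start * decay ** episode)


def select_action(q_values, tau, rng=random):
    """Sample an action index from the soft-max policy."""
    probs = softmax_policy(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

In a training loop, one would call `annealed_tau(episode)` once per episode and pass the result to `select_action`. The schedule is fixed in advance, so choosing `decay` well typically requires trial and error for each task.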

information, machine learning, reinforcement learning

Country:
- South America > Argentina
  - Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Florida
      - Alachua County > Gainesville (0.13)
      - Orange County > Orlando (0.04)
      - Bay County > Panama City (0.04)
    - California > San Diego County
      - San Diego (0.04)
    - Arizona > Maricopa County
      - Phoenix (0.04)
    - New York
      - New York County > New York City (0.14)
      - Richmond County > New York City (0.04)
      - Queens County > New York City (0.04)
      - Kings County > New York City (0.04)
      - Bronx County > New York City (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Illinois > Cook County
      - Evanston (0.04)
    - New Jersey
      - Middlesex County > New Brunswick (0.04)
      - Mercer County > Princeton (0.04)
    - Washington > King County
      - Bellevue (0.04)
    - Pennsylvania
      - Philadelphia County > Philadelphia (0.04)
      - Allegheny County > Pittsburgh (0.04)
    - Massachusetts
      - Suffolk County > Boston (0.04)
      - Middlesex County
        - Cambridge (0.04)
        - Belmont (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Italy > Sardinia (0.04)
  - Germany > Berlin (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Slovenia > Upper Carniola
    - Municipality of Bled > Bled (0.04)
  - Austria > Styria
    - Graz (0.04)
  - Hungary > Budapest
    - Budapest (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
  - Switzerland > Basel-City
    - Basel (0.04)
  - Finland > Uusimaa
    - Helsinki (0.04)
- Asia
  - Russia (0.14)
  - Middle East
    - Jordan (0.04)
    - Israel > Haifa District
      - Haifa (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Workflow (1.00)
- Research Report (1.00)

Industry:
- Leisure & Entertainment > Games > Computer Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Uncertainty (1.00)
    - Optimization (1.00)
    - Agents (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models (0.92)
    - Statistical Learning (0.92)
    - Neural Networks > Deep Learning (0.46)
