Reward Shaping via Diffusion Process in Reinforcement Learning

Jun-20-2023–arXiv.org Artificial Intelligence

In this article, I take inspiration from stochastic thermodynamics to derive a problem formulation for online learning in uncertain MDPs that is grounded in system dynamics. The formulation balances a diffusion process against drift dynamics as a way to express the exploration-exploitation trade-off. To this end, I make an explicit link between information entropy and the stochastic dynamics of a system coupled to an environment. I analyze three sources of entropy production: the decision-maker's uncertainty about the characteristics of the system-environment interaction; the stochastic nature of the system dynamics; and the interaction of the decision-maker's knowledge with the system dynamics. This analysis provides a framework that can be formulated either as a maximum entropy program, which yields efficient policies that balance exploration and exploitation, or as a modified cost optimization program that includes informational costs and benefits.
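To make the drift-versus-diffusion picture concrete, here is a minimal sketch that simulates a one-dimensional Ornstein-Uhlenbeck process with the Euler-Maruyama scheme. The choice of this particular process, the function name `simulate`, and the parameters `drift_gain` and `diffusion` are illustrative assumptions and are not taken from the paper. The drift term pulls the state toward a target at zero (loosely, exploitation), while the diffusion term keeps the state spread out (loosely, exploration).

```python
import math
import random


def simulate(drift_gain, diffusion, steps=2000, dt=0.01, x0=1.0, seed=0):
    """Euler-Maruyama for dx = -drift_gain * x dt + sqrt(2 * diffusion) dW.

    Hypothetical toy model: drift_gain and diffusion are illustrative
    parameters, not quantities defined in the paper.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        # Deterministic pull toward 0 plus a Gaussian kick scaled by sqrt(dt).
        x += -drift_gain * x * dt + math.sqrt(2.0 * diffusion * dt) * noise
        path.append(x)
    return path


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / len(xs)


# Pure drift: the state decays to the target and stops moving.
exploit = simulate(drift_gain=1.0, diffusion=0.0)
# Drift plus diffusion: the state keeps fluctuating around the target.
explore = simulate(drift_gain=1.0, diffusion=0.5)

print("final state, drift only:", exploit[-1])
print("late-time variance, drift only:", variance(exploit[1000:]))
print("late-time variance, with diffusion:", variance(explore[1000:]))
```

With `diffusion=0.0` the trajectory collapses onto the target, and with a positive `diffusion` it keeps visiting nearby states. In this toy model, tuning the ratio of the two terms is one simple way to think about the trade-off the abstract describes.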

artificial intelligence, information, machine learning

Country:
- North America > United States (0.93)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine (0.93)
- Energy > Oil & Gas
  - Upstream (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.46)
  - Machine Learning > Learning Graphical Models
    - Directed Networks > Bayesian Learning (0.46)
