Generative Intrinsic Optimization: Intrinsic Control with Model Learning
–arXiv.org Artificial Intelligence
Future sequence represents the outcome after executing the action into the environment (i.e. the trajectory onwards). When driven by the information-theoretic concept of mutual information, it seeks maximally informative consequences. Explicit outcomes may vary across state, return, or trajectory serving different purposes such as credit assignment or imitation learning. However, the inherent nature of incorporating intrinsic motivation with reward maximization is often neglected. In this work, we propose a policy iteration scheme that seamlessly incorporates the mutual information, ensuring convergence to the optimal policy. Concurrently, a variational approach is introduced, which jointly learns the necessary quantity for estimating the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. While we mainly focus on theoretical analysis, our approach opens the possibilities of leveraging intrinsic control with model learning to enhance sample efficiency and incorporate uncertainty of the environment into decision-making.
arXiv.org Artificial Intelligence
Nov-14-2023
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- United States
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California
- Los Angeles County > Long Beach (0.14)
- Santa Clara County > Mountain View (0.04)
- Alameda County > Berkeley (0.04)
- Louisiana > Orleans Parish
- Puerto Rico > San Juan
- San Juan (0.04)
- Canada
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Alberta > Census Division No. 15
- Improvement District No. 9 > Banff (0.04)
- United States
- Europe
- Asia > Vietnam
- Oceania > Australia
- Genre:
- Research Report (0.50)
- Technology: