Intrinsic Reward Functions

Neural Information Processing Systems 

In our approach, the intrinsic reward can be separated into two parts: one related to action-aware diversity, the other related to observation-aware diversity. We revisit the formulation of our information-theoretic objective (Eq. …).

A.1 Intrinsic Rewards for Action-Aware Diversity

First we analyze term ②, which is related to action-aware diversity.

$$
\text{②} = \mathbb{E}_{id,\tau}\left[\sum_{t=0}^{T-1} \log \frac{p(a_t \mid \tau_t, id)}{q(a_t \mid \tau_t)}\right] - D_{\mathrm{KL}}\big(p(a_t \mid \tau_t) \,\|\, q(a_t \mid \tau_t)\big) \le \mathbb{E}_{id,\tau}\left[\sum_{t=0}^{T-1} \log \frac{p(a_t \mid \tau_t, id)}{q(a_t \mid \tau_t)}\right].
$$
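The relation between the action-aware term and its variational counterpart can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a single fixed history, a small discrete set of agent identities and actions, and strictly positive probabilities. The function names and the toy numbers are hypothetical. It shows that replacing the marginal p(a_t | τ_t) with a variational distribution q(a_t | τ_t) changes the expected log-ratio by exactly the KL divergence between the two.

```python
import math


def marginal_policy(p_id, p_a_given_id):
    """p(a) = sum_id p(id) * p(a | id), for one fixed history tau_t."""
    n_actions = len(p_a_given_id[0])
    return [
        sum(w * row[a] for w, row in zip(p_id, p_a_given_id))
        for a in range(n_actions)
    ]


def expected_log_ratio(p_id, p_a_given_id, denom):
    """E[ log p(a | id) / denom(a) ] under the joint p(id) * p(a | id)."""
    return sum(
        w * pa * math.log(pa / d)
        for w, row in zip(p_id, p_a_given_id)
        for pa, d in zip(row, denom)
    )


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions with positive entries."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q))


# Toy numbers (assumptions for illustration only).
p_id = [0.5, 0.5]                       # two agent identities, equally likely
p_a_given_id = [[0.7, 0.2, 0.1],        # p(a | tau_t, id = 0)
                [0.1, 0.3, 0.6]]        # p(a | tau_t, id = 1)
q = [0.3, 0.3, 0.4]                     # variational q(a | tau_t)

p_a = marginal_policy(p_id, p_a_given_id)
term2 = expected_log_ratio(p_id, p_a_given_id, p_a)    # exact per-step term
surrogate = expected_log_ratio(p_id, p_a_given_id, q)  # q in the denominator
gap = kl_divergence(p_a, q)                            # D_KL(p || q)

print("exact term:", term2)
print("surrogate minus KL:", surrogate - gap)
```

The two printed values agree up to floating-point error, and the gap vanishes when q equals the true marginal, so how close the surrogate is to the exact term depends on how well q approximates p(a_t | τ_t).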
