Intrinsic Reward Functions
Neural Information Processing Systems
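In our approach, the intrinsic reward can be separated into two parts: one related to action-aware diversity and the other related to observation-aware diversity. We revisit the formulation of our information-theoretic objective (Eq. …).

A.1 Intrinsic Rewards for Action-Aware Diversity

First we analyze term 2, which is related to action-aware diversity:

$$
\text{term 2} = \mathbb{E}_{id,\tau}\left[\sum_{t=0}^{T-1} \log \frac{p(a_t \mid \tau_t, id)}{q(a_t \mid \tau_t)}\right] - D_{\mathrm{KL}}\big(p(a_t \mid \tau_t) \,\|\, q(a_t \mid \tau_t)\big) \le \mathbb{E}_{id,\tau}\left[\sum_{t=0}^{T-1} \log \frac{p(a_t \mid \tau_t, id)}{q(a_t \mid \tau_t)}\right].
$$

The inequality holds because the KL divergence is non-negative.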
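The decomposition of term 2 can be checked numerically on a toy example. The sketch below is illustrative only and is not the authors' implementation: it assumes term 2 is the expected log-ratio of p(a_t | τ_t, id) to the marginal p(a_t | τ_t), considers a single time step with a fixed trajectory prefix, and uses hypothetical probability tables (`p_a_given_id`, `q_a`) and a uniform distribution over three agent identities. Under those assumptions, replacing the marginal with an arbitrary approximation q changes the value by exactly the KL divergence between the marginal and q.

```python
import math

# Hypothetical toy setup (not from the paper): 3 agent identities with
# uniform p(id), 4 discrete actions, one fixed trajectory prefix tau_t.
p_id = [1.0 / 3.0] * 3
p_a_given_id = [
    [0.70, 0.10, 0.10, 0.10],  # p(a | tau_t, id=0)
    [0.10, 0.70, 0.10, 0.10],  # p(a | tau_t, id=1)
    [0.25, 0.25, 0.25, 0.25],  # p(a | tau_t, id=2)
]
n_ids = len(p_id)
n_actions = len(p_a_given_id[0])

# Marginal over identities: p(a | tau_t) = sum_id p(id) * p(a | tau_t, id).
p_a = [
    sum(p_id[i] * p_a_given_id[i][a] for i in range(n_ids))
    for a in range(n_actions)
]

# Arbitrary approximation q(a | tau_t); the values are assumptions.
q_a = [0.40, 0.30, 0.20, 0.10]


def expected_log_ratio(denominator):
    """E_{id, a ~ p(.|tau_t, id)} [ log p(a | tau_t, id) / denominator(a) ]."""
    return sum(
        p_id[i] * p_a_given_id[i][a]
        * math.log(p_a_given_id[i][a] / denominator[a])
        for i in range(n_ids)
        for a in range(n_actions)
    )


# Single-step version of term 2, with the exact marginal in the denominator.
term2 = expected_log_ratio(p_a)

# The same quantity with q in the denominator.
surrogate = expected_log_ratio(q_a)

# D_KL( p(a | tau_t) || q(a | tau_t) ).
kl = sum(p_a[a] * math.log(p_a[a] / q_a[a]) for a in range(n_actions))

print("term 2         :", term2)
print("surrogate - KL :", surrogate - kl)
print("surrogate      :", surrogate)
```

The first two printed values agree up to floating-point error, and the third is never smaller than them, with equality only when q matches the marginal exactly.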
Apr-25-2026, 01:35:00 GMT