A More Analysis

Neural Information Processing Systems 

This section describes how the objective for the encoder, model, and policy (Eq. The remaining difference between this objective and Eq. 5 is that the Q value term is scaled by This prior cannot be predicted from prior observations. Maximum entropy (MaxEnt) RL is a special case of our compression objective. In practice we perform gradient steps using the Adam [24] optimizer. An optimal agent must balance these information costs against the value of information gained from these observations.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found