A More Analysis

Neural Information Processing Systems 

A.1 Objective for the Encoder, Model, and Policy

This section describes how the objective for the encoder, model, and policy (Eq. Our aim is to maximize the sum of (information-augmented) rewards (Eq. The remaining difference between this objective and Eq. 5 is that the Q-value term is scaled by γ. However, this difference has no effect on the optimization problem because the parameter λ is tuned automatically. If we scale the second term, Q, by some value (say, γ), then tuning λ to satisfy the bitrate constraint simply yields a different value of λ (one that is γ times the original), and the resulting solution is unchanged.
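The scale-invariance argument can be checked numerically on a toy problem. The sketch below is not the paper's objective: it assumes a hypothetical one-dimensional stand-in in which the value term is `q_scale * sqrt(x)`, the bitrate is `x`, and λ enters as a penalty weight on the bitrate. The names `best_bitrate` and `tune_lambda`, the budget of 4 bits, and γ = 0.99 are all illustrative assumptions.

```python
import math


def best_bitrate(lam, q_scale):
    """Maximizer of q_scale * sqrt(x) - lam * x over x > 0 (closed form).

    Toy stand-in for 'optimize encoder, model, and policy for a fixed lambda'.
    """
    return (q_scale / (2.0 * lam)) ** 2


def tune_lambda(budget, q_scale, lo=1e-6, hi=1e6, iters=200):
    """Geometric bisection for the lambda whose maximizer uses exactly `budget` bits."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if best_bitrate(mid, q_scale) > budget:
            lo = mid  # too many bits used: the penalty must increase
        else:
            hi = mid  # within budget: the penalty can decrease
    return math.sqrt(lo * hi)


gamma = 0.99   # assumed discount factor, used here only as a scale on Q
budget = 4.0   # assumed bitrate constraint

lam_unscaled = tune_lambda(budget, q_scale=1.0)
lam_scaled = tune_lambda(budget, q_scale=gamma)

# Both problems select the same bitrate; only lambda changes, by a factor of gamma.
x_unscaled = best_bitrate(lam_unscaled, 1.0)
x_scaled = best_bitrate(lam_scaled, gamma)
print(lam_scaled / lam_unscaled, x_unscaled, x_scaled)
```

In this toy setting the tuned λ for the scaled problem is γ times the tuned λ for the unscaled problem, while the selected bitrate is identical, which mirrors the claim that scaling the Q term does not change the optimization problem once λ is tuned to meet the constraint.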