A Derivation of Optimization Objectives

For a policy $\pi(a|s)$ and a dynamics model $T(s'|s,a)$, the occupancy measure $\rho: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ can be defined as ρ
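As a concrete illustration, here is a minimal sketch that computes a discounted state-action occupancy measure for a small tabular MDP. It assumes the standard normalized definition $\rho(s,a) = (1-\gamma)\sum_t \gamma^t \Pr(s_t = s, a_t = a)$. The initial-state distribution `mu0`, the discount `gamma`, the array layouts `pi[s, a]` and `T[s, a, s']`, and the function name `occupancy_measure` are all assumptions made for this sketch. The paper's own definition of ρ may differ, for example in normalization.

```python
import numpy as np


def occupancy_measure(pi, T, mu0, gamma):
    """Hypothetical helper: normalized discounted state-action occupancy measure.

    pi:    (S, A) array, pi[s, a] = pi(a|s)
    T:     (S, A, S) array, T[s, a, s2] = T(s2|s, a)
    mu0:   (S,) initial-state distribution (assumption, not from the source)
    gamma: discount factor in [0, 1) (assumption, not from the source)
    Returns rho with rho[s, a] = (1 - gamma) * sum_t gamma^t P(s_t = s, a_t = a).
    """
    # State-to-state transition matrix under pi: P[s, s2] = sum_a pi(a|s) T(s2|s, a)
    P = np.einsum("sa,sat->st", pi, T)
    n = P.shape[0]
    # Discounted state occupancy d satisfies d = (1 - gamma) * mu0 + gamma * P^T d
    d = np.linalg.solve(np.eye(n) - gamma * P.T, (1.0 - gamma) * mu0)
    # Joint over (s, a): rho(s, a) = d(s) * pi(a|s)
    return d[:, None] * pi


# Toy 2-state, 1-action MDP in which the only action swaps the state.
pi = np.ones((2, 1))
T = np.zeros((2, 1, 2))
T[0, 0, 1] = 1.0
T[1, 0, 0] = 1.0
rho = occupancy_measure(pi, T, mu0=np.array([1.0, 0.0]), gamma=0.5)
print(rho.ravel())  # approximately [2/3, 1/3]
```

Solving the linear system $(I - \gamma P_\pi^\top)\,d = (1-\gamma)\,\mu_0$ gives the infinite discounted sum exactly, with no need to roll out trajectories. Because of the $(1-\gamma)$ factor, ρ sums to one and can be read as a distribution over state-action pairs.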