A Proof of the strong duality 4

Neural Information Processing Systems 

The third inequality follows from observing that, for a given λ, the best policy may be defined pointwise as the argument of the maximum of the quantity inside the expectation. Thus, only the middle equality () requires a proof. We obtain it by applying a general strong-duality theorem, which requires feasibility under slightly smaller cost constraints. We restate a result from the monograph by Luenberger [1969]; it relies on the dual functional φ, whose expression we recall below.
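The pointwise step can be checked on a small example. The sketch below uses a hypothetical finite instance with three contexts, three actions and two cost components. None of these numbers come from the paper. It also assumes a Lagrangian payoff of the form r(x, a) − ⟨λ, c(x, a) − B⟩, whose normalization may differ from the paper's definition.

For a fixed λ, a brute-force search over all deterministic policies gives the same value as choosing the maximizing action separately in each context. Randomized policies cannot do better, because for each context the objective is linear in π(·|x). The final loop illustrates weak duality: for λ ≥ 0, no feasible policy earns more expected reward than the pointwise value.

```python
import itertools

# Hypothetical toy instance (not from the paper): context -> probability.
CONTEXTS = {"x1": 0.5, "x2": 0.3, "x3": 0.2}
# a0 is a zero-cost "null" action, so the all-a0 policy is strictly feasible.
ACTIONS = ("a0", "a1", "a2")
REWARD = {
    ("x1", "a0"): 0.0, ("x1", "a1"): 0.9, ("x1", "a2"): 0.4,
    ("x2", "a0"): 0.0, ("x2", "a1"): 0.3, ("x2", "a2"): 0.7,
    ("x3", "a0"): 0.0, ("x3", "a1"): 0.6, ("x3", "a2"): 0.2,
}
COST = {  # two cost components per (context, action) pair
    ("x1", "a0"): (0.0, 0.0), ("x1", "a1"): (0.8, 0.1), ("x1", "a2"): (0.2, 0.3),
    ("x2", "a0"): (0.0, 0.0), ("x2", "a1"): (0.1, 0.5), ("x2", "a2"): (0.6, 0.2),
    ("x3", "a0"): (0.0, 0.0), ("x3", "a1"): (0.4, 0.4), ("x3", "a2"): (0.1, 0.1),
}
BUDGET = (0.3, 0.2)


def lagrangian_payoff(x, a, lam):
    """Assumed Lagrangian payoff r(x, a) - <lam, c(x, a) - B>."""
    return REWARD[(x, a)] - sum(
        lam_j * (c_j - b_j) for lam_j, c_j, b_j in zip(lam, COST[(x, a)], BUDGET)
    )


def pointwise_value(lam):
    """E_x[max_a payoff]: the best action is chosen separately in each context."""
    return sum(
        p * max(lagrangian_payoff(x, a, lam) for a in ACTIONS)
        for x, p in CONTEXTS.items()
    )


def deterministic_policies():
    """Enumerate every map from contexts to actions."""
    xs = list(CONTEXTS)
    for choice in itertools.product(ACTIONS, repeat=len(xs)):
        yield dict(zip(xs, choice))


def policy_value(policy, lam):
    """E_x[payoff(x, policy(x))] for a deterministic policy."""
    return sum(p * lagrangian_payoff(x, policy[x], lam) for x, p in CONTEXTS.items())


def brute_force_value(lam):
    """Best value over all deterministic policies (exponential in |contexts|)."""
    return max(policy_value(pi, lam) for pi in deterministic_policies())


def expected_reward(policy):
    """E_x[r(x, policy(x))]."""
    return sum(p * REWARD[(x, policy[x])] for x, p in CONTEXTS.items())


def is_feasible(policy):
    """Check E_x[c(x, policy(x))] <= B componentwise."""
    return all(
        sum(p * COST[(x, policy[x])][j] for x, p in CONTEXTS.items()) <= BUDGET[j]
        for j in range(len(BUDGET))
    )


if __name__ == "__main__":
    for lam in [(0.0, 0.0), (0.5, 1.0), (2.0, 0.1)]:
        # Pointwise maximization matches the brute-force search over policies.
        assert abs(pointwise_value(lam) - brute_force_value(lam)) < 1e-12
        # Weak duality: no feasible policy beats the dual value at lam >= 0.
        for pi in deterministic_policies():
            if is_feasible(pi):
                assert expected_reward(pi) <= pointwise_value(lam) + 1e-12
    print("pointwise maximization and weak duality checks passed")
```

The zero-cost action a0 keeps the all-a0 policy strictly feasible for any positive budget. This is the kind of slack the strong-duality theorem asks for. The sketch only checks the pointwise step and weak duality. It does not establish the strong-duality equality itself.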