A Proof of the strong duality
Neural Information Processing Systems
The third inequality follows from observing that, for a given λ, the best policy can be defined pointwise as the argument of the maximum inside the expectation. Thus, only the middle equality requires a proof. We obtain it by applying a general strong-duality theorem, which requires the problem to remain feasible under slightly smaller cost constraints. We restate a result from the monograph by Luenberger [1969]. It relies on the dual functional φ, whose expression we recall below.
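The pointwise-maximization step can be checked numerically on a toy problem. The sketch below is only an illustration under assumed conventions, not the paper's setting or its dual functional φ. It assumes finitely many contexts and actions, a scalar cost with a constraint of the form "expected cost ≤ budget", and a Lagrangian equal to the expected reward minus λ times the constraint violation. The names `p`, `r`, `c`, `budget`, and `lam` are hypothetical.

```python
import numpy as np

def lagrangian(policy, p, r, c, budget, lam):
    """Assumed form: E_x[ sum_a pi(a|x) (r(x,a) - lam * c(x,a)) ] + lam * budget."""
    per_context = np.sum(policy * (r - lam * c), axis=1)
    return float(p @ per_context + lam * budget)

def pointwise_best(p, r, c, budget, lam):
    """For a fixed lam, maximize the Lagrangian separately in each context."""
    scores = r - lam * c                      # shape (n_contexts, n_actions)
    best_actions = np.argmax(scores, axis=1)  # argmax taken context by context
    value = float(p @ scores.max(axis=1) + lam * budget)
    return best_actions, value

# Toy instance (all quantities are made up for illustration).
rng = np.random.default_rng(0)
n_contexts, n_actions = 5, 3
p = rng.dirichlet(np.ones(n_contexts))          # context distribution
r = rng.uniform(size=(n_contexts, n_actions))   # rewards r(x, a)
c = rng.uniform(size=(n_contexts, n_actions))   # costs c(x, a)
budget, lam = 0.4, 0.7

best_actions, best_value = pointwise_best(p, r, c, budget, lam)

# The deterministic policy playing the pointwise argmax, as one-hot rows.
greedy = np.eye(n_actions)[best_actions]
greedy_value = lagrangian(greedy, p, r, c, budget, lam)

# Randomized policies: each row is a distribution over actions.
random_policies = rng.dirichlet(np.ones(n_actions), size=(1000, n_contexts))
random_values = [lagrangian(pi, p, r, c, budget, lam) for pi in random_policies]

print("pointwise value:", best_value)
print("best random policy:", max(random_values))
```

In this toy setting, the greedy policy attains the pointwise value, and no sampled randomized policy exceeds it. This is the content of the pointwise argument for a fixed λ; it says nothing about the strong-duality equality itself.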
Feb-11-2025, 05:09:10 GMT