A.1 Side-by-side comparison of MDP and tMDP

Atemporal MDP process: (S, A, p_init, p_trans, r)

Probability of a trajectory τ: p_π(τ) = p_init(s_0)