Planning with General Objective Functions: Going Beyond Total Rewards

Neural Information Processing Systems 

Note that in this simple example, the state transition function T and the reward function r still satisfy the Markov property.