Checklist
Neural Information Processing Systems
In the main text we present the TD and ETD algorithms for policy evaluation under linear function approximation, to acknowledge the existing literature on emphatic algorithms [27]. Here we present the derivation for policy evaluation under general function approximation. Following standard notation [41], capital letters for states, actions, or rewards denote the random variable at time t (e.g., S_t is the state random variable at time t), and lowercase letters denote their instantiations (e.g., S_t = s denotes the random variable S_t taking the value s at time t).