Checklist
Neural Information Processing Systems
In the main text we present the TD and ETD algorithms for policy evaluation under linear function approximation, to acknowledge the existing literature on emphatic algorithms [27]. Here we present the derivation for policy evaluation under general function approximation. Following standard notation [41], capital letters for states, actions, or rewards denote the random variable at time t (i.e. S
Feb-18-2024, 04:15:31 GMT