A Local Temporal Difference Code for Distributional Reinforcement Learning — Pablo Tano

Neural Information Processing Systems 

In our framework, the input is often very noisy, since it corresponds to the converging points of different learning traces. In this section we describe two linear decoders that differ from the one in [35] and are more noise-resilient. (See the Temporal resolution paragraph below for details on the discretization of time, and [37] for a derivation of Eq. A.5.) Eq. A.3 does not impose any explicit constraint on the entries of the input vector. The regularization in Eqs. A.9 and A.10 is crucial for long temporal horizons, since it controls the overall magnitude of the recovered quantities. Applying Eq. A.3 over the same timesteps as defined by the MP provides a direct approximation to the (regularized) Z-transform up to a temporal horizon. We found this method to be very susceptible to input noise. The decoding method is schematized in Fig. A.2.

Figure A.2: The weights of the decoder are trained to minimize the quadratic error between the decoder output and its target.
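The general recipe above — a linear decoder whose weights are fit by minimizing a quadratic error, with regularization added for noise resilience — can be sketched as a ridge regression. This is an illustrative stand-in, not the paper's exact Eqs. A.9–A.10; the basis, noise level, and regularization strength below are assumed values chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (illustration only): noisy input vectors X should be
# mapped to temporal targets Y by a linear decoder W, trained by minimizing
# the quadratic error ||Y - X W||^2 plus an L2 penalty lam * ||W||^2.
T, N, K = 200, 50, 20                         # samples, input units, decoded timesteps
X_clean = rng.normal(size=(T, N))
W_true = rng.normal(size=(N, K))
Y = X_clean @ W_true                          # targets the decoder should reproduce
X = X_clean + 0.5 * rng.normal(size=(T, N))   # inputs corrupted by noise

def ridge_decoder(X, Y, lam):
    """Closed-form ridge solution W = (X^T X + lam * I)^{-1} X^T Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

W_reg = ridge_decoder(X, Y, lam=10.0)    # regularized: more noise-resilient
W_ols = ridge_decoder(X, Y, lam=1e-12)   # ~unregularized least squares

def quad_err(W):
    """Mean quadratic error of the decoder on the training inputs."""
    return np.mean((Y - X @ W) ** 2)
```

The penalty shrinks the decoder weights, which keeps the recovered quantities bounded at long temporal horizons; the unregularized solution attains a lower training error but fits the input noise, which is why regularization is described as crucial here.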
