Goto

Collaborating Authors

 simplicity




Towards

Neural Information Processing Systems

The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drivediscoveryofmore efficient solutions.





ALocalTemporalDifferenceCodeforDistributional ReinforcementLearning

Neural Information Processing Systems

However, since this decoder effectively approximates thenth derivative of the input vector, it is very sensitive to noise. In our framework, the input is often very noisy, since it corresponds to the converging points of different learning traces. In this section we describe two linear decoders that differ from that in [35] and are more noise-resilient. A.9 and A.10 is crucial for long temporal horizons, since regularization causes the overall magnitude of the recoveredτ-space to decrease asτ increases3. Normalization amends thedecreasing magnitude problem bymaking theτ-space to sum to 1 for everyτ.