A Local Temporal Difference Code for Distributional Reinforcement Learning
However, since this decoder effectively approximates the nth derivative of the input vector, it is very sensitive to noise. In our framework, the input is often very noisy, since it corresponds to the converging points of different learning traces. In this section we describe two linear decoders that differ from that in [35] and are more noise-resilient. The normalization in A.9 and A.10 is crucial for long temporal horizons, since regularization causes the overall magnitude of the recovered τ-space to decrease as τ increases. Normalization amends the decreasing-magnitude problem by making the τ-space sum to 1 for every τ.
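The normalization step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the decoded τ-space is stored as an array with one row per temporal horizon τ, and the function name and shapes are hypothetical.

```python
import numpy as np

def normalize_tau_space(decoded):
    """Rescale each recovered tau-slice to sum to 1.

    decoded: (n_tau, n_values) array; row i is the recovered
    distribution at horizon tau_i. Regularized decoding shrinks the
    magnitude of later rows, so each row is renormalized to unit mass.
    (Sketch under assumed data layout, not the authors' code.)
    """
    sums = decoded.sum(axis=1, keepdims=True)
    # Guard against all-zero rows to avoid division by zero;
    # such rows are left unchanged.
    sums[sums == 0] = 1.0
    return decoded / sums
```

After this rescaling, every τ-slice has total mass 1 regardless of how strongly regularization attenuated it, which is what makes long horizons comparable to short ones.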
Review for NeurIPS paper: A Local Temporal Difference Code for Distributional Reinforcement Learning
Clarity: This is my biggest issue with this paper: it is _very_ difficult to follow, and most of the figures are difficult to interpret. In more detail:
- Overall, there are too many references to the supplemental material (e.g. "see SM-C") for things that are necessary for understanding the main paper.
- What do the bar plots on top of the grid represent? What are the dark and grey lines on the right plot meant to represent?