Appendices

A Discussion of CQL Variants

Neural Information Processing Systems 

We derive several variants of CQL in Section 3.2. Here, we discuss these variants in more detail and describe their specific properties.

One variant is obtained by simplifying Equation 3. To start, we define the notion of a "robust expectation" for any function; using it to simplify Equation 3 yields an objective that penalizes the variance of Q-function predictions under the distribution P̂.

To recap, Theorem 3.4 shows that the CQL backup operator increases the difference between the expected Q-values at in-distribution and out-of-distribution (OOD) actions.

Function approximation may give rise to erroneous Q-values at OOD actions. The "generalization" or coupling effects of the function approximator may be heavily influenced by the … This problem persists even when a large number of samples (e.g. …) are available.
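To make the variance penalty under P̂ concrete, here is a minimal sketch of a variance-penalized expectation. It assumes the standard χ²-divergence bound from distributionally robust optimization, E_P̂[f] + sqrt(2δ·Var_P̂(f)/N), where P̂ is the empirical distribution over N samples and δ is a radius parameter; this exact form is an assumption here, not stated in this section. The function name `robust_expectation_upper_bound` is hypothetical.

```python
import math
from statistics import fmean, pvariance


def robust_expectation_upper_bound(values, delta):
    """Hypothetical helper: variance-penalized estimate of E_{x~P_hat}[f(x)].

    Assumption: uses the DRO-style upper bound
        E_{P_hat}[f] + sqrt(2 * delta * Var_{P_hat}(f) / N),
    where P_hat is the empirical distribution over `values` (e.g. Q-values
    of dataset actions) and N = len(values).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = fmean(values)
    # Population variance, i.e. the variance under the empirical distribution P_hat.
    var = pvariance(values, mu=mean)
    # The square-root term grows with the spread of the values.
    return mean + math.sqrt(2.0 * delta * var / n)


# Two sets of Q-values with the same mean but different spread:
print(robust_expectation_upper_bound([1.0, 3.0], delta=1.0))  # low variance
print(robust_expectation_upper_bound([0.0, 4.0], delta=1.0))  # high variance, larger bound
```

If this bound is minimized with respect to the Q-function, as in a CQL-style objective, the square-root term pushes down the variance of the Q-values under P̂, which is one way to read "penalizes the variance". The variance term also shrinks as N grows.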
