Appendices A Discussion of CQL Variants
–Neural Information Processing Systems
We derive several variants of CQL in Section 3.2. Here, we discuss these variants in more detail and describe their specific properties. ... Equation 3. To start, we define the notion of "robust expectation": for any function ... Q-function that penalizes the variance of Q-function predictions under the distribution P̂. To recap, Theorem 3.4 shows that the CQL backup operator increases the difference between expected Q-value at in-distribution ( ... Function approximation may give rise to erroneous Q-values at OOD actions. ... "generalization" or the coupling effects of the function approximator may be heavily influenced by the ... This problem persists even when a large number of samples (e.g. ...
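As a rough illustration of the gap between in-distribution and OOD Q-values mentioned above, the sketch below computes the regularizer commonly associated with the CQL(H) variant for a discrete action space: a log-sum-exp over all actions minus the Q-value of the action seen in the dataset. The excerpt here does not show the formula, so this form, the function name `cql_h_penalty`, and the array layout are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def cql_h_penalty(q_values, data_actions):
    """Hypothetical helper: mean of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values:     array of shape (batch, num_actions), Q(s, a) for every action
    data_actions: integer array of shape (batch,), actions from the dataset
    """
    q_values = np.asarray(q_values, dtype=float)
    data_actions = np.asarray(data_actions, dtype=int)

    # Numerically stable log-sum-exp over the action axis.
    row_max = q_values.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(q_values - row_max).sum(axis=1))

    # Q-values of the actions that actually appear in the dataset.
    q_data = q_values[np.arange(len(data_actions)), data_actions]

    # The term is non-negative, since logsumexp >= max_a Q >= Q(s, a_data).
    return float(np.mean(lse - q_data))
```

Minimizing this term pushes down Q-values across all actions while pushing up those of dataset actions, so it shrinks only when dataset actions dominate. In a full training loss it would be scaled by a trade-off weight and added to the usual Bellman error, both of which are omitted here.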
Dec-27-2025, 22:43:02 GMT