Appendix: Continuous Doubly Constrained Batch Reinforcement Learning
Neural Information Processing Systems
However, numbers for BCQ and SAC are from our runs for all tasks. These plots show that, in the vast majority of environments, CDC exhibits consistently better performance across different seeds/iterations.
Feb-8-2026, 21:59:52 GMT