Bigger, Regularized, Optimistic: scaling for compute and sample efficient continuous control

May-27-2025, 16:56:58 GMT–Neural Information Processing Systems

Sample efficiency in reinforcement learning (RL) has traditionally been driven by algorithmic improvements. In this work, we demonstrate that scaling can also yield substantial gains. We conduct a thorough investigation into the interplay between scaling model capacity and domain-specific RL enhancements. These empirical findings inform the design choices behind our proposed algorithm, BRO (Bigger, Regularized, Optimistic). The key innovation behind BRO is that strong regularization enables effective scaling of the critic networks, which, combined with optimistic exploration, leads to superior performance.
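The abstract names two ingredients, regularization that lets the critic grow and optimistic exploration, without giving their exact form. The sketch below illustrates one common instance of each, under stated assumptions: a residual block with layer normalization as the critic regularizer, and mean plus or minus beta times the standard deviation over a two-critic ensemble as pessimistic and optimistic value estimates. The names `residual_block`, `q_bounds`, and `beta` are hypothetical and chosen for illustration; this is not the BRO implementation.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dense(x, weights, bias):
    """Affine map; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, w1, b1, w2, b2):
    """Hypothetical regularized block: x + LN(W2 @ relu(LN(W1 @ x + b1)) + b2).

    Layer normalization keeps activations well scaled as the network widens
    or deepens, and the skip connection preserves the input signal.
    """
    h = relu(layer_norm(dense(x, w1, b1)))
    h = layer_norm(dense(h, w2, b2))
    return [a + b for a, b in zip(x, h)]


def q_bounds(q1, q2, beta):
    """Hypothetical lower/upper value estimates from a two-critic ensemble.

    Returns (mean - beta * std, mean + beta * std). The lower bound suits
    pessimistic value targets; the upper bound suits optimistic exploration.
    """
    mean = (q1 + q2) / 2
    std = abs(q1 - q2) / 2  # population std of two values
    return mean - beta * std, mean + beta * std


lower, upper = q_bounds(1.0, 3.0, beta=1.0)
print(lower, upper)  # 1.0 3.0
```

With two critics and `beta=1.0`, the lower bound reduces exactly to `min(q1, q2)`, the familiar clipped double-Q target, while the upper bound rewards actions on which the critics disagree. Smaller `beta` values shrink both bounds toward the ensemble mean.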

optimistic, regularized, sample efficient continuous control
