An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
–Neural Information Processing Systems
These have opposite properties: DRL has good sample efficiency and poor stability, while ES is the reverse. Recently, there have been attempts to combine these algorithms, but these methods rely entirely on a synchronous update scheme, which makes them ill-suited to maximizing the benefits of parallelism in ES.