[R] Welcoming the Era of Deep Neuroevolution • r/MachineLearning

@machinelearnbot 

Adding further understanding, a companion study confirms empirically that ES (with a large enough perturbation size parameter) acts differently than SGD would, because it optimizes for the expected reward of a population of policies described by a probability distribution (a cloud in the search space), whereas SGD optimizes reward for a single policy (a point in the search space). In practice, SGD in RL is accompanied by injecting parameter noise, which turns points in the search space into clouds (in expectation).
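To make the point-versus-cloud distinction concrete, here is a minimal sketch on a toy 1-D problem. Everything in it is an assumption for illustration and not taken from the article or the companion study: the `reward` function (a narrow tall peak at 0 plus a broad lower peak at 3), the starting point 0.3, the sample count, and the antithetic Gaussian estimator used for the ES gradient. It compares the gradient of the reward at a single point with an estimate of the gradient of the expected reward under a Gaussian cloud of width `sigma`.

```python
import math
import random


def reward(x):
    # Hypothetical toy reward: a narrow, tall peak at x = 0
    # plus a broad, lower peak at x = 3.
    narrow = math.exp(-x ** 2 / (2 * 0.1 ** 2))
    broad = 0.8 * math.exp(-(x - 3.0) ** 2 / (2 * 1.5 ** 2))
    return narrow + broad


def point_gradient(x, h=1e-5):
    # Gradient of the reward at a single point (central difference).
    # This is the quantity a plain gradient step on one policy follows.
    return (reward(x + h) - reward(x - h)) / (2 * h)


def es_gradient(x, sigma, n_pairs=20000, seed=0):
    # Monte Carlo estimate of d/dx E[reward(x + sigma * eps)], eps ~ N(0, 1),
    # using antithetic pairs (+eps, -eps) to reduce variance.
    # This is the gradient of the expected reward of the whole cloud.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        eps = rng.gauss(0.0, 1.0)
        total += (reward(x + sigma * eps) - reward(x - sigma * eps)) * eps
    return total / (2 * sigma * n_pairs)


if __name__ == "__main__":
    x0 = 0.3
    print("point gradient:         ", point_gradient(x0))
    print("ES gradient, sigma=0.01:", es_gradient(x0, sigma=0.01))
    print("ES gradient, sigma=1.0: ", es_gradient(x0, sigma=1.0))
```

In this toy setup the point gradient at 0.3 is negative (it pulls toward the narrow peak), and ES with a tiny `sigma` agrees with it. With `sigma=1.0` the ES gradient is positive: averaged over the cloud, the narrow peak contributes little and the broad peak dominates, so the two methods move in opposite directions from the same starting point.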
