[R] Welcoming the Era of Deep Neuroevolution • r/MachineLearning
Adding to this understanding, a companion study confirms empirically that ES (with a large enough perturbation size parameter) behaves differently from SGD, because it optimizes the expected reward of a population of policies described by a probability distribution (a cloud in the search space), whereas SGD optimizes the reward of a single policy (a point in the search space). In practice, SGD in RL is accompanied by injecting parameter noise, which turns points in the search space into clouds (in expectation).
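To make the point-vs-cloud distinction concrete, here is a minimal sketch on a hypothetical 1-D toy reward (not taken from the study): a tall but very narrow peak at x = 0 and a lower, broad peak at x = 3. Plain gradient ascent on the reward itself stands in for SGD (an assumption: exact gradients, no minibatch noise), and optimizes a single point. ES instead climbs the smoothed objective J(x) = E[F(x + sigma * eps)], eps ~ N(0, 1), using the usual estimate grad J ≈ (1 / (n * sigma)) * sum_i F(x + sigma * eps_i) * eps_i, here with mirrored (antithetic) samples. The names `reward`, `point_ascent`, and `es_ascent` and all step sizes are illustrative choices.

```python
import math
import random


# Hypothetical toy reward: tall narrow peak at x=0, lower broad peak at x=3.
def reward(x: float) -> float:
    narrow = 1.0 * math.exp(-(x / 0.1) ** 2)
    broad = 0.6 * math.exp(-((x - 3.0) / 1.5) ** 2)
    return narrow + broad


def reward_grad(x: float) -> float:
    """Exact derivative of reward(x): the gradient at a single point."""
    narrow = 1.0 * math.exp(-(x / 0.1) ** 2) * (-2.0 * x / 0.1 ** 2)
    broad = 0.6 * math.exp(-((x - 3.0) / 1.5) ** 2) * (-2.0 * (x - 3.0) / 1.5 ** 2)
    return narrow + broad


def point_ascent(x: float, lr: float = 0.001, steps: int = 2000) -> float:
    """Stand-in for SGD: gradient ascent on reward(x), i.e. on one point."""
    for _ in range(steps):
        x += lr * reward_grad(x)
    return x


def es_ascent(x: float, sigma: float = 1.0, lr: float = 0.5,
              pairs: int = 50, steps: int = 300, seed: int = 0) -> float:
    """ES: ascend E[reward(x + sigma * eps)], eps ~ N(0, 1), i.e. a cloud.

    Mirrored samples (+eps, -eps) reduce the variance of the estimate.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        grad = 0.0
        for _ in range(pairs):
            eps = rng.gauss(0.0, 1.0)
            grad += (reward(x + sigma * eps) - reward(x - sigma * eps)) * eps
        grad /= 2.0 * sigma * pairs
        x += lr * grad
    return x


start = 0.05
print(f"point gradient ascent ends at x = {point_ascent(start):.3f}")
print(f"ES, sigma = 1.0, ends at x = {es_ascent(start):.3f}")
print(f"ES, sigma = 0.02, ends at x = {es_ascent(start, sigma=0.02, lr=0.001):.3f}")
```

From the same start, point ascent should settle on the narrow peak near x = 0, while ES with sigma = 1.0 should drift to the broad peak near x = 3, because averaging over the cloud washes out the narrow spike. With a small sigma (0.02 here) the cloud is nearly a point, and ES should behave like point ascent again, which matches the caveat about needing a large enough perturbation size.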
Dec-18-2017, 20:35:17 GMT