Generalized Early Stopping in Evolutionary Direct Policy Search

Etor Arza, Leni K. Le Goff, Emma Hart

arXiv.org Artificial Intelligence 

Evolutionary algorithms (EAs) are increasingly being used in applications such as computer games [De Souza, 2014, Hastings et al., 2009] and robotics [Hoffmann, 2001, Fleming and Purshouse, 2002] to learn control algorithms (policies), and are also applied to classic control tasks such as the benchmark suites available in OpenAI Gym [Brockman et al., 2016]. Direct policy search algorithms such as EAs often require a large number of evaluations: when each evaluation is costly in terms of time, learning times can become extremely long, and in the worst case prohibitive. Unfortunately, many applications of interest suffer from this problem. For example, the protein folding problem [Dill et al., 2008] requires costly simulations, and applications that involve a double optimization process are also considered very computationally costly. Examples include the joint optimization of robot morphology and control in simulation [Hart and Le Goff, 2022, Le Goff et al., 2021] (which typically uses an outer loop to evolve body-plans and a nested inner loop to evolve control), nested combinatorial optimization problems [Wu et al., 2021, Kobeaga et al., 2021], and hyperparameter optimization [De Souza et al., 2022]. In robotics specifically, evaluations that must be conducted directly on a physical robot to avoid any reality gap tend to be very time-consuming, and repeating lengthy evaluations also places considerable wear and tear on machinery, potentially leading to unreliable objective-function values.
