 jump function


Hyper-Heuristics Can Profit From Global Variation Operators

arXiv.org Artificial Intelligence

In recent work, Lissovoi, Oliveto, and Warwicker (Artificial Intelligence (2023)) proved that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local optimum of the multimodal CLIFF benchmark with remarkable efficiency. The $O(n^3)$ runtime of the MAHH, for almost all cliff widths $d\ge 2,$ is significantly better than the $\Theta(n^d)$ runtime of simple elitist evolutionary algorithms (EAs) on CLIFF. In this work, we first show that this advantage is specific to the CLIFF problem and does not extend to the JUMP benchmark, the most prominent multi-modal benchmark in the theory of randomized search heuristics. We prove that for any choice of the MAHH selection parameter $p$, the expected runtime of the MAHH on a JUMP function with gap size $m = O(n^{1/2})$ is at least $\Omega(n^{2m-1} / (2m-1)!)$. This is significantly slower than the $O(n^m)$ runtime of simple elitist EAs. Encouragingly, we also show that replacing the local one-bit mutation operator in the MAHH with the global bit-wise mutation operator, commonly used in EAs, yields a runtime of $\min\{1, O(\frac{e\ln(n)}{m})^m\} \, O(n^m)$ on JUMP functions. This is at least as good as the runtime of simple elitist EAs. For larger values of $m$, this result proves an asymptotic performance gain over simple EAs. As our proofs reveal, the MAHH profits from its ability to walk through the valley of lower objective values in moderate-size steps, always accepting inferior solutions. This is the first time that such an optimization behavior is proven via mathematical means. Generally, our result shows that combining two ways of coping with local optima, global mutation and accepting inferior solutions, can lead to considerable performance gains.
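As a rough illustration of the two variants compared in this abstract, here is a minimal Python sketch. It is not the authors' implementation. It assumes the standard JUMP definition (fitness $m + |x|_1$ outside the gap and at the optimum, $n - |x|_1$ inside the gap), and it assumes that the MAHH accepts any move with probability $p$ and otherwise accepts only strict improvements. All function names are hypothetical.

```python
import random

def jump(x, m):
    """Jump_m fitness of a bit list x (to be maximized)."""
    n, ones = len(x), sum(x)
    if ones <= n - m or ones == n:
        return m + ones
    return n - ones  # inside the gap: fitness drops as we approach the optimum

def one_bit_mutation(x, rng):
    """Local operator: flip exactly one uniformly chosen bit."""
    y = x[:]
    y[rng.randrange(len(y))] ^= 1
    return y

def bitwise_mutation(x, rng):
    """Global operator: flip each bit independently with probability 1/n."""
    rate = 1.0 / len(x)
    return [(b ^ 1) if rng.random() < rate else b for b in x]

def mahh(n, m, p, mutate, max_iters, seed=0):
    """Return the iteration at which the all-ones string is reached, or None."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_iters):
        if sum(x) == n:
            return t
        y = mutate(x, rng)
        # Assumption: with probability p accept any move,
        # otherwise accept only strict improvements.
        if rng.random() < p or jump(y, m) > jump(x, m):
            x = y
    return None
```

Passing `one_bit_mutation` or `bitwise_mutation` as `mutate` switches between the local and the global variant, so the two can be compared empirically on small instances.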


An efficient Wasserstein-distance approach for reconstructing jump-diffusion processes using parameterized neural networks

arXiv.org Machine Learning

We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.


Runtime Analysis for Permutation-based Evolutionary Algorithms

arXiv.org Artificial Intelligence

While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, different from bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another one $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also regard the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings, but also reveals that small implementation details like the rate of void mutations can make an important difference.


Lazy Parameter Tuning and Control: Choosing All Parameters Randomly From a Power-Law Distribution

arXiv.org Artificial Intelligence

Most evolutionary algorithms have multiple parameters and their values drastically affect the performance. Due to the often complicated interplay of the parameters, setting these values right for a particular problem (parameter tuning) is a challenging task. This task becomes even more complicated when the optimal parameter values change significantly during the run of the algorithm since then a dynamic parameter choice (parameter control) is necessary. In this work, we propose a lazy but effective solution, namely choosing all parameter values (where this makes sense) in each iteration randomly from a suitably scaled power-law distribution. To demonstrate the effectiveness of this approach, we perform runtime analyses of the $(1+(\lambda,\lambda))$ genetic algorithm with all three parameters chosen in this manner. We show that this algorithm on the one hand can imitate simple hill-climbers like the $(1+1)$ EA, giving the same asymptotic runtime on problems like OneMax, LeadingOnes, or Minimum Spanning Tree. On the other hand, this algorithm is also very efficient on jump functions, where the best static parameters are very different from those necessary to optimize simple problems. We prove a performance guarantee that is comparable to the best performance known for static parameters. For the most interesting case that the jump size $k$ is constant, we prove that our performance is asymptotically better than what can be obtained with any static parameter choice. We complement our theoretical results with a rigorous empirical study confirming what the asymptotic runtime results suggest.
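The core idea of drawing a parameter value afresh in each iteration from a power-law distribution can be sketched as follows. This is only an illustration under stated assumptions: the function name, the upper bound, and the exponent are hypothetical, and the abstract does not specify how each of the three parameters is scaled.

```python
import random

def power_law_sample(upper, beta, rng):
    """Sample i from {1, ..., upper} with Pr[i] proportional to i**(-beta)."""
    values = list(range(1, upper + 1))
    weights = [i ** (-beta) for i in values]
    return rng.choices(values, weights=weights, k=1)[0]

# Hypothetical usage: a fresh parameter value in every iteration.
rng = random.Random(42)
for _ in range(5):
    lam = power_law_sample(upper=100, beta=2.5, rng=rng)
    # ... run one iteration of the algorithm with parameter value lam ...
```

Small values are drawn most of the time, which lets the algorithm behave like a simple hill-climber, while large values still occur with polynomially small probability.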


Towards a Stronger Theory for Permutation-based Evolutionary Algorithms

arXiv.org Artificial Intelligence

While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, different from bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another one $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also regard the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$.


An Extended Jump Function Benchmark for the Analysis of Randomized Search Heuristics

arXiv.org Artificial Intelligence

Jump functions are the most studied non-unimodal benchmark in the theory of randomized search heuristics, in particular, evolutionary algorithms (EAs). They have significantly improved our understanding of how EAs escape from local optima. However, their particular structure (to leave the local optimum, one can only jump directly to the global optimum) raises the question of how representative such results are. For this reason, we propose an extended class $\textsc{Jump}_{k,\delta}$ of jump functions that contain a valley of low fitness of width $\delta$ starting at distance $k$ from the global optimum. We prove that several previous results extend to this more general class: for all $k = o(n^{1/3})$ and $\delta < k$, the optimal mutation rate for the $(1+1)$ EA is $\frac{\delta}{n}$, and the fast $(1+1)$ EA runs faster than the classical $(1+1)$ EA by a factor super-exponential in $\delta$. However, we also observe that some known results do not generalize: the randomized local search algorithm with stagnation detection, which is faster than the fast $(1+1)$ EA by a factor polynomial in $k$ on $\textsc{Jump}_k$, is slower by a factor polynomial in $n$ on some $\textsc{Jump}_{k,\delta}$ instances. Computationally, the new class allows experiments with wider fitness valleys, especially when they lie further away from the global optimum.
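The abstract does not state the formula for the extended class, so the following Python sketch shows only one definition consistent with its description: the fitness equals the number of ones, except inside a valley strictly between $n-k$ and $n-k+\delta$ ones, where it is negated. The exact fitness values inside the valley and the function name are assumptions.

```python
def jump_k_delta(x, k, delta):
    """One possible Jump_{k,delta}: valley of low fitness strictly between
    n-k and n-k+delta ones (assumed definition, to be maximized)."""
    n, ones = len(x), sum(x)
    in_valley = n - k < ones < n - k + delta
    return -ones if in_valley else ones
```

With $\delta = k$ the valley reaches up to the global optimum, which recovers the structure of the classical jump function: the only way out of the local optimum at $n-k$ ones is a direct jump to the all-ones string.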


Does Comma Selection Help To Cope With Local Optima?

arXiv.org Artificial Intelligence

One hope of using non-elitism in evolutionary computation is that it aids leaving local optima. We perform a rigorous runtime analysis of a basic non-elitist evolutionary algorithm (EA), the $(\mu,\lambda)$ EA, on the most basic benchmark function with a local optimum, the jump function. We prove that for all reasonable values of the parameters and the problem, the expected runtime of the $(\mu,\lambda)$ EA is, apart from lower order terms, at least as large as the expected runtime of its elitist counterpart, the $(\mu+\lambda)$ EA (for which we conduct the first runtime analysis to allow this comparison). Consequently, the ability of the $(\mu,\lambda)$ EA to leave local optima to inferior solutions does not lead to a runtime advantage. We complement this lower bound with an upper bound that, for broad ranges of the parameters, is identical to our lower bound apart from lower order terms. This is the first runtime result for a non-elitist algorithm on a multi-modal problem that is tight apart from lower order terms.


An Exponential Lower Bound for the Runtime of the cGA on Jump Functions

arXiv.org Artificial Intelligence

In the first runtime analysis of an estimation-of-distribution algorithm (EDA) on the multi-modal jump function class, Hasenöhrl and Sutton (GECCO 2018) proved that the runtime of the compact genetic algorithm with suitable parameter choice on jump functions with high probability is at most polynomial (in the dimension) if the jump size is at most logarithmic (in the dimension), and is at most exponential in the jump size if the jump size is super-logarithmic. The exponential runtime guarantee was achieved with a hypothetical population size that is also exponential in the jump size. Consequently, this setting cannot lead to a better runtime. In this work, we show that any choice of the hypothetical population size leads to a runtime that, with high probability, is at least exponential in the jump size. This result might be the first non-trivial exponential lower bound for EDAs that holds for arbitrary parameter settings.


A Tight Runtime Analysis for the cGA on Jump Functions: EDAs Can Cross Fitness Valleys at No Extra Cost

arXiv.org Artificial Intelligence

We prove that the compact genetic algorithm (cGA) with hypothetical population size $\mu = \Omega(\sqrt n \log n) \cap \text{poly}(n)$ with high probability finds the optimum of any $n$-dimensional jump function with jump size $k < \frac 1 {20} \ln n$ in $O(\mu \sqrt n)$ iterations. Since it is known that the cGA with high probability needs at least $\Omega(\mu \sqrt n + n \log n)$ iterations to optimize the unimodal OneMax function, our result shows that the cGA in contrast to most classic evolutionary algorithms here is able to cross moderate-sized valleys of low fitness at no extra cost. Our runtime guarantee improves over the recent upper bound $O(\mu n^{1.5} \log n)$ valid for $\mu = \Omega(n^{3.5+\varepsilon})$ of Hasenöhrl and Sutton (GECCO 2018). For the best choice of the hypothetical population size, this result gives a runtime guarantee of $O(n^{5+\varepsilon})$, whereas ours gives $O(n \log n)$. We also provide a simple general method based on parallel runs that, under mild conditions, (i) overcomes the need to specify a suitable population size, but gives a performance close to the one stemming from the best-possible population size, and (ii) transforms EDAs with high-probability performance guarantees into EDAs with similar bounds on the expected runtime.
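For readers unfamiliar with the cGA, the following is a minimal Python sketch of its commonly described form: a frequency vector initialized to $1/2$, two samples per iteration, and an update of $\pm 1/\mu$ toward the better sample, with frequencies kept in $[1/n, 1-1/n]$. It is not taken from the paper. The function name, the stopping rule (stop once the all-ones string is sampled), and the use of the margins $1/n$ and $1-1/n$ are assumptions.

```python
import random

def cga(fitness, n, mu, max_iters, seed=0):
    """Return the iteration at which all-ones is first sampled, or None."""
    rng = random.Random(seed)
    freq = [0.5] * n
    lo, hi = 1.0 / n, 1.0 - 1.0 / n  # assumed frequency margins
    for t in range(1, max_iters + 1):
        x = [1 if rng.random() < f else 0 for f in freq]
        y = [1 if rng.random() < f else 0 for f in freq]
        if n in (sum(x), sum(y)):
            return t
        if fitness(x) < fitness(y):
            x, y = y, x  # x is now the better (or equally good) sample
        for i in range(n):
            if x[i] != y[i]:
                # Shift the frequency toward the winner's bit value.
                step = 1.0 / mu if x[i] == 1 else -1.0 / mu
                freq[i] = min(hi, max(lo, freq[i] + step))
    return None
```

A call such as `cga(sum, n=20, mu=30, max_iters=20000)` runs the sketch on OneMax. Any fitness function on bit lists whose optimum is the all-ones string, for example a jump function, can be passed instead.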