Goto

Collaborating Authors

 Evolutionary Systems


Boundary Exploration for Bayesian Optimization With Unknown Physical Constraints

arXiv.org Artificial Intelligence

Bayesian optimization has been successfully applied to optimize black-box functions where the number of evaluations is severely limited. However, in many real-world applications, it is hard or impossible to know in advance which designs are feasible due to some physical or system limitations. These issues lead to an even more challenging problem of optimizing an unknown function with unknown constraints. In this paper, we observe that in such scenarios optimal solution typically lies on the boundary between feasible and infeasible regions of the design space, making it considerably more difficult than that with interior optima. Inspired by this observation, we propose BE-CBO, a new Bayesian optimization method that efficiently explores the boundary between feasible and infeasible designs. To identify the boundary, we learn the constraints with an ensemble of neural networks that outperform the standard Gaussian Processes for capturing complex boundaries. Our method demonstrates superior performance against state-of-the-art methods through comprehensive experiments on synthetic and real-world benchmarks.


EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages

arXiv.org Artificial Intelligence

Formal mathematics is the discipline of translating mathematics into a programming language in which any statement can be unequivocally checked by a computer. Mathematicians and computer scientists have spent decades of painstaking formalization efforts developing languages such as Coq, HOL, and Lean. Machine learning research has converged on these formal math corpora and given rise to an assortment of methodologies to aid in interactive and automated theorem proving. However, these papers have primarily focused on one method, for one proof task, in one language. This paper introduces EvoGPT-f: a novel evolutionary framework for the first systematic quantitative analysis of the differential machine learnability of five formal math corpora (Lean 3, Lean 4, Coq, HOL 4, HOL Light) using four tokenization methods (character, word-level, Byte Pair Encoding and StarCoder tokenizer). This paper does not put to rest the question of the "best" or "easiest" language to learn. Rather, this framework and preliminary findings begin to illuminate the differential machine learnability of these languages, offering a foundation to forge more systematic quantitative and qualitative comparative research across communities.


Understanding fitness landscapes in morpho-evolution via local optima networks

arXiv.org Artificial Intelligence

Morpho-evolution (ME) refers to the simultaneous optimisation of a robot's design and controller to maximise performance given a task and environment. Many genetic encodings have been proposed which are capable of representing design and control. Previous research has provided empirical comparisons between encodings in terms of their performance with respect to an objective function and the diversity of designs that are evaluated, however there has been no attempt to explain the observed findings. We address this by applying Local Optima Network (LON) analysis to investigate the structure of the fitness landscapes induced by three different encodings when evolving a robot for a locomotion task, shedding new light on the ease by which different fitness landscapes can be traversed by a search process. This is the first time LON analysis has been applied in the field of ME despite its popularity in combinatorial optimisation domains; the findings will facilitate design of new algorithms or operators that are customised to ME landscapes in the future.


SAIS: A Novel Bio-Inspired Artificial Immune System Based on Symbiotic Paradigm

arXiv.org Artificial Intelligence

We propose a novel type of Artificial Immune System (AIS): Symbiotic Artificial Immune Systems (SAIS), drawing inspiration from symbiotic relationships in biology. SAIS parallels the three key stages (i.e., mutualism, commensalism and parasitism) of population updating from the Symbiotic Organisms Search (SOS) algorithm. This parallel approach effectively addresses the challenges of large population size and enhances population diversity in AIS, which traditional AIS and SOS struggle to resolve efficiently. We conducted a series of experiments, which demonstrated that our SAIS achieved comparable performance to the state-of-the-art approach SOS and outperformed other popular AIS approaches and evolutionary algorithms across 26 benchmark problems. Furthermore, we investigated the problem of parameter selection and found that SAIS performs better in handling larger population sizes while requiring fewer generations. Finally, we believe SAIS, as a novel bio-inspired and immune-inspired algorithm, paves the way for innovation in bio-inspired computing with the symbiotic paradigm.


Guided Sketch-Based Program Induction by Search Gradients

arXiv.org Artificial Intelligence

Many tasks can be easily solved using machine learning techniques. However, some tasks cannot readily be solved using statistical models, requiring a symbolic approach instead. Program induction is one of the ways that such tasks can be solved by means of capturing an interpretable and generalizable algorithm through training. However, contemporary approaches to program induction are not sophisticated enough to readily be applied to various types of tasks as they tend to be formulated as a single, all-encompassing model, usually parameterized by neural networks. In an attempt to make program induction a viable solution for many scenarios, we propose a framework for learning parameterized programs via search gradients using evolution strategies. This formulation departs from traditional program induction as it allows for the programmer to impart task-specific code to the program 'sketch', while also enjoying the benefits of accelerated learning through end-to-end gradient-based optimization.


Evaluation Metrics for Automated Typographic Poster Generation

arXiv.org Artificial Intelligence

Computational Design approaches facilitate the generation of typographic design, but evaluating these designs remains a challenging task. In this paper, we propose a set of heuristic metrics for typographic design evaluation, focusing on their legibility, which assesses the text visibility, aesthetics, which evaluates the visual quality of the design, and semantic features, which estimate how effectively the design conveys the content semantics. We experiment with a constrained evolutionary approach for generating typographic posters, incorporating the proposed evaluation metrics with varied setups, and treating the legibility metrics as constraints. We also integrate emotion recognition to identify text semantics automatically and analyse the performance of the approach and the visual characteristics outputs.


A Functional Analysis Approach to Symbolic Regression

arXiv.org Artificial Intelligence

Symbolic regression (SR) poses a significant challenge for randomized search heuristics due to its reliance on the synthesis of expressions for input-output mappings. Although traditional genetic programming (GP) algorithms have achieved success in various domains, they exhibit limited performance when tree-based representations are used for SR. To address these limitations, we introduce a novel SR approach called Fourier Tree Growing (FTG) that draws insights from functional analysis. This new perspective enables us to perform optimization directly in a different space, thus avoiding intricate symbolic expressions. Our proposed algorithm exhibits significant performance improvements over traditional GP methods on a range of classical one-dimensional benchmarking problems. To identify and explain limiting factors of GP and FTG, we perform experiments on a large-scale polynomials benchmark with high-order polynomials up to degree 100. To the best of the authors' knowledge, this work represents the pioneering application of functional analysis in addressing SR problems. The superior performance of the proposed algorithm and insights into the limitations of GP open the way for further advancing GP for SR and related areas of explainable machine learning.


How Much is Unseen Depends Chiefly on Information About the Seen

arXiv.org Artificial Intelligence

It might seem counter-intuitive at first: We find that, in expectation, the proportion of data points in an unknown population-that belong to classes that do not appear in the training data-is almost entirely determined by the number $f_k$ of classes that do appear in the training data the same number of times. While in theory we show that the difference of the induced estimator decays exponentially in the size of the sample, in practice the high variance prevents us from using it directly for an estimator of the sample coverage. However, our precise characterization of the dependency between $f_k$'s induces a large search space of different representations of the expected value, which can be deterministically instantiated as estimators. Hence, we turn to optimization and develop a genetic algorithm that, given only the sample, searches for an estimator with minimal mean-squared error (MSE). In our experiments, our genetic algorithm discovers estimators that have a substantially smaller MSE than the state-of-the-art Good-Turing estimator. This holds for over 96% of runs when there are at least as many samples as classes. Our estimators' MSE is roughly 80% of the Good-Turing estimator's.


Guided Evolution with Binary Discriminators for ML Program Search

arXiv.org Artificial Intelligence

How to automatically design better machine learning programs is an open problem within AutoML. While evolution has been a popular tool to search for better ML programs, using learning itself to guide the search has been less successful and less understood on harder problems but has the promise to dramatically increase the speed and final performance of the optimization process. We propose guiding evolution with a binary discriminator, trained online to distinguish which program is better given a pair of programs. The discriminator selects better programs without having to perform a costly evaluation and thus speed up the convergence of evolution. Our method can encode a wide variety of ML components including symbolic optimizers, neural architectures, RL loss functions, and symbolic regression equations with the same directed acyclic graph representation. By combining this representation with modern GNNs and an adaptive mutation strategy, we demonstrate our method can speed up evolution across a set of diverse problems including a 3.7x speedup on the symbolic search for ML optimizers and a 4x speedup for RL loss functions.


DALex: Lexicase-like Selection via Diverse Aggregation

arXiv.org Artificial Intelligence

Lexicase selection has been shown to provide advantages over other selection algorithms in several areas of evolutionary computation and machine learning. In its standard form, lexicase selection filters a population or other collection based on randomly ordered training cases that are considered one at a time. This iterated filtering process can be time-consuming, particularly in settings with large numbers of training cases. In this paper, we propose a new method that is nearly equivalent to lexicase selection in terms of the individuals that it selects, but which does so significantly more quickly. The new method, called DALex (for Diversely Aggregated Lexicase), selects the best individual with respect to a weighted sum of training case errors, where the weights are randomly sampled. This allows us to formulate the core computation required for selection as matrix multiplication instead of recursive loops of comparisons, which in turn allows us to take advantage of optimized and parallel algorithms designed for matrix multiplication for speedup. Furthermore, we show that we can interpolate between the behavior of lexicase selection and its "relaxed" variants, such as epsilon or batch lexicase selection, by adjusting a single hyperparameter, named "particularity pressure," which represents the importance granted to each individual training case. Results on program synthesis, deep learning, symbolic regression, and learning classifier systems demonstrate that DALex achieves significant speedups over lexicase selection and its relaxed variants while maintaining almost identical problem-solving performance. Under a fixed computational budget, these savings free up resources that can be directed towards increasing population size or the number of generations, enabling the potential for solving more difficult problems.