Evolutionary Systems
Exploring Multi-view Symbolic Regression methods in physical sciences
Russeil, Etienne, de França, Fabrício Olivetti, Malanchev, Konstantin, Moinard, Guillaume, Cherrey, Maxime
Describing the world behavior through mathematical functions help scientists to achieve a better understanding of the inner mechanisms of different phenomena. Traditionally, this is done by deriving new equations from first principles and careful observations. A modern alternative is to automate part of this process with symbolic regression (SR). The SR algorithms search for a function that adequately fits the observed data while trying to enforce sparsity, in the hopes of generating an interpretable equation. A particularly interesting extension to these algorithms is the Multi-view Symbolic Regression (MvSR). It searches for a parametric function capable of describing multiple datasets generated by the same phenomena, which helps to mitigate the common problems of overfitting and data scarcity. Recently, multiple implementations added support to MvSR with small differences between them. In this paper, we test and compare MvSR as supported in Operon, PySR, phy-SO, and eggp, in different real-world datasets. We show that they all often achieve good accuracy while proposing solutions with only few free parameters. However, we find that certain features enable a more frequent generation of better models. We conclude by providing guidelines for future MvSR developments.
Hadamard-Riemannian Optimization for Margin-Variance Ensemble
Ensemble learning has been widely recognized as a pivotal technique for boosting predictive performance by combining multiple base models. Nevertheless, conventional margin-based ensemble methods predominantly focus on maximizing the expected margin while neglecting the critical role of margin variance, which inherently restricts the generalization capability of the model and heightens its vulnerability to overfitting, particularly in noisy or imbalanced datasets. Additionally, the conventional approach of optimizing ensemble weights within the probability simplex often introduces computational inefficiency and scalability challenges, complicating its application to large-scale problems. To tackle these limitations, this paper introduces a novel ensemble learning framework that explicitly incorporates margin variance into the loss function. Our method jointly optimizes the negative expected margin and its variance, leading to enhanced robustness and improved generalization performance. Moreover, by reparameterizing the ensemble weights onto the unit sphere, we substantially simplify the optimization process and improve computational efficiency. Extensive experiments conducted on multiple benchmark datasets demonstrate that the proposed approach consistently outperforms traditional margin-based ensemble techniques, underscoring its effectiveness and practical utility.
The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science
Shin, Woong, Souza, Renan, Rosendo, Daniel, Suter, Frédéric, Wang, Feiyi, Balaprakash, Prasanna, da Silva, Rafael Ferreira
Modern scientific discovery increasingly requires coordinating distributed facilities and heterogeneous resources, forcing researchers to act as manual workflow coordinators rather than scientists. Advances in AI leading to AI agents show exciting new opportunities that can accelerate scientific discovery by providing intelligence as a component in the ecosystem. However, it is unclear how this new capability would materialize and integrate in the real world. To address this, we propose a conceptual framework where workflows evolve along two dimensions which are intelligence (from static to intelligent) and composition (from single to swarm) to chart an evolutionary path from current workflow management systems to fully autonomous, distributed scientific laboratories. With these trajectories in mind, we present an architectural blueprint that can help the community take the next steps towards harnessing the opportunities in autonomous science with the potential for 100x discovery acceleration and transformational scientific workflows.
Critical Challenges and Guidelines in Evaluating Synthetic Tabular Data: A Systematic Review
Nafis, Nazia, Esnaola, Inaki, Martinez-Perez, Alvaro, Villa-Uriol, Maria-Cruz, Osmani, Venet
Generating synthetic tabular data can be challenging, however evaluation of their quality is just as challenging, if not more. This systematic review sheds light on the critical importance of rigorous evaluation of synthetic health data to ensure reliability, relevance, and their appropriate use. Based on screening of 1766 papers and a detailed review of 101 papers we identified key challenges, including lack of consensus on evaluation methods, improper use of evaluation metrics, limited input from domain experts, inadequate reporting of dataset characteristics, and limited reproducibility of results. In response, we provide several guidelines on the generation and evaluation of synthetic data, to allow the community to unlock and fully harness the transformative potential of synthetic data and accelerate innovation.
A Novel Theoretical Approach on Micro-Nano Robotic Networks Based on Density Matrices and Swarm Quantum Mechanics
Mannone, Maria, Anand, Mahathi, Fazio, Peppino, Swikir, Abdalla
In a robotic swarm, parameters such as position and proximity to the target can be described in terms of probability amplitudes. This idea led to recent studies on a quantum approach to the definition of the swarm, including a block-matrix representation. Here, we propose an advancement of the idea, defining a swarm as a mixed quantum state, to be described with a density matrix, whose size does not change with the number of robots. We end the article with some directions for future research.
Parallel, Asymptotically Optimal Algorithms for Moving Target Traveling Salesman Problems
Bhat, Anoop, Gutow, Geordan, Vundurthy, Bhaskar, Ren, Zhongqiang, Rathinam, Sivakumar, Choset, Howie
Abstract--The Moving T arget Traveling Salesman Problem (MT -TSP) seeks an agent trajectory that intercepts several moving targets, within a particular time window for each target. Therefore, we introduce the Iterated Random Generalized (IRG) TSP framework. The key idea behind IRG is to alternate between randomly sampling a set of agent configuration-time points, corresponding to interceptions of targets, and finding a sequence of interception points by solving a generalized TSP (GTSP). This alternation enables asymptotic convergence to the optimum. We introduce two parallel algorithms within the IRG framework. The first algorithm, IRG-PGLNS, solves GTSPs using PGLNS, our parallelized extension of the state-of-the-art solver GLNS. The second algorithm, Parallel Communicating GTSPs (PCG), solves GTSPs corresponding to several sets of points simultaneously. We present numerical results for three variants of the MT -TSP: one where intercepting a target only requires coming within a particular distance, another where the agent is a variable-speed Dubins car, and a third where the agent is a redundant robot arm. We show that IRG-PGLNS and PCG both converge faster than a baseline based on prior work. The Traveling Salesman Problem (TSP) is a classic optimization problem with broad applications in various fields, including logistics, manufacturing, and robotics [1], [2]. Given a set of targets (often called "locations" or "cities") and the cost of travel between each target pair, the TSP seeks a minimum-cost order of targets for an agent to visit. However, several robotic applications require planning to visit moving targets: midair refueling [3], optimization of fishing routes [4], [5], resupplying ships at sea [6], surveillance [7]-[9], and intercepting dangerous projectiles [10]-[12]. In all subfigures, targets (stars) move along trajectories with time windows shown in bold colored lines.
AutoStub: Genetic Programming-Based Stub Creation for Symbolic Execution
Mächtle, Felix, Loose, Nils, Serr, Jan-Niclas, Sander, Jonas, Eisenbarth, Thomas
Symbolic execution is a powerful technique for software testing, but suffers from limitations when encountering external functions, such as native methods or third-party libraries. Existing solutions often require additional context, expensive SMT solvers, or manual intervention to approximate these functions through symbolic stubs. In this work, we propose a novel approach to automatically generate symbolic stubs for external functions during symbolic execution that leverages Genetic Programming. When the symbolic executor encounters an external function, AutoStub generates training data by executing the function on randomly generated inputs and collecting the outputs. Genetic Programming then derives expressions that approximate the behavior of the function, serving as symbolic stubs. These automatically generated stubs allow the symbolic executor to continue the analysis without manual intervention, enabling the exploration of program paths that were previously intractable. We demonstrate that AutoStub can automatically approximate external functions with over 90% accuracy for 55% of the functions evaluated, and can infer language-specific behaviors that reveal edge cases crucial for software testing.
EvolKV: Evolutionary KV Cache Compression for LLM Inference
Existing key-value (KV) cache compression methods typically rely on heuristics, such as uniform cache allocation across layers or static eviction policies, however, they ignore the critical interplays among layer-specific feature patterns and task performance, which can lead to degraded generalization. In this paper, we propose EvolKV, an adaptive framework for layer-wise, task-driven KV cache compression that jointly optimizes the memory efficiency and task performance. By reformulating cache allocation as a multi-objective optimization problem, EvolKV leverages evolutionary search to dynamically configure layer budgets while directly maximizing downstream performance. Extensive experiments on 11 tasks demonstrate that our approach outperforms all baseline methods across a wide range of KV cache budgets on long-context tasks and surpasses heuristic baselines by up to 7 percentage points on GSM8K. Notably, EvolKV achieves superior performance over the full KV cache setting on code completion while utilizing only 1.5% of the original budget, suggesting the untapped potential in learned compression strategies for KV cache budget allocation.
Two-Stage Swarm Intelligence Ensemble Deep Transfer Learning (SI-EDTL) for Vehicle Detection Using Unmanned Aerial Vehicles
Darehnaei, Zeinab Ghasemi, Shokouhifar, Mohammad, Yazdanjouei, Hossein, Fatemi, S. M. J. Rastegar
This paper introduces SI-EDTL, a two-stage swarm intelligence ensemble deep transfer learning model for detecting multiple vehicles in UAV images. It combines three pre-trained Faster R-CNN feature extractor models (InceptionV3, ResNet50, GoogLeNet) with five transfer classifiers (KNN, SVM, MLP, C4.5, Naïve Bayes), resulting in 15 different base learners. These are aggregated via weighted averaging to classify regions as Car, Van, Truck, Bus, or background. Hyperparameters are optimized with the whale optimization algorithm to balance accuracy, precision, and recall. Implemented in MATLAB R2020b with parallel processing, SI-EDTL outperforms existing methods on the AU-AIR UAV dataset.
A Kriging-HDMR-based surrogate model with sample pool-free active learning strategy for reliability analysis
Li, Wenxiong, Liao, Hanyu, Chen, Suiyin
In reliability engineering, conventional surrogate models encounter the "curse of dimensionality" as the number of random variables increases. While the active learning Kriging surrogate approaches with high-dimensional model representation (HDMR) enable effective approximation of high-dimensional functions and are widely applied to optimization problems, there are rare studies specifically focused on reliability analysis, which prioritizes prediction accuracy in critical regions over uniform accuracy across the entire domain. This study develops an active learning surrogate model method based on the Kriging-HDMR modeling for reliability analysis. The proposed approach facilitates the approximation of high-dimensional limit state functions through a composite representation constructed from multiple low-dimensional sub-surrogate models. The architecture of the surrogate modeling framework comprises three distinct stages: developing single-variable sub-surrogate models for all random variables, identifying the requirements for coupling-variable sub-surrogate models, and constructing the coupling-variable sub-surrogate models. Optimization mathematical models for selection of design of experiment samples are formulated based on each stage's characteristics, with objectives incorporating uncertainty variance, predicted mean, sample location and inter-sample distances. A candidate sample pool-free approach is adopted to achieve the selection of informative samples. Numerical experiments demonstrate that the proposed method achieves high computational efficiency while maintaining strong predictive accuracy in solving high-dimensional reliability problems.