Evolutionary Systems
Towards symbolic regression for interpretable clinical decision scores
Aldeia, Guilherme Seidyo Imai, Romano, Joseph D., de Franca, Fabricio Olivetti, Herman, Daniel S., La Cava, William G.
Medical decision-making makes frequent use of algorithms that combine risk equations with rules, providing clear and standardized treatment pathways. Symbolic regression (SR) traditionally limits its search space to continuous function forms and their parameters, making it difficult to model this decision-making. However, due to its ability to derive data-driven, interpretable models, SR holds promise for developing data-driven clinical risk scores. To that end we introduce Brush, an SR algorithm that combines decision-tree-like splitting algorithms with non-linear constant optimization, allowing for seamless integration of rule-based logic into symbolic regression and classification models. Brush achieves Pareto-optimal performance on SRBench, and was applied to recapitulate two widely used clinical scoring systems, achieving high accuracy and interpretable models. Compared to decision trees, random forests, and other SR methods, Brush achieves comparable or superior predictive performance while producing simpler models.
TRACE: A Generalizable Drift Detector for Streaming Data-Driven Optimization
Zhong, Yuan-Ting, Huang, Ting, Xiao, Xiaolin, Gong, Yue-Jiao
Many optimization tasks involve streaming data with unknown concept drifts, posing a significant challenge as Streaming Data-Driven Optimization (SDDO). Existing methods, while leveraging surrogate model approximation and historical knowledge transfer, are often under restrictive assumptions such as fixed drift intervals and fully environmental observability, limiting their adaptability to diverse dynamic environments. We propose TRACE, a TRAnsferable C}oncept-drift Estimator that effectively detects distributional changes in streaming data with varying time scales. TRACE leverages a principled tokenization strategy to extract statistical features from data streams and models drift patterns using attention-based sequence learning, enabling accurate detection on unseen datasets and highlighting the transferability of learned drift patterns. Further, we showcase TRACE's plug-and-play nature by integrating it into a streaming optimizer, facilitating adaptive optimization under unknown drifts. Comprehensive experimental results on diverse benchmarks demonstrate the superior generalization, robustness, and effectiveness of our approach in SDDO scenarios.
OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
Jin, Emily, Nica, Andrei Cristian, Galkin, Mikhail, Rector-Brooks, Jarrid, Lee, Kin Long Kelvin, Miret, Santiago, Arnold, Frances H., Bronstein, Michael, Bose, Avishek Joey, Tong, Alexander, Liu, Cheng-Hao
Accurately predicting experimentally-realizable 3D molecular crystal structures from their 2D chemical graphs is a long-standing open challenge in computational chemistry called crystal structure prediction (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce OXtal, a large-scale 100M parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale OXtal, we abandon explicit equivariant architectures imposing inductive bias arising from crystal symmetries in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, Stoichiometric Stochastic Shell Sampling ($S^4$), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization -- thus enabling more scalable architectural choices at all-atom resolution. By leveraging a large dataset of 600K experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), OXtal achieves orders-of-magnitude improvements over prior ab initio machine learning CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, OXtal recovers experimental structures with conformer $\text{RMSD}_1<0.5$ Å and attains over 80\% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents
Chintapalli, Dheeraj, Tanugula, Rikhil, Chandra, Sunkalp
We present Lark, a biologically inspired decision-making framework that couples LLM-driven reasoning with an evolutionary, stakeholder-aware Multi-Agent System (MAS). To address verbosity and stakeholder trade-offs, we integrate four mechanisms: (i) plasticity, which applies concise adjustments to candidate solutions; (ii) duplication and maturation, which copy high-performing candidates and specialize them into new modules; (iii) ranked-choice stakeholder aggregation using influence-weighted Borda scoring; and (iv) compute awareness via token-based penalties that reward brevity. The system iteratively proposes diverse strategies, applies plasticity tweaks, simulates stakeholder evaluations, aggregates preferences, selects top candidates, and performs duplication/maturation while factoring compute cost into final scores. In a controlled evaluation over 30 rounds comparing 14 systems, Lark Full achieves a mean rank of 2.55 (95% CI [2.17, 2.93]) and a mean composite score of 29.4/50 (95% CI [26.34, 32.46]), finishing Top-3 in 80% of rounds while remaining cost competitive with leading commercial models ($0.016 per task). Paired Wilcoxon tests confirm that all four mechanisms contribute significantly as ablating duplication/maturation yields the largest deficit (ΔScore = 3.5, Cohen's d_z = 2.53, p < 0.001), followed by plasticity (ΔScore = 3.4, d_z = 1.86), ranked-choice voting (ΔScore = 2.4, d_z = 1.20), and token penalties (ΔScore = 2.2, d_z = 1.63). Rather than a formal Markov Decision Process with constrained optimization, Lark is a practical, compute-aware neuroevolutionary loop that scales stakeholder-aligned strategy generation and makes trade-offs transparent through per-step metrics. Our work presents proof-of-concept findings and invites community feedback as we expand toward real-world validation studies.
From Frustration to Fun: An Adaptive Problem-Solving Puzzle Game Powered by Genetic Algorithm
McConnell, Matthew, Zhao, Richard
This paper explores adaptive problem solving with a game designed to support the development of problem-solving skills. Using an adaptive, AI-powered puzzle game, our adaptive problem-solving system dynamically generates pathfinding-based puzzles using a genetic algorithm, tailoring the difficulty of each puzzle to individual players in an online real-time approach. A player-modeling system records user interactions and informs the generation of puzzles to approximate a target difficulty level based on various metrics of the player. By combining procedural content generation with online adaptive difficulty adjustment, the system aims to maintain engagement, mitigate frustration, and maintain an optimal level of challenge. A pilot user study investigates the effectiveness of this approach, comparing different types of adaptive difficulty systems and interpreting players' responses. This work lays the foundation for further research into emotionally informed player models, advanced AI techniques for adaptivity, and broader applications beyond gaming in educational settings.
MOTIF: Multi-strategy Optimization via Turn-based Interactive Framework
Kiet, Nguyen Viet Tuan, Van Tung, Dao, Dao, Tran Cong, Binh, Huynh Thi Thanh
Designing effective algorithmic components remains a fundamental obstacle in tackling NP-hard combinatorial optimization problems (COPs), where solvers often rely on carefully hand-crafted strategies. Despite recent advances in using large language models (LLMs) to synthesize high-quality components, most approaches restrict the search to a single element - commonly a heuristic scoring function - thus missing broader opportunities for innovation. In this paper, we introduce a broader formulation of solver design as a multi-strategy optimization problem, which seeks to jointly improve a set of interdependent components under a unified objective. To address this, we propose Multi-strategy Optimization via Turn-based Interactive Framework (MOTIF) - a novel framework based on Monte Carlo Tree Search that facilitates turn-based optimization between two LLM agents. At each turn, an agent improves one component by leveraging the history of both its own and its opponent's prior updates, promoting both competitive pressure and emergent cooperation. This structured interaction broadens the search landscape and encourages the discovery of diverse, high-performing solutions. Experiments across multiple COP domains show that MOTIF consistently outperforms state-of-the-art methods, highlighting the promise of turn-based, multi-agent prompting for fully automated solver design.
A Multi-objective Optimization Approach for Feature Selection in Gentelligent Systems
Ghahramani, Mohammadhossein, Qiao, Yan, Wu, NaiQi, Zhou, Mengchu
Abstract--The integration of advanced technologies, such as Artificial Intelligence (AI), into manufacturing processes is attracting significant attention, paving the way for the development of intelligent systems that enhance efficiency and automation. This paper uses the term "Gentelligent system" to refer to systems that incorporate inherent component information (akin to genes in bioinformatics--where manufacturing operations are likened to chromosomes in this study) and automated mechanisms. By implementing reliable fault detection methods, manufacturers can achieve several benefits, including improved product quality, increased yield, and reduced production costs. T o support these objectives, we propose a hybrid framework with a dominance-based multi-objective evolutionary algorithm. This mechanism enables simultaneous optimization of feature selection and classification performance by exploring Pareto-optimal solutions in a single run. This solution helps monitor various manufacturing operations, addressing a range of conflicting objectives that need to be minimized together . Manufacturers can leverage such predictive methods and better adapt to emerging trends. T o strengthen the validation of our model, we incorporate two real-world datasets from different industrial domains. The results on both datasets demonstrate the generalizability and effectiveness of our approach. ORE recently, manufacturing has embraced the Industrial Internet of Things (IIoT), where digital sensors, network technologies, and gentelligent components are integrated into manufacturing processes. A gentelligent component, as defined in the Collaborative Research Centre 653 project [1], refers to components that intrinsically store information. The focus of that work is on encoding and preserving data within physical parts throughout the product lifecycle. Inspired by this concept, we extend the notion into what we define as a "gentelligent system."
Evolutionary Architecture Search through Grammar-Based Sequence Alignment
Martín, Adri Gómez, Möller, Felix, McDonagh, Steven, Abella, Monica, Desco, Manuel, Crowley, Elliot J., Klein, Aaron, Ericsson, Linus
Neural architecture search (NAS) in expressive search spaces is a computationally hard problem, but it also holds the potential to automatically discover completely novel and performant architectures. To achieve this we need effective search algorithms that can identify powerful components and reuse them in new candidate architectures. In this paper, we introduce two adapted variants of the Smith-Waterman algorithm for local sequence alignment and use them to compute the edit distance in a grammar-based evolutionary architecture search. These algorithms enable us to efficiently calculate a distance metric for neural architectures and to generate a set of hybrid offspring from two parent models. This facilitates the deployment of crossover-based search heuristics, allows us to perform a thorough analysis on the architectural loss landscape, and track population diversity during search. We highlight how our method vastly improves computational complexity over previous work and enables us to efficiently compute shortest paths between architectures. When instantiating the crossover in evolutionary searches, we achieve competitive results, outperforming competing methods. Future work can build upon this new tool, discovering novel components that can be used more broadly across neural architecture design, and broadening its applications beyond NAS.
Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap
Li, Jialong, Zhang, Mingyue, Li, Nianyu, Weyns, Danny, Jin, Zhi, Tei, Kenji
Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a feedback loop with four core functionalities: monitoring, analyzing, planning, and execution. Recently, generative artificial intelligence (GenAI), especially the area of large language models, has shown impressive performance in data comprehension and logical reasoning. These capabilities are highly aligned with the functionalities required in SASs, suggesting a strong potential to employ GenAI to enhance SASs. However, the specific benefits and challenges of employing GenAI in SASs remain unclear. Yet, providing a comprehensive understanding of these benefits and challenges is complex due to several reasons: limited publications in the SAS field, the technological and application diversity within SASs, and the rapid evolution of GenAI technologies. To that end, this paper aims to provide researchers and practitioners a comprehensive snapshot that outlines the potential benefits and challenges of employing GenAI's within SAS. Specifically, we gather, filter, and analyze literature from four distinct research fields and organize them into two main categories to potential benefits: (i) enhancements to the autonomy of SASs centered around the specific functions of the MAPE-K feedback loop, and (ii) improvements in the interaction between humans and SASs within human-on-the-loop settings. From our study, we outline a research roadmap that highlights the challenges of integrating GenAI into SASs. The roadmap starts with outlining key research challenges that need to be tackled to exploit the potential for applying GenAI in the field of SAS. The roadmap concludes with a practical reflection, elaborating on current shortcomings of GenAI and proposing possible mitigation strategies.
MultiGA: Leveraging Multi-Source Seeding in Genetic Algorithms
Ng, Isabelle Diana May-Xin, Weerasooriya, Tharindu Cyril, Zhu, Haitao, Wei, Wei
Large Language Models (LLMs) are widely used across research domains to tackle complex tasks, but their performance can vary significantly depending on the task at hand. Evolutionary algorithms, inspired by natural selection, can be used to refine solutions iteratively at inference-time. To the best of our knowledge, there has not been exploration on leveraging the collective capabilities of multi-source seeding for LLM-guided genetic algorithms. In this paper, we introduce a novel approach, MultiGA, which applies genetic algorithm principles to address complex natural language tasks and reasoning problems by sampling from a diverse population of LLMs to initialize the population. MultiGA generates a range of outputs from various parent LLMs, open source and closed source, and uses a neutral fitness function to evaluate them. Through an iterative recombination process, we mix and refine these generations until an optimal solution is achieved. We benchmark our approach using text-to-SQL code generation tasks, trip planning, GPQA benchmark for grad-level science questions, and the BBQ bias benchmark. Our results show that MultiGA converges to the accuracy of the LLM best fit for the task, and these insights lay the foundation for future research looking closer at integrating multiple LLMs for unexplored tasks in which selecting only one pre-trained model is unclear or suboptimal.