Evolutionary Systems


Symbolic Regression via Control Variable Genetic Programming

arXiv.org Artificial Intelligence

Learning symbolic expressions directly from experiment data is a vital step in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches are limited to learning simple expressions. Regressing expressions involving many independent variables still remains out of reach. Motivated by the control variable experiments widely utilized in science, we propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables. CVGP expedites symbolic expression discovery via customized experiment design, rather than learning from a fixed dataset collected a priori. CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments where the other variables are held constant. It then extends the expressions learned in previous generations by adding new independent variables, using new control variable experiments in which these variables are allowed to vary. Theoretically, we show that CVGP, as an incremental building approach, can yield an exponential reduction in the search space when learning a class of expressions. Experimentally, CVGP outperforms several baselines in learning symbolic expressions involving multiple independent variables.
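To make the control variable idea concrete, here is a minimal toy sketch in Python. It is not CVGP itself: genetic programming is replaced by ordinary least squares over a fixed linear form, and the data-generating function `oracle` is a hypothetical process invented for illustration. The first stage holds `x2` constant and fits an expression in `x1` alone; the second stage lets `x2` vary and fits the coefficients learned in stage one as functions of `x2`.

```python
import random


def oracle(x1, x2):
    # Hypothetical ground-truth process, unknown to the learner.
    return 3.0 * x1 * x2 + 2.0 * x2


def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def controlled_fit(x2_fixed, rng, n=20):
    # Control variable experiment: x2 is held constant, only x1 varies.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [oracle(x1, x2_fixed) for x1 in xs]
    return fit_line(xs, ys)


rng = random.Random(0)
# Stage 1: learn y = a*x1 + b under several controlled settings of x2.
settings = [0.5, 1.0, 2.0, 3.0]
fits = [controlled_fit(c, rng) for c in settings]
# Stage 2: let x2 vary; the fitted coefficients become data for new fits in x2.
a_slope, a_icpt = fit_line(settings, [a for a, _ in fits])
b_slope, b_icpt = fit_line(settings, [b for _, b in fits])
print(f"y = ({a_slope:.2f}*x2 + {a_icpt:.2f})*x1 + ({b_slope:.2f}*x2 + {b_icpt:.2f})")
```

Because each controlled experiment exposes only one free variable, every fit is a one-dimensional problem; this mirrors, in a very reduced form, why incremental building can shrink the search space.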


Data-driven Science and Machine Learning Methods in Laser-Plasma Physics

arXiv.org Artificial Intelligence

Laser-plasma physics has developed rapidly over the past few decades as high-power lasers have become both increasingly powerful and more widely available. Early experimental and numerical research in this field was restricted to single-shot experiments with limited parameter exploration. However, recent technological improvements make it possible to gather an increasing amount of data, both in experiments and simulations. This has sparked interest in using advanced techniques from mathematics, statistics and computer science to deal with, and benefit from, big data. At the same time, sophisticated modeling techniques also provide new ways for researchers to deal effectively with situations in which only sparse data are available. This paper aims to present an overview of relevant machine learning methods with a focus on their applicability to laser-plasma physics, including its important sub-fields of laser-plasma acceleration and inertial confinement fusion.


Using evolutionary machine learning to characterize and optimize co-pyrolysis of biomass feedstocks and polymeric wastes

arXiv.org Artificial Intelligence

Co-pyrolysis of biomass feedstocks with polymeric wastes is a promising strategy for improving the quantity and quality parameters of the resulting liquid fuel. Numerous experimental measurements are typically conducted to find the optimal operating conditions. However, co-pyrolysis experiments are highly challenging because they require costly and lengthy procedures. Machine learning (ML) provides capabilities to cope with such issues by leveraging existing data. This work introduces an evolutionary ML approach to quantify the products and by-products of the biomass-polymer co-pyrolysis process. A comprehensive dataset covering various biomass-polymer mixtures under a broad range of process conditions is compiled from the qualified literature. The database is subjected to statistical analysis and mechanistic discussion. The input features are constructed using an innovative approach that reflects the physics of the process. The constructed features are then subjected to principal component analysis to reduce their dimensionality. The obtained scores are fed into six ML models. A Gaussian process regression model tuned by a particle swarm optimization algorithm shows better prediction performance (R2 > 0.9, MAE < 0.03, and RMSE < 0.06) than the other developed models. A multi-objective particle swarm optimization algorithm successfully finds the optimal values of the independent parameters.
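The abstract reports a Gaussian process regression model tuned by particle swarm optimization. The sketch below shows only a basic global-best PSO loop in Python; the toy objective `toy_validation_error` and the coefficient values (`w`, `c1`, `c2`) are assumptions chosen for illustration, not the authors' setup, and no Gaussian process is trained here.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for a validation-error surface over two
# hyperparameters (for example, a log length-scale and a log noise level).
def toy_validation_error(p):
    return (p[0] - 0.3) ** 2 + (p[1] + 1.2) ** 2


best, err = pso(toy_validation_error, bounds=[(-3, 3), (-3, 3)])
```

In a real tuning loop, the objective would train and validate the regression model for each particle position, which is far more expensive than this toy function.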


Differentially Private Synthetic Data via Foundation Model APIs 1: Images

arXiv.org Artificial Intelligence

Generating differentially private (DP) synthetic data that closely resembles the original private data without leaking sensitive user information is a scalable way to mitigate privacy concerns in the current data-driven world. In contrast to current practices that train customized models for this task, we aim to generate DP Synthetic Data via APIs (DPSDA), where we treat foundation models as black boxes and only utilize their inference APIs. Such API-based, training-free approaches are easier to deploy, as exemplified by the recent surge in the number of API-based apps. These approaches can also leverage the power of large foundation models that are accessible via their inference APIs while their model weights are unreleased. However, this comes with greater challenges due to strictly more restrictive model access and the additional need to protect privacy from the API provider. In this paper, we present a new framework called Private Evolution (PE) to solve this problem and show its initial promise on synthetic images. Surprisingly, PE can match or even outperform state-of-the-art (SOTA) methods without any model training. For example, on CIFAR10 (with ImageNet as the public data), we achieve FID <= 7.9 with privacy cost epsilon = 0.67, significantly improving on the previous SOTA, which had epsilon = 32. We further demonstrate the promise of applying PE on large foundation models such as Stable Diffusion to tackle challenging private datasets with a small number of high-resolution images.
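One iteration of Private Evolution can be pictured as: vote, add noise, resample, vary. The following Python sketch is a simplified reading of that loop on 2-D points, assuming a nearest-neighbor voting histogram protected by the Gaussian mechanism. The function `variation` and the uniform initialization are hypothetical stand-ins for the foundation model's APIs, and `sigma` is not calibrated to any particular privacy budget (no privacy accounting is done).

```python
import math
import random


def dp_nn_histogram(private, candidates, sigma, threshold, rng):
    """Noisy nearest-neighbor vote counts over the current synthetic candidates."""
    counts = [0.0] * len(candidates)
    for x in private:
        nearest = min(range(len(candidates)), key=lambda k: math.dist(x, candidates[k]))
        counts[nearest] += 1.0
    # Gaussian mechanism: each private sample changes exactly one count by 1,
    # so the histogram has L2 sensitivity 1.
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]
    # Zero out candidates whose noisy count does not exceed the threshold.
    return [c if c > threshold else 0.0 for c in noisy]


def variation(point, rng, scale=0.3):
    # Hypothetical stand-in for the foundation model's variation API:
    # here the point is simply perturbed with small Gaussian jitter.
    return tuple(v + rng.gauss(0.0, scale) for v in point)


def private_evolution(private, n_synthetic=50, iters=10, sigma=1.0, threshold=2.0, seed=0):
    rng = random.Random(seed)
    # Hypothetical stand-in for the random-generation API: uniform samples in a box.
    synthetic = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(n_synthetic)]
    for _ in range(iters):
        weights = dp_nn_histogram(private, synthetic, sigma, threshold, rng)
        if sum(weights) == 0:
            weights = [1.0] * len(synthetic)  # fall back to uniform resampling
        parents = rng.choices(synthetic, weights=weights, k=n_synthetic)
        synthetic = [variation(p, rng) for p in parents]
    return synthetic


# Toy "private" data: a cluster around (2, 2).
data_rng = random.Random(1)
private = [(data_rng.gauss(2, 0.3), data_rng.gauss(2, 0.3)) for _ in range(200)]
synthetic = private_evolution(private)
```

The private points are only ever touched through the noisy histogram, which is what makes the loop compatible with a black-box generator that never sees private data.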


Selection for short-term empowerment accelerates the evolution of homeostatic neural cellular automata

arXiv.org Artificial Intelligence

Empowerment (a domain-independent, information-theoretic metric) has previously been shown to assist in the evolutionary search for neural cellular automata (NCA) capable of homeostasis when employed as a fitness function. In our previous study, we successfully extended empowerment, defined as the maximum time-lagged mutual information between agents' actions and future sensations, to a distributed sensorimotor system embodied as an NCA. However, the time delay between actions and their corresponding sensations was chosen arbitrarily. Here, we expand upon previous work by exploring how the time scale at which empowerment operates affects its efficacy as an auxiliary objective for accelerating the discovery of homeostatic NCAs. We show that, relative to evolutionary selection for homeostasis alone, empowerment with shorter time delays yields marked improvements over empowerment with longer delays. Moreover, we evaluate the stability and adaptability of evolved NCAs, both hallmarks of living systems that are of interest to replicate in artificial ones. We find that short-term empowered NCAs are more stable and generalize better to unseen homeostatic challenges. Taken together, these findings motivate the use of empowerment during the evolution of other artifacts and suggest how it should be incorporated to accelerate the evolution of desired behaviors in them. Source code for the experiments in this paper can be found at: https://github.com/caitlingrasso/empowered-nca-II.
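The abstract defines empowerment via time-lagged mutual information between actions and future sensations. A minimal Python sketch of the lagged quantity for discrete sequences, using a plug-in (histogram) estimator, is given below. It does not perform the maximization over action distributions that the full definition involves, and the toy action and sensor sequences are hypothetical.

```python
import math
import random
from collections import Counter


def lagged_mutual_information(actions, sensations, lag):
    """Plug-in estimate (in bits) of I(A_t ; S_{t+lag}) from paired discrete sequences."""
    pairs = list(zip(actions[: len(actions) - lag], sensations[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    ps = Counter(s for _, s in pairs)
    mi = 0.0
    for (a, s), c in joint.items():
        p_as = c / n
        mi += p_as * math.log2(p_as / ((pa[a] / n) * (ps[s] / n)))
    return mi


rng = random.Random(0)
actions = [rng.randint(0, 1) for _ in range(1000)]
sensations = [0] + actions[:-1]  # toy sensor that echoes the action one step later
mi_short = lagged_mutual_information(actions, sensations, lag=1)
mi_long = lagged_mutual_information(actions, sensations, lag=5)
```

In this toy setup the sensor carries almost one bit about the action at lag 1 and essentially nothing at lag 5, which illustrates how the chosen time delay determines what the metric can detect.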


FITNESS: A Causal De-correlation Approach for Mitigating Bias in Machine Learning Software

arXiv.org Artificial Intelligence

Software built on top of machine learning algorithms is becoming increasingly prevalent in a variety of fields, including college admissions, healthcare, insurance, and justice. The effectiveness and efficiency of these systems heavily depend on the quality of the training datasets. Biased datasets can lead to unfair and potentially harmful outcomes, particularly in such critical decision-making systems where the allocation of resources may be affected. This can exacerbate discrimination against certain groups and cause significant social disruption. To mitigate such unfairness, a series of bias-mitigation methods have been proposed. Generally, these methods improve the fairness of the trained models to a certain degree, but at the expense of model performance. In this paper, we propose FITNESS, a bias mitigation approach that de-correlates the causal effects between sensitive features (e.g., sex) and the label. Our key idea is that by de-correlating such effects from a causality perspective, the model avoids making predictions based on sensitive features, and fairness can thus be improved. Furthermore, FITNESS leverages multi-objective optimization to achieve a better performance-fairness trade-off. To evaluate its effectiveness, we compare FITNESS with 7 state-of-the-art methods on 8 benchmark tasks using multiple metrics. Results show that FITNESS outperforms the state-of-the-art methods on bias mitigation while preserving the model's performance: it improved the model's fairness in all scenarios while decreasing the model's performance in only 26.67% of them. Additionally, FITNESS surpasses the Fairea baseline in 96.72% of cases, outperforming all methods we compared.


NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation

arXiv.org Artificial Intelligence

Developing digital sound synthesizers is crucial to the music industry, as it provides a low-cost way to produce high-quality sounds with rich timbres. Existing traditional synthesizers often require substantial expertise to determine the overall framework of a synthesizer and the parameters of its submodules. Since such expert knowledge is hard to acquire, it limits the flexibility to quickly design and tune digital synthesizers for diverse sounds. In this paper, we propose "NAS-FM", which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer. Tunable synthesizers with interpretable controls can be developed automatically from sounds, without any prior expert knowledge or manual operating costs. In detail, we train a supernet with a specifically designed search space, which includes predicting the envelopes of carriers and modulators with different frequency ratios. An evolutionary search algorithm with adaptive oscillator size is then developed to find the optimal relationship between oscillators and the frequency ratio of FM. Extensive experiments on recordings of different instrument sounds show that our algorithm can build a synthesizer fully automatically, achieving better results than handcrafted synthesizers. Audio samples are available at https://nas-fm.github.io/.
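The basic building block behind FM synthesis is a carrier oscillator whose phase is driven by a modulator running at some frequency ratio of the carrier. The Python sketch below renders a single carrier/modulator pair; the linear decay envelope, the sample rate, and the parameter values are illustrative assumptions (NAS-FM itself predicts the envelopes and searches over arrangements of multiple oscillators).

```python
import math


def fm_tone(f_carrier, ratio, index, duration=0.5, sample_rate=16000):
    """Two-oscillator FM: a modulator at ratio * f_carrier drives the carrier's phase."""
    f_mod = ratio * f_carrier
    n = int(duration * sample_rate)
    out = []
    for i in range(n):
        t = i / sample_rate
        # Simple linear decay envelope (an assumption; NAS-FM predicts envelopes).
        env = 1.0 - i / n
        # The modulation index scales how far the modulator pushes the carrier phase.
        modulator = index * math.sin(2 * math.pi * f_mod * t)
        out.append(env * math.sin(2 * math.pi * f_carrier * t + modulator))
    return out


# A 440 Hz carrier with a 2:1 modulator and modulation index 3.
signal = fm_tone(440.0, ratio=2.0, index=3.0)
```

Integer ratios such as 2:1 tend to give harmonic spectra, while non-integer ratios give inharmonic, bell-like timbres, which is why the frequency ratio is a natural search variable.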


Towards Robust and Accurate Myoelectric Controller Design based on Multi-objective Optimization using Evolutionary Computation

arXiv.org Artificial Intelligence

Myoelectric pattern recognition is an important aspect of control strategy design for various applications, including upper-limb prostheses and bio-robotic hand movement systems. This work proposes an approach to designing an energy-efficient EMG-based controller that uses a kernelized SVM classifier to decode surface electromyography (sEMG) signals and infer the underlying muscle movements. To optimize the performance of the EMG-based controller, our main classifier-design strategy is to reduce false movements of the overall system (when the EMG-based controller is in the 'Rest' position). To this end, we formulate the training algorithm of the proposed supervised learning system as a general constrained multi-objective optimization problem. An elitist multi-objective evolutionary algorithm, the non-dominated sorting genetic algorithm II (NSGA-II), is used to tune the hyperparameters of the SVM. We present experimental results on a dataset of sEMG signals collected from eleven subjects at five different upper-limb positions. Furthermore, the trained models are evaluated on two objective metrics, namely classification accuracy and false negatives, using two different test sets, to examine the generalization capability of the proposed training approach for limb-position-invariant EMG classification. The presented results show that the proposed approach gives the designer much more flexibility in selecting the classifier parameters to optimize the energy efficiency of the EMG-based controller.
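NSGA-II ranks candidate solutions by non-dominated sorting before selection. The Python sketch below shows a simple (quadratic, not the "fast") version of that ranking for two objectives: maximizing accuracy and minimizing a false-negative rate. The SVM settings and scores in `candidates` are invented for illustration only, and the constraint handling and crowding distance used by NSGA-II are omitted.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one (minimization)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def non_dominated_sort(points):
    """Split point indices into successive Pareto fronts (front 0 is the best)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Hypothetical SVM settings -> (accuracy, false-negative rate); not from the paper.
candidates = {
    "C=1,gamma=0.1": (0.91, 0.08),
    "C=10,gamma=0.1": (0.94, 0.12),
    "C=10,gamma=1": (0.89, 0.15),
    "C=100,gamma=0.01": (0.93, 0.05),
    "C=0.1,gamma=1": (0.85, 0.04),
}
names = list(candidates)
# Negate accuracy so that both objectives are minimized.
objs = [(-acc, fnr) for acc, fnr in candidates.values()]
fronts = non_dominated_sort(objs)
```

The first front holds the trade-off settings a designer would choose between; picking among them according to how costly false movements are is the kind of flexibility the abstract describes.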


Discovering Causal Relations and Equations from Data

arXiv.org Artificial Intelligence

Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, causal and equation discovery fields have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised with the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.


EvoTorch: Scalable Evolutionary Computation in Python

arXiv.org Artificial Intelligence

Evolutionary computation is an important component within various fields such as artificial intelligence research, reinforcement learning, robotics, industrial automation and/or optimization, engineering design, etc. Considering the increasing computational demands and the dimensionality of modern optimization problems, the need for scalable, reusable, and practical evolutionary algorithm implementations has been growing. To address this need, we present EvoTorch: an evolutionary computation library designed to work with high-dimensional optimization problems, with GPU support and high parallelization capabilities. EvoTorch is based on and seamlessly works with the PyTorch library, and therefore allows users to define their optimization problems using a well-known API.