 Evolutionary Systems


A projection-based framework for gradient-free and parallel learning

arXiv.org Artificial Intelligence

We present a feasibility-seeking approach to neural network training. This mathematical optimization framework is distinct from conventional gradient-based loss minimization and uses projection operators and iterative projection algorithms. We reformulate training as a large-scale feasibility problem: finding network parameters and states that satisfy local constraints derived from its elementary operations. Training then involves projecting onto these constraints, a local operation that can be parallelized across the network. We introduce PJAX, a JAX-based software framework that enables this paradigm. PJAX composes projection operators for elementary operations, automatically deriving the solution operators for the feasibility problems (akin to autodiff for derivatives). It inherently supports GPU/TPU acceleration, provides a familiar NumPy-like API, and is extensible. We train diverse architectures (MLPs, CNNs, RNNs) on standard benchmarks using PJAX, demonstrating its functionality and generality. Our results show that this approach is a compelling alternative to gradient-based training, with clear advantages in parallelism and the ability to handle non-differentiable operations.
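
The feasibility-seeking idea can be illustrated on a toy problem. The sketch below is not PJAX and does not use its API; it is plain Python that alternates projections onto two convex sets (a hyperplane and a box) until it reaches a point in their intersection. The function names and the example sets are assumptions chosen for illustration only.

```python
# Toy feasibility problem: find x with a.x = b and lo <= x_i <= hi.
# Illustrative only; the sets and names are hypothetical, not from PJAX.

def project_hyperplane(x, a, b):
    """Euclidean projection of x onto the hyperplane {z : a.z = b}."""
    dot_ax = sum(ai * xi for ai, xi in zip(a, x))
    norm_sq = sum(ai * ai for ai in a)
    step = (dot_ax - b) / norm_sq
    return [xi - step * ai for xi, ai in zip(x, a)]

def project_box(x, lo, hi):
    """Euclidean projection of x onto the box [lo, hi]^n (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]

def alternating_projections(x, a, b, lo, hi, iters=200):
    """Repeatedly project onto the hyperplane, then onto the box."""
    for _ in range(iters):
        x = project_box(project_hyperplane(x, a, b), lo, hi)
    return x

solution = alternating_projections([5.0, -3.0], a=[1.0, 1.0], b=1.0, lo=0.0, hi=1.0)
print(solution)
```

Each projection touches only its own constraint, which is the locality property the abstract relies on for parallelism; this two-set example runs the projections sequentially for simplicity.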


Stealix: Model Stealing via Prompt Evolution

arXiv.org Artificial Intelligence

Model stealing poses a significant security risk in machine learning by enabling attackers to replicate a black-box model without access to its training data, thus jeopardizing intellectual property and exposing sensitive information. Recent methods that use pre-trained diffusion models for data synthesis improve efficiency and performance but rely heavily on manually crafted prompts, limiting automation and scalability, especially for attackers with little expertise. To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names. In this context, we introduce Stealix, the first approach to perform model stealing without predefined prompts. Stealix uses two open-source pre-trained models to infer the victim model's data distribution, and iteratively refines prompts through a genetic algorithm, progressively improving the precision and diversity of synthetic images. Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget. These findings highlight the scalability of our approach and suggest that the risks posed by pre-trained generative models in model stealing may be greater than previously recognized.


Data Swarms: Optimizable Generation of Synthetic Evaluation Data

arXiv.org Artificial Intelligence

We propose Data Swarms, an algorithm to optimize the generation of synthetic evaluation data and advance quantitative desiderata of LLM evaluation. We first train a swarm of initial data generators using existing data, and define various evaluation objectives to reflect the desired properties of evaluation (e.g., generate more difficult problems for the evaluated models) and quantitatively evaluate data generators. We then employ particle swarm optimization to optimize the swarm of data generators, where they collaboratively search through the model parameter space to find new generators that advance these objectives. We further extend it to Adversarial Swarms, where the data generator swarm generates harder data while the test taker model swarm learns from such data, co-evolving dynamically for better data and models simultaneously. Extensive experiments demonstrate that Data Swarms outperforms eight data generation baselines across five evaluation objectives, while Adversarial Swarms produce more robust learning of synthetic data and stronger generalization. Further analysis reveals that Data Swarms successfully optimizes compositions of multiple evaluation objectives and generalizes to new off-the-shelf LLMs, unseen at optimization time.
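
For readers unfamiliar with particle swarm optimization, the following is a minimal, generic sketch in plain Python. It optimizes a toy numeric objective rather than data-generator parameters, and the hyperparameters (`w`, `c1`, `c2`), the search range, and the function names are assumptions for illustration, not the configuration used by Data Swarms.

```python
import random

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm minimization of `objective` over R^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm
    history = [gbest_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

def sphere(x):
    """Toy objective standing in for an evaluation objective."""
    return sum(v * v for v in x)

best_pos, best_val, history = pso(sphere, dim=2)
print(best_val)
```

In the setting described above, each particle would be a data generator's parameters and the objective would be one of the evaluation objectives; the toy `sphere` function only stands in for that.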


Decoupling Representation and Learning in Genetic Programming: the LaSER Approach

arXiv.org Artificial Intelligence

Genetic Programming (GP) has traditionally entangled the evolution of symbolic representations with their performance-based evaluation, often relying solely on raw fitness scores. This tight coupling makes GP solutions more fragile and prone to overfitting, reducing their ability to generalize. In this work, we propose LaSER (Latent Semantic Representation Regression), a general framework that decouples representation evolution from lifetime learning. At each generation, candidate programs produce features which are passed to an external learner to model the target task. This approach enables any function approximator, from linear models to neural networks, to serve as a lifetime learner, allowing expressive modeling beyond conventional symbolic forms. Here we show for the first time that LaSER can outcompete standard GP and GP followed by linear regression when it employs non-linear methods to fit coefficients to GP-generated equations against complex data sets. Further, we explore how LaSER enables the emergence of innate representations, supporting long-standing hypotheses in evolutionary learning such as the Baldwin Effect. By separating the roles of representation and adaptation, LaSER offers a principled and extensible framework for symbolic regression and classification.
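
The decoupling described above can be sketched in a few lines. In this illustrative Python example, a candidate program is just a Python function that emits one feature per input, and the external "lifetime learner" is a closed-form one-variable least-squares fit. All names (`fit_linear`, `laser_fitness`, the toy programs and data) are assumptions for illustration and are not taken from the LaSER implementation.

```python
def fit_linear(features, targets):
    """Closed-form least squares for targets ~ slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    var = sum((f - mean_f) ** 2 for f in features)
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = cov / var if var > 0 else 0.0
    return slope, mean_t - slope * mean_f

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def raw_fitness(program, xs, ys):
    """Conventional GP scoring: the program output is the prediction."""
    return mse([program(x) for x in xs], ys)

def laser_fitness(program, xs, ys):
    """Decoupled scoring: the program output is a feature for an external learner."""
    feats = [program(x) for x in xs]
    slope, intercept = fit_linear(feats, ys)
    return mse([slope * f + intercept for f in feats], ys)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 * x * x + 2.0 for x in xs]   # toy target: 3x^2 + 2

square = lambda x: x * x               # has the right structure, wrong coefficients
identity = lambda x: x                 # has the wrong structure

print(raw_fitness(square, xs, ys), laser_fitness(square, xs, ys))
print(laser_fitness(identity, xs, ys))
```

Under raw fitness, `square` is penalized for missing the scale and offset even though it captures the right structure; once the learner fits the coefficients, its error vanishes while `identity` remains poor. The abstract allows any function approximator as the learner; a linear fit is used here only to keep the sketch short.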


A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction

arXiv.org Artificial Intelligence

Link prediction in dynamic networks remains a fundamental challenge in network science, requiring the inference of potential interactions and their evolving strengths through spatiotemporal pattern analysis. Traditional static network methods have inherent limitations in capturing temporal dependencies and weight dynamics, while tensor-based methods offer a promising paradigm by encoding dynamic networks into high-order tensors to explicitly model multidimensional interactions across nodes and time. Among them, tensor wheel decomposition (TWD) stands out for its innovative topological structure, which decomposes high-order tensors into cyclic factors and core tensors to maintain structural integrity. To improve the prediction accuracy, this study introduces a PID-controlled tensor wheel decomposition (PTWD) model, which mainly adopts the following two ideas: 1) exploiting the representation power of TWD to capture the latent features of dynamic network topology and weight evolution, and 2) integrating the proportional-integral-derivative (PID) control principle into the optimization process to obtain a stable model parameter learning scheme. The performance on four real datasets verifies that the proposed PTWD model has more accurate link prediction capabilities compared to other models.
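
The abstract does not spell out how the PID principle enters the optimizer. One common reading, assumed here purely for illustration, treats the gradient as the error signal and builds the parameter update from its proportional, integral (running sum), and derivative (change since the last step) terms. The sketch applies this to a one-dimensional quadratic rather than a tensor decomposition; the gains `kp`, `ki`, `kd` and all names are hypothetical.

```python
def pid_descent(grad, x0, kp=0.1, ki=0.01, kd=0.01, steps=300):
    """Minimize a 1-D function using a PID-style update on its gradient.

    Assumed update rule (not taken from the PTWD paper):
        x <- x - (kp * g + ki * sum_of_past_g + kd * (g - previous_g))
    """
    x = x0
    integral = 0.0      # running sum of gradients (I term)
    prev_g = 0.0        # previous gradient (for the D term)
    for _ in range(steps):
        g = grad(x)
        integral += g
        x -= kp * g + ki * integral + kd * (g - prev_g)
        prev_g = g
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_star = pid_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)
```

With `ki = kd = 0` the rule reduces to plain gradient descent with step size `kp`; the extra terms use past gradient information to shape the trajectory.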


Non-linear Multi-objective Optimization with Probabilistic Branch and Bound

arXiv.org Artificial Intelligence

MOPBnB(so) evaluates a noisy function exactly once at any solution and uses neighboring solutions to estimate the objective functions, in contrast to a variant that uses multiple replications at a solution to estimate the objective functions. A finite-time performance analysis for deterministic multi-objective problems provides a bound on the probability that MOPBnB(so) captures the Pareto optimal set. Asymptotic convergence of MOPBnB(so) on stochastic problems is derived, in that the algorithm captures the Pareto optimal set and the estimations converge to the true objective function values. Numerical results reveal that the variant with multiple replications is extremely intensive in terms of computational resources compared to MOPBnB(so). In addition, numerical results show that MOPBnB(so) outperforms a genetic algorithm NSGA-II on test problems.

Keywords: global optimization; multiple objectives; branch and bound; stochastic optimization; estimation

1 Introduction

Multiple objectives generally exist for practical problems, and providing solutions to multi-objective problems is more challenging than for single objective problems (Miettinen, 2012).


You Only Train Once

arXiv.org Artificial Intelligence

The title of this paper is perhaps an overclaim. Of course, the process of creating and optimizing a learned model inevitably involves multiple training runs which potentially feature different architectural designs, input and output encodings, and losses. However, our method, You Only Train Once (YOTO), indeed contributes to limiting training to one shot for the latter aspect of loss selection and weighting. We achieve this by automatically optimizing loss weight hyperparameters of learned models in one shot via standard gradient-based optimization, treating these hyperparameters as regular parameters of the networks and learning them. To this end, we leverage the differentiability of the composite loss formulation, which is widely used for optimizing multiple empirical losses simultaneously, and model it as a novel layer parameterized with a softmax operation that satisfies the inherent positivity constraints on loss hyperparameters while avoiding degenerate empirical gradients. We complete our joint end-to-end optimization scheme by defining a novel regularization loss on the learned hyperparameters, which models a uniformity prior among the employed losses while ensuring boundedness of the identified optima. We demonstrate the efficacy of YOTO in jointly optimizing loss hyperparameters and regular model parameters in one shot by comparing it to the commonly used brute-force grid search across state-of-the-art networks solving two key problems in computer vision, i.e., 3D estimation and semantic segmentation, and showing that it consistently outperforms the best grid-search model on unseen test data. Code will be made publicly available.
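
The softmax parameterization of loss weights can be sketched as follows. This is an illustrative plain-Python example, not the YOTO layer itself: it shows only how unconstrained logits map to positive weights that sum to one and how those weights combine individual loss values. The regularization loss mentioned in the abstract is omitted because its form is not given here, and all names and numbers are hypothetical.

```python
import math

def softmax(logits):
    """Map unconstrained logits to positive weights that sum to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def composite_loss(logits, losses):
    """Weighted sum of individual losses with softmax-parameterized weights."""
    weights = softmax(logits)
    return sum(w * l for w, l in zip(weights, losses))

# Hypothetical loss values for three loss terms at some training step.
losses = [0.3, 0.6, 0.9]

uniform = composite_loss([0.0, 0.0, 0.0], losses)   # equal logits give equal weights
skewed = composite_loss([2.0, 0.0, 0.0], losses)    # first loss is weighted more
print(uniform, skewed)
```

Because the weights are a differentiable function of the logits, the logits can be updated by the same gradient-based optimizer as the network parameters, which is the property the abstract exploits.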


Dynamic real-time multi-UAV cooperative mission planning method under multiple constraints

arXiv.org Artificial Intelligence

As UAVs grow in popularity, so does the demand for mission planning. Classical approaches suffer from three problems: decoupled task assignment and path planning, poor real-time performance, and limited adaptability. To address these challenges, this paper proposes a dynamic real-time multi-UAV collaborative mission planning algorithm based on Dubins paths under a distributed formation structure. The Dubins path, which has multiple advantages, bridges the gap between task assignment and path planning, leading to a coupled solution for mission planning. Then, a series of acceleration techniques (task clustering preprocessing, highly efficient distance cost functions, and low-complexity task allocation strategies with fewer iterations) are employed to guarantee the real-time performance of the algorithm. To cope with different emergencies and their simultaneous occurrence in extreme cases, real-time planning of emerging tasks and mission replanning due to a reduction in available UAVs are handled appropriately. Finally, the developed algorithm is studied comprehensively through simulations, which show that the proposed method sacrifices only 9.57% of the path length while achieving a speed improvement of 4-5 orders of magnitude over the simulated annealing method, with a single mission planning run taking about 0.0003 s.


SiamNAS: Siamese Surrogate Model for Dominance Relation Prediction in Multi-objective Neural Architecture Search

arXiv.org Artificial Intelligence

Modern neural architecture search (NAS) is inherently multi-objective, balancing trade-offs such as accuracy, parameter count, and computational cost. This complexity makes NAS computationally expensive and nearly impossible to solve without efficient approximations. To address this, we propose a novel surrogate modelling approach that leverages an ensemble of Siamese network blocks to predict dominance relationships between candidate architectures. Lightweight and easy to train, the surrogate achieves 92% accuracy and replaces the crowding distance calculation in the survivor selection strategy with a heuristic rule based on model size. Integrated into a framework termed SiamNAS, this design eliminates costly evaluations during the search process. Experiments on NAS-Bench-201 demonstrate the framework's ability to identify Pareto-optimal solutions with significantly reduced computational costs. The proposed SiamNAS identified a final non-dominated set containing the best architecture in NAS-Bench-201 for CIFAR-10 and the second-best for ImageNet, in terms of test error rate, within 0.01 GPU days. This proof-of-concept study highlights the potential of the proposed Siamese network surrogate model to generalise to multi-tasking optimisation, enabling simultaneous optimisation across tasks. Additionally, it offers opportunities to extend the approach for generating Sets of Pareto Sets (SOS), providing diverse Pareto-optimal solutions for heterogeneous task settings.
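
The dominance relation that the surrogate is trained to predict has a simple exact definition, sketched below in plain Python for objectives that are all minimized. The candidate architectures and their objective values (test error, parameters in millions, FLOPs in millions) are made-up numbers for illustration and are not results from NAS-Bench-201.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(candidates):
    """Names of candidates that no other candidate dominates (the Pareto front)."""
    return sorted(
        name for name, obj in candidates.items()
        if not any(dominates(other, obj) for other in candidates.values())
    )

# Hypothetical (test error, params in M, FLOPs in M) for four architectures.
candidates = {
    "arch_a": (0.06, 1.2, 150.0),
    "arch_b": (0.08, 0.5, 60.0),
    "arch_c": (0.09, 1.5, 200.0),   # worse than arch_a on every objective
    "arch_d": (0.07, 0.4, 80.0),
}

front = non_dominated(candidates)
print(front)
```

Computing these labels exactly requires the true objective values of both architectures; the point of a surrogate is to predict the outcome of `dominates` for a pair without paying for those evaluations.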


Why Do More Experts Fail? A Theoretical Analysis of Model Merging

arXiv.org Artificial Intelligence

Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. Although recent model merging methods have shown promising results, they struggle to maintain performance gains as the number of merged models increases. In this paper, we investigate the key obstacles that limit the scalability of model merging when integrating a large number of expert models. First, we prove that there is an upper bound on model merging. Further theoretical analysis reveals that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged. Gaussian Width shows that the marginal benefit of merging additional models diminishes according to a strictly concave function. This implies that the effective parameter space becomes rapidly saturated as the number of merged models increases. Furthermore, using Approximate Kinematics Theory, we prove the existence of a unique optimal threshold beyond which adding more models does not yield significant performance improvements. At the same time, we introduce a straightforward Reparameterized Heavy-Tailed method (RHT) to extend the coverage of the merged model, thereby enhancing its performance. Empirical results on 12 benchmarks, including both knowledge-intensive and general-purpose tasks, validate our theoretical analysis. We believe that these results will spark further research beyond the current scope of model merging. The source code is in the GitHub repository: https://github.com/wzj1718/ModelMergingAnalysis.