Evolutionary Systems
Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning
Rupp, Florian, Eberhardinger, Manuel, Eckert, Kai
The balancing process for game levels in competitive two-player contexts involves a lot of manual work and testing, particularly for non-symmetrical game levels. In this work, we frame game balancing as a procedural content generation task and propose an architecture for automatically balancing tile-based levels within the PCGRL framework (procedural content generation via reinforcement learning). Our architecture is divided into three parts: (1) a level generator, (2) a balancing agent, and (3) a reward modeling simulation. Through repeated simulations, the balancing agent receives rewards for adjusting the level towards a given balancing objective, such as equal win rates for all players. To this end, we propose new swap-based representations to improve the robustness of playability, thereby enabling agents to balance game levels more effectively and more quickly than traditional PCGRL. By analyzing the agent's swapping behavior, we can infer which tile types have the most impact on the balance. We validate our approach in the Neural MMO (NMMO) environment in a competitive two-player scenario. In this extended conference paper, we present improved results, explore the applicability of the method to various forms of balancing beyond equal balancing, compare the performance to another search-based approach, and discuss the application of existing fairness metrics to game balancing.
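The swap action and the balancing reward can be illustrated with a minimal sketch. It assumes a level stored as a grid of integer tile ids, an action that exchanges two tiles (which, like any swap, leaves the number of tiles of each type unchanged), and a reward defined as the reduction in distance between player 1's simulated win rate and a target of 0.5. The names `swap_tiles`, `tile_counts`, `win_rate`, and `balance_reward` are hypothetical; the simulator producing match outcomes is not shown, and the paper's actual representation and reward may differ.

```python
from typing import Dict, List, Sequence, Tuple

Grid = List[List[int]]  # hypothetical: a level as rows of integer tile-type ids

def swap_tiles(level: Grid, a: Tuple[int, int], b: Tuple[int, int]) -> Grid:
    """Return a copy of the level with the tiles at positions a and b exchanged."""
    new_level = [row[:] for row in level]
    (r1, c1), (r2, c2) = a, b
    new_level[r1][c1], new_level[r2][c2] = new_level[r2][c2], new_level[r1][c1]
    return new_level

def tile_counts(level: Grid) -> Dict[int, int]:
    """Count how many tiles of each type the level contains."""
    counts: Dict[int, int] = {}
    for row in level:
        for tile in row:
            counts[tile] = counts.get(tile, 0) + 1
    return counts

def win_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of simulated matches won by player 1 (outcomes come from an external simulator)."""
    return sum(outcomes) / len(outcomes)

def balance_reward(rate_before: float, rate_after: float, target: float = 0.5) -> float:
    """Positive if the swap moved player 1's win rate closer to the target, negative otherwise."""
    return abs(rate_before - target) - abs(rate_after - target)

level = [[0, 1], [2, 0]]
swapped = swap_tiles(level, (0, 1), (1, 0))
print(swapped)                                     # [[0, 2], [1, 0]]
print(tile_counts(level) == tile_counts(swapped))  # True
before = win_rate([True, True, True, False])       # 0.75
after = win_rate([True, True, False, False])       # 0.5
print(balance_reward(before, after))               # 0.25
```

Because a swap only rearranges existing tiles, the agent cannot add or delete tiles, which keeps the edit space small and makes each step easy to interpret.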
Parental Guidance: Efficient Lifelong Learning through Evolutionary Distillation
Zhang, Octi, Peng, Quanquan, Scalise, Rosario, Boots, Bryon
Developing robotic agents that can generalize across diverse environments while continually evolving their behaviors is a core challenge in AI and robotics. The difficulties lie in solving increasingly complex tasks and ensuring agents can continue learning without converging on narrow, specialized solutions. Quality Diversity (QD) [1, 2] methods effectively foster diversity but often rely on trial and error, where the path to a final solution can be convoluted, leading to inefficiencies and uncertainty. Our approach draws inspiration from nature's inheritance process, where offspring not only receive but also build upon the knowledge of their predecessors. Similarly, our agents inherit distilled behaviors from previous generations, allowing them to adapt and continue learning efficiently, eventually surpassing their predecessors. This natural knowledge transfer reduces randomness, guiding exploration toward more meaningful learning without manual intervention such as reward shaping or task descriptors. What sets our method apart is that it offers a straightforward, evolution-inspired way to consolidate and progress, avoiding the need for manually defined styles or gradient editing [3, 4] to prevent forgetting. The agent's ability to retain and refine skills is driven by a blend of imitation learning (IL) and reinforcement learning (RL), naturally passing down essential behaviors while implicitly discarding inferior ones. We introduce Parental Guidance (PG-1), which makes the following contributions: 1. Distributed Evolution Framework: We propose a framework that distributes the evolution process across multiple compute instances, efficiently scheduling and analyzing evolution.
Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study
Ma, Zeyuan, Huang, Zhiyang, Chen, Jiacheng, Cao, Zhiguang, Gong, Yue-Jiao
Recent Meta-Black-Box Optimization (MetaBBO) approaches have shown the possibility of enhancing optimization performance by learning meta-level policies that dynamically configure low-level optimizers. However, existing MetaBBO approaches can consume massive numbers of function evaluations to train their meta-level policies. Inspired by the recent trend of using surrogate models for cost-friendly evaluation of expensive optimization problems, in this paper, we propose a novel MetaBBO framework, namely Surr-RLDE, which combines a surrogate learning process with a reinforcement learning-aided Differential Evolution (DE) algorithm to address the intensive function evaluation in MetaBBO. Surr-RLDE comprises two learning stages: surrogate learning and policy learning. In surrogate learning, we train a Kolmogorov-Arnold Network (KAN) with a novel relative-order-aware loss to accurately approximate the objective functions of the problem instances used for subsequent policy learning. In policy learning, we employ reinforcement learning (RL) to dynamically configure the mutation operator in DE. The learned surrogate model is integrated into the training of the RL-based policy to substitute for the original objective function, which effectively reduces the number of evaluations consumed during policy learning. Extensive benchmark results demonstrate that Surr-RLDE not only shows competitive performance against recent baselines but also shows compelling generalization to higher-dimensional problems. Further ablation studies underscore the effectiveness of each technical component in Surr-RLDE. We open-source Surr-RLDE at https://github.com/GMC-DRL/Surr-RLDE.
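The policy-learning stage can be pictured as a DE loop in which, at every generation, a policy picks the mutation operator and fitness is queried from the surrogate rather than from the true objective. The sketch below is an illustration under assumptions: the three classic operators (rand/1, best/1, current-to-best/1) stand in for the paper's operator pool, `surrogate` is a sphere function standing in for the trained KAN, and `random_policy` is a placeholder for the learned RL policy (no RL training is shown). All function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
OPERATORS = ("rand/1", "best/1", "current-to-best/1")  # classic DE operators (assumed pool)

def mutate(pop: List[Vector], i: int, best: Vector, op: str,
           F: float, rng: random.Random) -> Vector:
    """Build a donor vector for individual i using the selected mutation operator."""
    r1, r2, r3 = rng.sample([j for j in range(len(pop)) if j != i], 3)
    x, a, b, c = pop[i], pop[r1], pop[r2], pop[r3]
    if op == "rand/1":
        return [ad + F * (bd - cd) for ad, bd, cd in zip(a, b, c)]
    if op == "best/1":
        return [gd + F * (ad - bd) for gd, ad, bd in zip(best, a, b)]
    if op == "current-to-best/1":
        return [xd + F * (gd - xd) + F * (ad - bd)
                for xd, gd, ad, bd in zip(x, best, a, b)]
    raise ValueError(f"unknown operator: {op}")

def de_generation(pop: List[Vector], fitness: List[float],
                  objective: Callable[[Vector], float], op: str,
                  rng: random.Random, F: float = 0.5, CR: float = 0.9
                  ) -> Tuple[List[Vector], List[float]]:
    """One DE generation (minimisation): mutation, binomial crossover, greedy selection."""
    dim = len(pop[0])
    best = pop[min(range(len(pop)), key=fitness.__getitem__)]
    new_pop, new_fit = [], []
    for i, x in enumerate(pop):
        v = mutate(pop, i, best, op, F, rng)
        j_rand = rng.randrange(dim)  # at least one component comes from the donor
        u = [v[d] if d == j_rand or rng.random() < CR else x[d] for d in range(dim)]
        fu = objective(u)  # surrogate call instead of the expensive true objective
        if fu <= fitness[i]:
            new_pop.append(u)
            new_fit.append(fu)
        else:
            new_pop.append(x)
            new_fit.append(fitness[i])
    return new_pop, new_fit

def surrogate(x: Vector) -> float:
    """Hypothetical stand-in for the trained surrogate (a KAN in the paper): a sphere function."""
    return sum(v * v for v in x)

def random_policy(generation: int, rng: random.Random) -> str:
    """Placeholder for the learned RL policy: chooses an operator uniformly at random."""
    return rng.choice(OPERATORS)

def run(dim: int = 5, pop_size: int = 10, generations: int = 50, seed: int = 0) -> float:
    """Return the best surrogate value found; pop_size must be >= 4 to sample three partners."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [surrogate(x) for x in pop]
    for g in range(generations):
        op = random_policy(g, rng)
        pop, fitness = de_generation(pop, fitness, surrogate, op, rng)
    return min(fitness)

print(run())
```

Replacing `random_policy` with a trained policy that observes the optimisation state is where the RL component would plug in; the surrogate keeps every fitness query in that loop cheap.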
Lifelong Evolution of Swarms
Leuzzi, Lorenzo, Jones, Simon, Hauert, Sabine, Bacciu, Davide, Cossu, Andrea
Adapting to task changes without forgetting previous knowledge is a key skill for intelligent systems, and a crucial aspect of lifelong learning. Swarm controllers, however, are typically designed for specific tasks, lacking the ability to retain knowledge across changing tasks. Lifelong learning, on the other hand, focuses on individual agents with limited insights into the emergent abilities of a collective like a swarm. To address this gap, we introduce a lifelong evolutionary framework for swarms, where a population of swarm controllers is evolved in a dynamic environment that incrementally presents novel tasks. This requires evolution to find controllers that quickly adapt to new tasks while retaining knowledge of previous ones, as they may reappear in the future. We discover that the population inherently preserves information about previous tasks, and it can reuse it to foster adaptation and mitigate forgetting. In contrast, the top-performing individual for a given task catastrophically forgets previous tasks. To mitigate this phenomenon, we design a regularization process for the evolutionary algorithm, reducing forgetting in top-performing individuals. Evolving swarms in a lifelong fashion raises fundamental questions on the current state of deep lifelong learning and on the robustness of swarm controllers in dynamic environments.
Offline Model-Based Optimization: Comprehensive Review
Kim, Minsu, Gu, Jiayao, Yuan, Ye, Yun, Taeyoung, Liu, Zixuan, Bengio, Yoshua, Chen, Can
Offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking (reward hacking), exploiting model inaccuracies in unseen regions, or other spurious optimizations that yield misleadingly high performance estimates outside the training distribution. Recent advances in model-based optimization (MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite the growing impact of offline MBO in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field, including safe control of superintelligent systems.
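The risk of objective hacking can be made concrete with a toy example: a surrogate fitted on offline data is maximised over a search space that extends well beyond that data. This sketch does not reproduce any method from the review; it uses a ridge-regularised linear surrogate and, as one simple conservative strategy, subtracts a penalty proportional to the distance from the nearest offline design. The penalty weight `lam` and all function names are hypothetical.

```python
import numpy as np

def fit_linear_surrogate(X: np.ndarray, y: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Ridge-regularised linear model y ~ X @ w + b, fitted only on the offline dataset."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)  # returns [w..., b]

def predict(params: np.ndarray, X: np.ndarray) -> np.ndarray:
    return X @ params[:-1] + params[-1]

def conservative_score(params, X_data, candidates, lam):
    """Surrogate prediction minus lam times the distance to the nearest offline design."""
    diffs = candidates[:, None, :] - X_data[None, :, :]
    nearest = np.linalg.norm(diffs, axis=-1).min(axis=1)
    return predict(params, candidates) - lam * nearest

def propose(params, X_data, candidates, lam):
    """Pick the candidate design with the highest (penalised) surrogate score."""
    return candidates[np.argmax(conservative_score(params, X_data, candidates, lam))]

# Offline data covers only x in [0, 1]; queries outside that range are extrapolation.
X_data = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
y_data = X_data[:, 0]
params = fit_linear_surrogate(X_data, y_data)
candidates = np.linspace(-5.0, 5.0, 101).reshape(-1, 1)

print(propose(params, X_data, candidates, lam=0.0))  # edge of the search space: [5.]
print(propose(params, X_data, candidates, lam=2.0))  # stays near the data, close to x = 1
```

With `lam=0.0` the optimiser exploits the surrogate's extrapolation and proposes the boundary point; a positive penalty keeps the proposal inside the region covered by data, at the cost of possibly missing genuinely better designs outside it.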
An Approach to Analyze Niche Evolution in XCS Models
We present an approach to identify and track the evolution of niches in XCS that can be applied to any XCS model and any problem. It exploits the underlying principles of the evolutionary component of XCS and is therefore independent of the representation used. It also employs information already available in XCS and thus requires minimal modifications to an existing XCS implementation. We present experiments on binary single-step and multi-step problems involving non-overlapping and highly overlapping solutions. We show that our approach can identify and evaluate the number of niches in the population; we also show that it can be used to identify the composition of active niches so as to track their evolution over time, allowing for a more in-depth analysis of XCS behavior.
A Study on Human-Swarm Interaction: A Framework for Assessing Situation Awareness and Task Performance
Wattearachchi, Wasura D., Lakshika, Erandi, Kasmarik, Kathryn, Barlow, Michael
This paper introduces a framework for human-swarm interaction studies that measures situation awareness in dynamic environments. A tablet-based interface implementing the concepts introduced in the framework was developed for a user study in which operators guided a robotic swarm in a single-target search task, marking hazardous cells unknown to the swarm. Both subjective and objective situation awareness measures were used, and task performance was evaluated based on how close the robots were to the target. The framework enabled a structured investigation of the role of situation awareness in human-swarm interaction, leading to several key findings: task performance improved across attempts, showing that the interface was learnable; the centroid of active robot positions proved to be a useful task performance metric for assessing situation awareness; perception and projection played a key role in task performance, highlighting their importance in interface design; and both subjective and objective situation awareness influenced task performance, emphasizing the need for interfaces that support both. These findings validate our framework as a structured approach for integrating situation awareness concepts into human-swarm interaction studies, offering a systematic way to assess situation awareness and task performance. The framework can be applied to other swarming studies to evaluate interface learnability, identify meaningful task performance metrics, and refine interface designs to enhance situation awareness, ultimately improving human-swarm interaction in dynamic environments.
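The centroid-based task performance metric can be sketched as the Euclidean distance between the target and the centroid of the robots that are still active. The `Robot` record with an `active` flag, the use of Euclidean distance, and the function names are assumptions for illustration; the study may define activity and distance differently.

```python
from dataclasses import dataclass
from math import hypot
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Robot:
    x: float
    y: float
    active: bool = True  # hypothetical flag, e.g. False once a robot is disabled

def active_centroid(robots: Iterable[Robot]) -> Optional[Point]:
    """Mean position of the active robots, or None if no robot is active."""
    points = [(r.x, r.y) for r in robots if r.active]
    if not points:
        return None
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def centroid_distance_to_target(robots: Iterable[Robot], target: Point) -> float:
    """Lower is better: Euclidean distance from the active-robot centroid to the target."""
    centroid = active_centroid(robots)
    if centroid is None:
        return float("inf")
    return hypot(centroid[0] - target[0], centroid[1] - target[1])

swarm = [Robot(0.0, 0.0), Robot(2.0, 0.0), Robot(10.0, 10.0, active=False)]
print(active_centroid(swarm))                          # (1.0, 0.0)
print(centroid_distance_to_target(swarm, (1.0, 3.0)))  # 3.0
```

Excluding inactive robots keeps a stranded robot far from the target from dragging the metric away from where the working part of the swarm actually is.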
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard
Dong, Yifei, Wu, Fengyi, He, Qi, Li, Heng, Li, Minghan, Cheng, Zebang, Zhou, Yuxuan, Sun, Jingdong, Dai, Qi, Cheng, Zhi-Qi, Hauptmann, Alexander G
Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments. We introduce a unified Human-Aware VLN (HA-VLN) benchmark that merges these paradigms under explicit social-awareness constraints. Our contributions include: 1. A standardized task definition that balances discrete-continuous navigation with personal-space requirements; 2. An enhanced human motion dataset (HAPS 2.0) and upgraded simulators capturing realistic multi-human interactions, outdoor contexts, and refined motion-language alignment; 3. Extensive benchmarking on 16,844 human-centric instructions, revealing how multi-human dynamics and partial observability pose substantial challenges for leading VLN agents; 4. Real-world robot tests validating sim-to-real transfer in crowded indoor spaces; and 5. A public leaderboard supporting transparent comparisons across discrete and continuous tasks. Empirical results show improved navigation success and fewer collisions when social context is integrated, underscoring the need for human-centric design. By releasing all datasets, simulators, agent code, and evaluation tools, we aim to advance safer, more capable, and socially responsible VLN research.
Island-Based Evolutionary Computation with Diverse Surrogates and Adaptive Knowledge Transfer for High-Dimensional Data-Driven Optimization
Zhang, Xian-Rong, Gong, Yue-Jiao, Cao, Zhiguang, Zhang, Jun
In recent years, there has been growing interest in data-driven evolutionary algorithms (DDEAs) that employ surrogate models to approximate objective functions with limited data. However, current DDEAs are primarily designed for lower-dimensional problems, and their performance drops significantly when applied to large-scale optimization problems (LSOPs). To address this challenge, this paper proposes an offline DDEA named DSKT-DDEA. DSKT-DDEA leverages multiple islands that use different data to build diverse surrogate models, fostering diverse subpopulations and mitigating the risk of premature convergence. In the intra-island optimization phase, a semi-supervised learning method is devised to fine-tune the surrogates. It not only facilitates data augmentation but also incorporates the distribution information gathered during the search process to align the surrogates with the evolving local landscapes. Then, in the inter-island knowledge transfer phase, the algorithm incorporates an adaptive strategy that periodically transfers individual information and evaluates the transfer effectiveness in the new environment, facilitating global optimization efficacy. Experimental results demonstrate that our algorithm is competitive with state-of-the-art DDEAs on problems with up to 1000 dimensions, while also exhibiting decent parallelism and scalability. Our DSKT-DDEA is open-source and accessible at: https://github.com/LabGong/DSKT-DDEA.
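The island structure can be sketched as follows, with several simplifications that are assumptions, not the paper's design: each island builds a k-nearest-neighbour surrogate from a different random half of the offline data (to obtain diverse surrogates), evolves its subpopulation with a simple (mu + mu) Gaussian-mutation strategy, and every `interval` generations sends its best individual to the next island in a fixed ring. The semi-supervised surrogate fine-tuning and the adaptive evaluation of transfer effectiveness are omitted, and all names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]

def knn_surrogate(data: Dataset, k: int = 3) -> Callable[[Vector], float]:
    """k-nearest-neighbour regressor over one island's share of the offline data."""
    def predict(x: Vector) -> float:
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(x, d[0])))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def evolve_step(pop: List[Vector], surrogate: Callable[[Vector], float],
                rng: random.Random, sigma: float = 0.3) -> List[Vector]:
    """(mu + mu) step: Gaussian-mutate every individual, keep the best half, sorted best first."""
    children = [[v + rng.gauss(0.0, sigma) for v in x] for x in pop]
    return sorted(pop + children, key=surrogate)[:len(pop)]

def island_model(data: Dataset, n_islands: int = 3, pop_size: int = 8,
                 generations: int = 30, interval: int = 5, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    # Diverse surrogates: each island is trained on a different random half of the data.
    surrogates = [knn_surrogate(rng.sample(data, len(data) // 2)) for _ in range(n_islands)]
    # Initialise each subpopulation from randomly chosen offline designs.
    islands = [[list(rng.choice(data)[0]) for _ in range(pop_size)] for _ in range(n_islands)]
    for g in range(1, generations + 1):
        islands = [evolve_step(pop, s, rng) for pop, s in zip(islands, surrogates)]
        if g % interval == 0:
            # Populations are sorted by evolve_step, so pop[0] is each island's best.
            bests = [list(pop[0]) for pop in islands]
            for i, best in enumerate(bests):
                islands[(i + 1) % n_islands][-1] = best  # replace the worst on the next island

    def ensemble(x: Vector) -> float:
        return sum(s(x) for s in surrogates) / len(surrogates)

    # Final pick: the candidate with the lowest mean prediction across all surrogates.
    return min((x for pop in islands for x in pop), key=ensemble)

# Toy offline dataset: 40 random 2-D designs scored by a sphere function (minimisation).
data_rng = random.Random(42)
data = []
for _ in range(40):
    x = [data_rng.uniform(-3.0, 3.0), data_rng.uniform(-3.0, 3.0)]
    data.append((x, sum(v * v for v in x)))
best = island_model(data)
print(best, sum(v * v for v in best))
```

Because each island optimises against a different surrogate, the final pick here is scored by the ensemble mean, one simple way to limit the influence of any single surrogate's errors.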
Transformable Modular Robots: A CPG-Based Approach to Independent and Collective Locomotion
Ding, Jiayu, Jakkula, Rohit, Xiao, Tom, Gan, Zhenyu
Modular robotics enables the development of versatile and adaptive robotic systems with autonomous reconfiguration. This paper presents a modular robotic system in which each module has independent actuation, battery power, and control, allowing both individual mobility and coordinated locomotion. A hierarchical Central Pattern Generator (CPG) framework governs motion, with a low-level CPG controlling individual modules and a high-level CPG synchronizing inter-module coordination, enabling smooth transitions between independent and collective behaviors. To validate the system, we conduct simulations in MuJoCo and hardware experiments, evaluating locomotion across different configurations. We first analyze single-module motion, followed by two-module cooperative locomotion. Results demonstrate the effectiveness of the CPG-based control framework in achieving robust, flexible, and scalable locomotion. The proposed modular architecture has potential applications in search and rescue, environmental monitoring, and autonomous exploration, where adaptability and reconfigurability are essential.
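The two-level CPG idea can be illustrated with generic coupled phase oscillators: each module has its own oscillator (the low level) producing a joint command `A * sin(phase)`, and a coupling term (the high level) pulls the modules toward a desired phase offset. This Kuramoto-style model, the anti-phase target, the gains, and the function names are assumptions for illustration, not the paper's equations. With coupling gain `K = 0` each module oscillates independently at its own frequency; with `K > 0` the two modules can lock near the desired offset.

```python
import math
from typing import List

def step_phases(phases: List[float], omegas: List[float],
                offsets: List[List[float]], K: float, dt: float) -> List[float]:
    """One explicit-Euler step of coupled phase oscillators.

    offsets[i][j] is the desired phase of module j relative to module i;
    the coupling term vanishes when that offset is achieved.
    """
    n = len(phases)
    new_phases = []
    for i in range(n):
        coupling = sum(math.sin(phases[j] - phases[i] - offsets[i][j])
                       for j in range(n) if j != i)
        new_phases.append(phases[i] + dt * (omegas[i] + K * coupling))
    return new_phases

def joint_angles(phases: List[float], amplitude: float) -> List[float]:
    """Low-level output: one sinusoidal joint command per module."""
    return [amplitude * math.sin(p) for p in phases]

def wrapped_difference(a: float, b: float) -> float:
    """Phase of b relative to a, wrapped into (-pi, pi]."""
    d = (b - a) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d

def simulate(K: float, steps: int = 5000, dt: float = 0.01) -> float:
    """Run two modules for steps * dt seconds and return their final relative phase."""
    omegas = [2 * math.pi * 1.0, 2 * math.pi * 1.1]  # slightly different natural frequencies (rad/s)
    offsets = [[0.0, math.pi], [-math.pi, 0.0]]      # desired gait: modules in anti-phase
    phases = [0.0, 0.5]
    for _ in range(steps):
        phases = step_phases(phases, omegas, offsets, K, dt)
    return wrapped_difference(phases[0], phases[1])

print(simulate(K=0.0))  # uncoupled: relative phase keeps drifting (one full cycle every 10 s)
print(simulate(K=2.0))  # coupled: settles near anti-phase
```

For two modules the relative phase ψ obeys dψ/dt = Δω + 2K sin ψ, so locking is only possible when 2K ≥ |Δω|; with the values above it settles about 0.16 rad away from exact anti-phase, and raising `K` shrinks that lag.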