Ma, Yining
Real-world Troublemaker: A 5G Cloud-controlled Track Testing Framework for Automated Driving Systems in Safety-critical Interaction Scenarios
Zhang, Xinrui, Xiong, Lu, Zhang, Peizhi, Huang, Junpeng, Ma, Yining
Track testing plays a critical role in the safety evaluation of autonomous driving systems (ADS), as it provides a real-world interaction environment. However, the inflexibility in motion control of object targets and the absence of intelligent interactive testing methods often result in pre-fixed and limited testing scenarios. To address these limitations, we propose a novel 5G cloud-controlled track testing framework, Real-world Troublemaker. This framework overcomes the rigidity of traditional pre-programmed control by leveraging 5G cloud-controlled object targets integrated with the Internet of Things (IoT) and vehicle teleoperation technologies. Unlike conventional testing methods that rely on pre-set conditions, we propose a dynamic game strategy based on a quadratic risk interaction utility function, facilitating intelligent interactions with the vehicle under test (VUT) and creating a more realistic and dynamic interaction environment. The proposed framework has been successfully implemented at the Tongji University Intelligent Connected Vehicle Evaluation Base. Field test results demonstrate that Troublemaker can perform dynamic interactive testing of ADS accurately and effectively. Compared to traditional methods, Troublemaker improves scenario reproduction accuracy by 65.2%, increases the diversity of interaction strategies by approximately 9.2 times, and enhances the exposure frequency of safety-critical scenarios by 3.5 times in unprotected left-turn scenarios.

Index Terms: Automated driving systems, track testing, 5G, cloud-controlled object targets, interaction scenarios.

The safety of automated driving systems (ADS) must be ensured prior to their practical implementation, which requires a well-established testing framework [1]. Existing test standards, such as ISO 26262 [2], UN R157 [3], and UN R171 [4], are not sufficient to comprehensively evaluate ADS.
According to the driving automation levels defined by SAE J3016 from SAE International, a high-level ADS (i.e., Level 3 or higher) is expected to perform driving tasks independently and autonomously, with the driver no longer retaining continuous control over vehicle movement [5]. While ADS has already been deployed in various countries and regions, numerous ADS traffic incidents highlight that safety testing for high-level ADS remains a critical technical challenge. In comparison to traditional vehicles and advanced driver assistance systems (ADAS), high-level ADS testing faces significant transformations and challenges, particularly in terms of both test subjects and requirements.
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling
Li, Sirui, Ouyang, Wenbin, Ma, Yining, Wu, Cathy
Furthermore, when evaluating the performance on 600-operation FJSP (10, 20, 30) in Table 1, we see that options (1) and (2) result in a longer solve time but an improved makespan compared to the architecture without attention. We also note that option (3) is strictly dominated by the performance of the architecture without attention. The TNR-TPR tradeoff on performance and solve time aligns with our theoretical analysis: fixing something that should not have been fixed (low TNR) harms the objective but helps the solve time, while failing to fix something that should have been fixed (low TPR) harms the solve time and also indirectly harms the objective (under a fixed time limit). Given the time benefit of the architecture without attention and its relatively competitive objective, we believe it makes sense to keep the simpler architecture without attention in the main paper.

Figure 7: Ablation neural architecture: attention among the overlapping and new operations. The architecture follows Figure 1, but introduces an additional cross attention among the overlapping and new operations before outputting the predicted probability for each overlapping operation.
Diversity Optimization for Travelling Salesman Problem via Deep Reinforcement Learning
Li, Qi, Cao, Zhiguang, Ma, Yining, Wu, Yaoxin, Gong, Yue-Jiao
Existing neural methods for the Travelling Salesman Problem (TSP) mostly aim at finding a single optimal solution. To discover diverse yet high-quality solutions for Multi-Solution TSP (MSTSP), we propose a novel deep reinforcement learning based neural solver, which is primarily featured by an encoder-decoder structured policy. Concretely, on the one hand, a Relativization Filter (RF) is designed to enhance the robustness of the encoder to affine transformations of the instances, so as to potentially improve the quality of the found solutions. On the other hand, a Multi-Attentive Adaptive Active Search (MA3S) is tailored to allow the decoders to strike a balance.

As a practical and crucial supplement to the classic TSP, MSTSP is highly desired in many real-world scenarios, where a single solution may be insufficient. For example, 1) when the single target route (solution) becomes unavailable due to unexpected circumstances, MSTSP offers desirable alternatives; 2) while the single target route may overlook other important metrics like user preferences, MSTSP allows for personalized choices among a set of high-quality candidate routes; 3) while the single target route may incur spontaneous and simultaneous pursuit of the same choice, MSTSP can distribute users or loads across different routes, potentially mitigating the jam and enhancing the overall performance.
ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning
Guo, Hongshu, Ma, Zeyuan, Chen, Jiacheng, Ma, Yining, Cao, Zhiguang, Zhang, Xinglin, Gong, Yue-Jiao
Recent advances in Meta-learning for Black-Box Optimization (MetaBBO) have shown the potential of using neural networks to dynamically configure evolutionary algorithms (EAs), enhancing their performance and adaptability across various BBO instances. However, they are often tailored to a specific EA, which limits their generalizability and necessitates retraining or redesigns for different EAs and optimization problems. To address this limitation, we introduce ConfigX, a new paradigm of the MetaBBO framework that is capable of learning a universal configuration agent (model) for boosting diverse EAs. To achieve this, our ConfigX first leverages a novel modularization system that enables the flexible combination of various optimization sub-modules to generate diverse EAs during training. Additionally, we propose a Transformer-based neural network to meta-learn a universal configuration policy through multitask reinforcement learning across a designed joint optimization task space. Extensive experiments verify that our ConfigX, after large-scale pre-training, achieves robust zero-shot generalization to unseen tasks and outperforms state-of-the-art baselines. Moreover, ConfigX exhibits strong lifelong learning capabilities, allowing efficient adaptation to new tasks through fine-tuning. Our proposed ConfigX represents a significant step toward an automatic, all-purpose configuration agent for EAs.
MA-DV2F: A Multi-Agent Navigation Framework using Dynamic Velocity Vector Field
Ma, Yining, Khan, Qadeer, Cremers, Daniel
In this paper we propose MA-DV2F: Multi-Agent Dynamic Velocity Vector Field. It is a framework for simultaneously controlling a group of vehicles in challenging environments. DV2F is generated for each vehicle independently and provides a map of reference orientation and speed that a vehicle must attain at any point on the navigation grid such that it safely reaches its target. The field is dynamically updated depending on the speed and proximity of the ego-vehicle to other agents. This dynamic adaptation of the velocity vector field allows prevention of imminent collisions. Experimental results show that MA-DV2F outperforms concurrent methods in terms of safety, computational efficiency and accuracy in reaching the target when scaling to a large number of vehicles. Project page for this work can be found here: https://yininghase.github.io/MA-DV2F/
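As a rough illustration of the vector-field idea described above, the sketch below attracts each vehicle toward its target and repels it from nearby agents. This is not the paper's exact formulation; the repulsion term, `v_max`, and `r_safe` are assumptions made for this example.

```python
import numpy as np

def velocity_field(point, target, others, v_max=1.0, r_safe=2.0):
    """Illustrative velocity-vector-field sketch: attraction toward the
    target plus repulsion from agents inside a safety radius."""
    attract = target - point
    dist = np.linalg.norm(attract)
    direction = attract / dist if dist > 1e-9 else np.zeros(2)
    # Repulsion grows as another agent enters the safety radius r_safe.
    repulse = np.zeros(2)
    for other in others:
        offset = point - other
        d = np.linalg.norm(offset)
        if 1e-9 < d < r_safe:
            repulse += (offset / d) * (r_safe - d) / r_safe
    v = direction * min(v_max, dist) + repulse
    # The field hands back a reference heading and a capped speed,
    # which a low-level controller would then track.
    speed = min(np.linalg.norm(v), v_max)
    heading = np.arctan2(v[1], v[0])
    return heading, speed
```

Because the field is recomputed from the current positions at every step, it adapts dynamically as other agents move, which is the mechanism the abstract credits with preventing imminent collisions.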
Learning to Handle Complex Constraints for Vehicle Routing Problems
Bi, Jieyi, Ma, Yining, Zhou, Jianan, Song, Wen, Cao, Zhiguang, Wu, Yaoxin, Zhang, Jie
Vehicle Routing Problems (VRPs) can model many real-world scenarios and often involve complex constraints. While recent neural methods excel in constructing solutions based on feasibility masking, they struggle with handling complex constraints, especially when obtaining the masking itself is NP-hard. In this paper, we propose a novel Proactive Infeasibility Prevention (PIP) framework to advance the capabilities of neural methods towards more complex VRPs. Our PIP integrates the Lagrangian multiplier as a basis to enhance constraint awareness and introduces preventative infeasibility masking to proactively steer the solution construction process. Moreover, we present PIP-D, which employs an auxiliary decoder and two adaptive strategies to learn and predict these tailored masks, potentially enhancing performance while significantly reducing computational costs during training. To verify our PIP designs, we conduct extensive experiments on the highly challenging Traveling Salesman Problem with Time Window (TSPTW), and TSP with Draft Limit (TSPDL) variants under different constraint hardness levels. Notably, our PIP is generic to boost many neural methods, and exhibits both a significant reduction in infeasible rate and a substantial improvement in solution quality.
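To make the masking difficulty concrete, the sketch below shows a naive one-step feasibility mask for TSPTW; it only checks the immediate next arrival against each time window, so it can still steer the construction into dead ends, which is exactly why the abstract notes that exact masking is NP-hard. The function and its arguments are illustrative, not the paper's PIP implementation.

```python
import numpy as np

def one_step_tw_mask(current, t_now, travel, tw, visited):
    """Naive one-step feasibility mask for TSPTW (illustration only).
    travel[i, j]: travel time from node i to node j.
    tw[j] = (open_j, close_j): time window of node j."""
    n = travel.shape[0]
    mask = np.zeros(n, dtype=bool)  # True = node may be chosen next
    for j in range(n):
        if visited[j]:
            continue
        arrival = t_now + travel[current, j]
        # Waiting until a window opens is allowed; arriving after it
        # closes is infeasible, so such nodes are masked out.
        if arrival <= tw[j][1]:
            mask[j] = True
    return mask
```

A node can pass this local check yet make all remaining nodes unreachable later, so a proactive, learned mask (as in PIP-D) aims to anticipate such downstream infeasibility rather than only the next step.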
Hierarchical Neural Constructive Solver for Real-world TSP Scenarios
Goh, Yong Liang, Cao, Zhiguang, Ma, Yining, Dong, Yanfei, Dupty, Mohammed Haroon, Lee, Wee Sun
Existing neural constructive solvers for routing problems have predominantly employed transformer architectures, conceptualizing the route construction as a set-to-sequence learning task. However, their efficacy has primarily been demonstrated on entirely random problem instances that inadequately capture real-world scenarios. In this paper, we introduce realistic Traveling Salesman Problem (TSP) scenarios relevant to industrial settings and derive the following insights: (1) The optimal next node (or city) to visit often lies within proximity to the current node, suggesting the potential benefits of biasing choices based on current locations. (2) Effectively solving the TSP requires robust tracking of unvisited nodes and warrants succinct grouping strategies. Building upon these insights, we propose integrating a learnable choice layer inspired by Hypernetworks to prioritize choices based on the current location, and a learnable approximate clustering algorithm inspired by the Expectation-Maximization algorithm to facilitate grouping the unvisited cities. Together, these two contributions form a hierarchical approach towards solving the realistic TSP by considering both immediate local neighbourhoods and learning an intermediate set of node representations. Our hierarchical approach yields superior performance compared to both classical and recent transformer models, showcasing the efficacy of the key designs.
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Zhou, Jianan, Cao, Zhiguang, Wu, Yaoxin, Song, Wen, Ma, Yining, Zhang, Jie, Xu, Chi
Learning to solve vehicle routing problems (VRPs) has garnered much attention. However, most neural solvers are only structured and trained independently on a specific problem, making them less generic and practical. In this paper, we aim to develop a unified neural solver that can cope with a range of VRP variants simultaneously. Specifically, we propose a multi-task vehicle routing solver with mixture-of-experts (MVMoE), which greatly enhances the model capacity without a proportional increase in computation. We further develop a hierarchical gating mechanism for the MVMoE, delivering a good trade-off between empirical performance and computational complexity. Experimentally, our method significantly promotes zero-shot generalization performance on 10 unseen VRP variants, and showcases decent results on the few-shot setting and real-world benchmark instances. We further conduct extensive studies on the effect of MoE configurations in solving VRPs, and observe the superiority of hierarchical gating when facing out-of-distribution data. The source code is available at: https://github.com/RoyalSkye/Routing-MVMoE.
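The capacity-without-proportional-compute property comes from sparse expert routing: each input activates only a few experts. The sketch below shows a generic top-k gate, not MVMoE's hierarchical gating; the weight shapes and softmax-over-selected-experts mixing are assumptions for this toy example.

```python
import numpy as np

def topk_moe(x, gate_w, expert_ws, k=2):
    """Generic sparse MoE layer (illustration): route each input to its
    k highest-scoring experts and mix their outputs by renormalized
    gate weights, so only k of the experts are evaluated per input."""
    scores = x @ gate_w                           # (batch, n_experts)
    topk = np.argsort(scores, axis=-1)[:, -k:]    # indices of k best experts
    out = np.zeros((x.shape[0], expert_ws[0].shape[1]))
    for b in range(x.shape[0]):
        sel = scores[b, topk[b]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                              # softmax over selected experts
        for weight, e in zip(w, topk[b]):
            out[b] += weight * (x[b] @ expert_ws[e])
    return out
```

A hierarchical gate, as in MVMoE, adds a coarser routing decision before (or above) this per-expert one, trading a little accuracy in expert selection for lower gating cost.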
Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning
Ma, Zeyuan, Chen, Jiacheng, Guo, Hongshu, Ma, Yining, Gong, Yue-Jiao
Evolutionary computation (EC) algorithms, renowned as powerful black-box optimizers, leverage a group of individuals to cooperatively search for the optimum. The exploration-exploitation tradeoff (EET) plays a crucial role in EC, which, however, has traditionally been governed by manually designed rules. In this paper, we propose a deep reinforcement learning-based framework that autonomously configures and adapts the EET throughout the EC search process. The framework allows different individuals of the population to selectively attend to the global and local exemplars based on the current search state, maximizing the cooperative search outcome. Our proposed framework is characterized by its simplicity, effectiveness, and generalizability, with the potential to enhance numerous existing EC algorithms. To validate its capabilities, we apply our framework to several representative EC algorithms and conduct extensive experiments on the augmented CEC2021 benchmark. The results demonstrate significant improvements in the performance of the backbone algorithms, as well as favorable generalization across diverse problem classes, dimensions, and population sizes. Additionally, we provide an in-depth analysis of the EET issue by interpreting the learned behaviors of EC.
Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution
Guo, Hongshu, Ma, Yining, Ma, Zeyuan, Chen, Jiacheng, Zhang, Xinglin, Cao, Zhiguang, Zhang, Jun, Gong, Yue-Jiao
Evolutionary algorithms, such as Differential Evolution, excel in solving real-parameter optimization challenges. However, the effectiveness of a single algorithm varies across different problem instances, necessitating considerable efforts in algorithm selection or configuration. This paper aims to address the limitation by leveraging the complementary strengths of a group of algorithms and dynamically scheduling them throughout the optimization progress for specific problems. We propose a deep reinforcement learning-based dynamic algorithm selection framework to accomplish this task. Our approach models dynamic algorithm selection as a Markov Decision Process, training an agent in a policy gradient manner to select the most suitable algorithm according to the features observed during the optimization process. To empower the agent with the necessary information, our framework incorporates a thoughtful design of landscape and algorithmic features. Meanwhile, we employ a sophisticated deep neural network model to infer the optimal action, ensuring informed algorithm selections. Additionally, an algorithm context restoration mechanism is embedded to facilitate smooth switching among different algorithms. These mechanisms together enable our framework to seamlessly select and switch algorithms in a dynamic online fashion. Notably, the proposed framework is simple and generic, offering potential improvements across a broad spectrum of evolutionary algorithms. As a proof-of-principle study, we apply this framework to a group of Differential Evolution algorithms. The experimental results showcase the remarkable effectiveness of the proposed framework, not only enhancing the overall optimization performance but also demonstrating favorable generalization ability across different problem classes.