Optimization
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization
Islamov, Rustem, As, Yarden, Fatkhullin, Ilyas
Federated learning faces severe communication bottlenecks due to the high dimensionality of model updates. Communication compression with contractive compressors (e.g., Top-K) is often preferable in practice but can degrade performance without proper handling. Error feedback (EF) mitigates such issues but has been largely restricted for smooth, unconstrained problems, limiting its real-world applicability where non-smooth objectives and safety constraints are critical. We advance our understanding of EF in the canonical non-smooth convex setting by establishing new lower complexity bounds for first-order algorithms with contractive compression. Next, we propose Safe-EF, a novel algorithm that matches our lower bound (up to a constant) while enforcing safety constraints essential for practical applications. Extending our approach to the stochastic setting, we bridge the gap between theory and practical implementation. Extensive experiments in a reinforcement learning setup, simulating distributed humanoid robot training, validate the effectiveness of Safe-EF in ensuring safety and reducing communication complexity.
Dynamic real-time multi-UAV cooperative mission planning method under multiple constraints
Liu, Chenglou, Lu, Yufeng, Xie, Fangfang, Ji, Tingwei, Zheng, Yao
As UAV popularity soars, so does the mission planning associated with it. The classical approaches suffer from the triple problems of decoupled of task assignment and path planning, poor real-time performance and limited adaptability. Aiming at these challenges, this paper proposes a dynamic real-time multi-UAV collaborative mission planning algorithm based on Dubins paths under a distributed formation structure. Dubins path with multiple advantages bridges the gap between task assignment and path planning, leading to a coupled solution for mission planning. Then, a series of acceleration techniques, task clustering preprocessing, highly efficient distance cost functions, low-complexity and less iterative task allocation strategies, are employed to guarantee the real-time performance of the algorithms. To cope with different emergencies and their simultaneous extremes, real-time planning of emerging tasks and mission replanning due to the reduction of available UAVs are appropriately handled. Finally, the developed algorithm is comprehensively exemplified and studied through simulations, highlighting that the proposed method only sacrifices 9.57% of the path length, while achieving a speed improvement of 4-5 orders of magnitude over the simulated annealing method, with a single mission planning of about 0.0003s.
Text-guided Generation of Efficient Personalized Inspection Plans
Sun, Xingpeng, Pan, Zherong, Gao, Xifeng, Wu, Kui, Bera, Aniket
We propose a training-free, Vision-Language Model (VLM)-guided approach for efficiently generating trajectories to facilitate target inspection planning based on text descriptions. Unlike existing Vision-and-Language Navigation (VLN) methods designed for general agents in unknown environments, our approach specifically targets the efficient inspection of known scenes, with widespread applications in fields such as medical, marine, and civil engineering. Leveraging VLMs, our method first extracts points of interest (POIs) from the text description, then identifies a set of waypoints from which POIs are both salient and align with the spatial constraints defined in the prompt. Next, we interact with the VLM to iteratively refine the trajectory, preserving the visibility and prominence of the POIs. Further, we solve a Traveling Salesman Problem (TSP) to find the most efficient visitation order that satisfies the order constraint implied in the text description. Finally, we apply trajectory optimization to generate smooth, executable inspection paths for aerial and underwater vehicles. We have evaluated our method across a series of both handcrafted and real-world scanned environments. The results demonstrate that our approach effectively generates inspection planning trajectories that adhere to user instructions.
BNPO: Beta Normalization Policy Optimization
Xiao, Changyi, Zhang, Mengdi, Cao, Yixin
Recent studies, including DeepSeek-R1 and Kimi-k1.5, have demonstrated that reinforcement learning with rule-based, binary-valued reward functions can significantly enhance the reasoning capabilities of large language models. These models primarily utilize REINFORCE-based policy optimization techniques, such as REINFORCE with baseline and group relative policy optimization (GRPO). However, a key limitation remains: current policy optimization methods either neglect reward normalization or employ static normalization strategies, which fail to adapt to the dynamic nature of policy updates during training. This may result in unstable gradient estimates and hinder training stability. To address this issue, we propose Beta Normalization Policy Optimization (BNPO), a novel policy optimization method that adaptively normalizes rewards using a Beta distribution with dynamically updated parameters. BNPO aligns the normalization with the changing policy distribution, enabling more precise and lower-variance gradient estimation, which in turn promotes stable training dynamics. We provide theoretical analysis demonstrating BNPO's variance-reducing properties and show that it generalizes both REINFORCE and GRPO under binary-valued reward settings. Furthermore, we introduce an advantage decomposition mechanism to extend BNPO's applicability to more complex reward systems. Experimental results confirm that BNPO achieves state-of-the-art performance among policy optimization methods on reasoning tasks. The code is available at https://github.com/changyi7231/BNPO.
Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization
Wu, Ruilong, Li, Xinjiao, Wang, Yisu, Chen, Xinyu, Kutscher, Dirk
Hybrid parallelism techniques are essential for efficiently training large language models (LLMs). Nevertheless, current automatic parallel planning frameworks often overlook the simultaneous consideration of node heterogeneity and dynamic network topology changes, limiting their effectiveness in practical applications. In this paper, we address these limitations by modeling heterogeneous nodes within dynamically changing network environments and leveraging simulation-based strategies to determine optimal parallel configurations. Our approach enables fine-grained workload allocation tailored for heterogeneous nodes and complex network scenarios, achieving performance competitive with state-of-the-art methods under regular and stable network conditions. Additionally, we introduce a strategy pruning technique to rapidly discard infeasible parallel configurations, substantially reducing the search space and accelerating the search process through parallel execution within the simulator. Preliminary evaluations confirm that our method notably enhances training performance on heterogeneous nodes and demonstrates improved adaptability in complex, dynamic scenarios such as cloud computing environments.
Accelerating Model-Based Reinforcement Learning using Non-Linear Trajectory Optimization
Calรฌ, Marco, Giacomuzzo, Giulio, Carli, Ruggero, Libera, Alberto Dalla
This paper addresses the slow policy optimization convergence of Monte Carlo Probabilistic Inference for Learning Control (MC-PILCO), a state-of-the-art model-based reinforcement learning (MBRL) algorithm, by integrating it with iterative Linear Quadratic Regulator (iLQR), a fast trajectory optimization method suitable for nonlinear systems. The proposed method, Exploration-Boosted MC-PILCO (EB-MC-PILCO), leverages iLQR to generate informative, exploratory trajectories and initialize the policy, significantly reducing the number of required optimization steps. Experiments on the cart-pole task demonstrate that EB-MC-PILCO accelerates convergence compared to standard MC-PILCO, achieving up to $\bm{45.9\%}$ reduction in execution time when both methods solve the task in four trials. EB-MC-PILCO also maintains a $\bm{100\%}$ success rate across trials while solving the task faster, even in cases where MC-PILCO converges in fewer iterations.
Solving the Pod Repositioning Problem with Deep Reinforced Adaptive Large Neighborhood Search
The Pod Repositioning Problem (PRP) in Robotic Mobile Fulfillment Systems (RMFS) involves selecting optimal storage locations for pods returning from pick stations. This work presents an improved solution method that integrates Adaptive Large Neighborhood Search (ALNS) with Deep Reinforcement Learning (DRL). A DRL agent dynamically selects destroy and repair operators and adjusts key parameters such as destruction degree and acceptance thresholds during the search. Specialized heuristics for both operators are designed to reflect PRP-specific characteristics, including pod usage frequency and movement costs. Computational results show that this DRL-guided ALNS outperforms traditional approaches such as cheapest-place, fixed-place, binary integer programming, and static heuristics. The method demonstrates strong solution quality and illustrating the benefit of learning-driven control within combinatorial optimization for warehouse systems.
EALG: Evolutionary Adversarial Generation of Language Model-Guided Generators for Combinatorial Optimization
Duan, Ruibo, Liu, Yuxin, Dong, Xinyao, Fan, Chenglin
Generating challenging instances is crucial for the evaluation and advancement of combinatorial optimization solvers. In this work, we introduce EALG (Evolutionary Adversarial Generation of Language Model-Guided Generators), a novel framework that automates the co-evolution of optimization problem instances and their corresponding heuristic solvers using large language models (LLMs). EALG leverages a mutation-based adversarial approach that dynamically evolves instance generation procedures to create increasingly difficult problems, while simultaneously synthesizing adaptive heuristic algorithms through interactions with LLMs guided by algorithmic structure. Unlike existing approaches that focus solely on static benchmark creation or manual solver design, EALG provides a seamless pipeline from instance generation to solver synthesis. Experimental results demonstrate that EALG generates significantly harder instances than current benchmarks, and its synthesized solvers generalize effectively across a broad spectrum of combinatorial tasks. This work explores a new paradigm for combinatorial optimization that integrates instance generation with solver design, resulting in state-of-the-art performance.
ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model Capabilities
Duan, Yifan, Tang, Yihong, Chen, Kehai, Nie, Liqiang, Zhang, Min
High-quality prompts are crucial for eliciting outstanding performance from large language models (LLMs) on complex tasks. Existing research has explored model-driven strategies for prompt optimization. However, these methods often suffer from high computational overhead or require strong optimization capabilities from the model itself, which limits their broad applicability.To address these challenges, we propose ORPP (Optimized Role-Playing Prompt),a framework that enhances model performance by optimizing and generating role-playing prompts. The core idea of ORPP is to confine the prompt search space to role-playing scenarios, thereby fully activating the model's intrinsic capabilities through carefully crafted, high-quality role-playing prompts. Specifically, ORPP first performs iterative optimization on a small subset of training samples to generate high-quality role-playing prompts. Then, leveraging the model's few-shot learning capability, it transfers the optimization experience to efficiently generate suitable prompts for the remaining samples.Our experimental results show that ORPP not only matches but in most cases surpasses existing mainstream prompt optimization methods in terms of performance. Notably, ORPP demonstrates superior "plug-and-play" capability. In most cases, it can be integrated with various other prompt methods and further enhance their effectiveness.
AERO: A Redirection-Based Optimization Framework Inspired by Judo for Robust Probabilistic Forecasting
Optimization remains a fundamental pillar of machine learning, yet existing methods often struggle to maintain stability and adaptability in dynamic, non linear systems, especially under uncertainty. We introduce AERO (Adversarial Energy-based Redirection Optimization), a novel framework inspired by the redirection principle in Judo, where external disturbances are leveraged rather than resisted. AERO reimagines optimization as a redirection process guided by 15 interrelated axioms encompassing adversarial correction, energy conservation, and disturbance-aware learning. By projecting gradients, integrating uncertainty driven dynamics, and managing learning energy, AERO offers a principled approach to stable and robust model updates. Applied to probabilistic solar energy forecasting, AERO demonstrates substantial gains in predictive accuracy, reliability, and adaptability, especially in noisy and uncertain environments. Our findings highlight AERO as a compelling new direction in the theoretical and practical landscape of optimization.