Goto

Collaborating Authors

 Optimization


AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs

arXiv.org Artificial Intelligence

Maximizing training throughput and cost-efficiency of RL for LLMs is essential to democratize this advanced technique. One promising but challenging approach is to deploy such a computational workflow over heterogeneous GPUs. Unlike conventional large-scale LLM pretraining, RL training generally decomposes into three coupled stages, i.e., rollout generation, reward computation, and policy/value updates, which exhibit markedly different compute intensities, memory footprints, and communication patterns. Recent research shows that fully asynchronous RL training can disaggregate these stages across disjoint hardware pools without sacrificing training stability, creating a great opportunity for real-world heterogeneous deployment. To this end, we present AReaL-Hex, a heterogeneity-aware asynchronous RL training system that effectively schedules how to execute rollout generation and policy model training over heterogeneous GPUs while enforcing data staleness bounds. Concretely, we use a two-phase scheduler: (i) a constrained search with MILP to select per-stage parallelization strategies and workload assignments given a resource budget, and (ii) a graph-partitioning step that allocates heterogeneous GPUs and interconnects to maximize end-to-end throughput. Built atop a fully asynchronous RL architecture, AReaL-Hex maps HBM-I/O-bound generation and compute-bound optimization to more cost-efficient resources and balances their producer-consumer interactions to avoid both idleness and stale rollout trajectories. On the mathematical reasoning task with various model scales (1.5B, 7B, and 14B), compared to homogeneous deployments of state-of-the-art asynchronous RL systems: (i) When maintaining the same total budgets, AReaL-Hex delivers up to 1.50x higher training throughput; (ii) When achieving the same training throughput, AReaL-Hex results in up to 1.46x reduction in training cost.


Trust Region-Based Bayesian Optimisation to Discover Diverse Solutions

arXiv.org Artificial Intelligence

Bayesian optimisation (BO) is a surrogate-based optimisation technique that efficiently solves expensive black-box functions with small evaluation budgets. Recent studies consider trust regions to improve the scalability of BO approaches when the problem space scales to more dimensions. Motivated by this research, we explore the effectiveness of trust region-based BO algorithms for diversity optimisation in different dimensional black box problems. We propose diversity optimisation approaches extending TuRBO1, which is the first BO method that uses a trust region-based approach for scalability. We extend TuRBO1 as divTuRBO1, which finds an optimal solution while maintaining a given distance threshold relative to a reference solution set. We propose two approaches to find diverse solutions for black-box functions by combining divTuRBO1 runs in a sequential and an interleaving fashion. We conduct experimental investigations on the proposed algorithms and compare their performance with that of the baseline method, ROBOT (rank-ordered Bayesian optimisation with trust regions). We evaluate proposed algorithms on benchmark functions with dimensions 2 to 20. Experimental investigations demonstrate that the proposed methods perform well, particularly in larger dimensions, even with a limited evaluation budget.


Spatial Crowdsourcing-based Task Allocation for UAV-assisted Maritime Data Collection

arXiv.org Artificial Intelligence

Abstract--Driven by the unceasing development of maritime services, tasks of unmanned aerial vehicle (UA V)-assisted maritime data collection (MDC) are becoming increasingly diverse, complex and personalized. As a result, effective task allocation for MDC is becoming increasingly critical. In this work, integrating the concept of spatial crowdsourcing (SC), we develop an SCbased MDC network model and investigate the task allocation problem for UA V-assisted MDC. In variable maritime service scenarios, tasks are allocated to UA Vs based on the spatial and temporal requirements of the tasks, as well as the mobility of the UA Vs. T o address this problem, we design an SCbased task allocation algorithm for the MDC (SC-MDC-T A). The quality estimation is utilized to assess and regulate task execution quality by evaluating signal to interference plus noise ratio and the UA V energy consumption. The reverse auction is employed to potentially reduce the task waiting time as much as possible while ensuring timely completion. Additionally, we establish typical task allocation scenarios based on maritime service requirements indicated by electronic navigational charts. Simulation results demonstrate that the proposed SC-MDC-T A algorithm effectively allocates tasks for various MDC scenarios. Furthermore, compared to the benchmark, the SC-MDC-T A algorithm can also reduce the task completion time and lower the UA V energy consumption. RIVEN by the continuous development of maritime services such as marine resources exploration, reconnaissance and surveillance, anti-submarine, marine tourism, marine transportation and emergency collection, tasks of unmanned aerial vehicle (UA V)-assisted maritime data collection (MDC) are becoming increasingly diverse, complex and personalized [1]-[3]. Personal use of this material is permitted.


FGO MythBusters: Explaining how Kalman Filter variants achieve the same performance as FGO in navigation applications

arXiv.org Artificial Intelligence

Sliding window-factor graph optimization (SW-FGO) has gained more and more attention in navigation research due to its robust approximation to non-Gaussian noises and nonlinearity of measuring models. There are lots of works focusing on its application performance compared to extended Kalman filter (EKF) but there is still a myth at the theoretical relationship between the SW-FGO and EKF. In this paper, we find the necessarily fair condition to connect SW-FGO and Kalman filter variants (KFV) (e.g., EKF, iterative EKF (IEKF), robust EKF (REKF) and robust iterative EKF (RIEKF)). Based on the conditions, we propose a recursive FGO (Re-FGO) framework to represent KFV under SW-FGO formulation. Under explicit conditions (Markov assumption, Gaussian noise with L2 loss, and a one-state window), Re-FGO regenerates exactly to EKF/IEKF/REKF/RIEKF, while SW-FGO shows measurable benefits in nonlinear, non-Gaussian regimes at a predictable compute cost. Finally, after clarifying the connection between them, we highlight the unique advantages of SW-FGO in practical phases, especially on numerical estimation and deep learning integration. The code and data used in this work is open sourced at https://github.com/Baoshan-Song/KFV-FGO-Comparison.


Computational Budget Should Be Considered in Data Selection

arXiv.org Artificial Intelligence

Data selection improves computational efficiency by choosing informative subsets of training samples. However, existing methods ignore the compute budget, treating data selection and importance evaluation independently of compute budget constraints. Yet empirical studies show no algorithm can consistently outperform others (or even random selection) across varying budgets. We therefore argue that compute budget must be integral to data-selection strategies, since different budgets impose distinct requirements on data quantity, quality, and distribution for effective training. To this end, we propose a novel Computational budget-Aware Data Selection (CADS) method and naturally formulate it into a bilevel optimization framework, where the inner loop trains the model within the constraints of the computational budget on some selected subset of training data, while the outer loop optimizes data selection based on model evaluation. Our technical contributions lie in addressing two main challenges in solving this bilevel optimization problem: the expensive Hessian matrix estimation for outer-loop gradients and the computational burden of achieving inner-loop optimality during iterations. To solve the first issue, we propose a probabilistic reparameterization strategy and compute the gradient using a Hessian-free policy gradient estimator. To address the second challenge, we transform the inner optimization problem into a penalty term in the outer objective, further discovering that we only need to estimate the minimum of a one-dimensional loss to calculate the gradient, significantly improving efficiency. Extensive experiments show that our method achieves performance gains of up to 14.42% over baselines in vision and language benchmarks.


Dual-Regularized Riccati Recursions for Interior-Point Optimal Control

arXiv.org Artificial Intelligence

Abstract-- We derive closed-form extensions of Riccati's recursions (both sequential [4] and parallel [7]) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each primal step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT -licensed implementations of our methods in C++ and JAX. Numerical optimal control, both real-time and offline, has found numerous application domains, ranging from trajectory optimization for robotics (e.g. for autonomous cars, unmanned aerial vehicles, legged robots) and airspace (e.g. Continuous-time optimal control problems, whose optimization variables are functions (thus infinite-dimensional) are typically converted into finite-dimensional optimization problems by either shooting (i.e.


Spatiotemporal Calibration for Laser Vision Sensor in Hand-eye System Based on Straight-line Constraint

arXiv.org Artificial Intelligence

Laser vision sensors (LVS) are critical perception modules for industrial robots, facilitating real-time acquisition of workpiece geometric data in welding applications. However, the camera communication delay will lead to a temporal desynchronization between captured images and the robot motions. Additionally, hand-eye extrinsic parameters may vary during prolonged measurement. To address these issues, we introduce a measurement model of LVS considering the effect of the camera's time-offset and propose a teaching-free spatiotemporal calibration method utilizing line constraints. This method involves a robot equipped with an LVS repeatedly scanning straight-line fillet welds using S-shaped trajectories. Regardless of the robot's orientation changes, all measured welding positions are constrained to a straight-line, represented by Plucker coordinates. Moreover, a nonlinear optimization model based on straight-line constraints is established. Subsequently, the Levenberg-Marquardt algorithm (LMA) is employed to optimize parameters, including time-offset, hand-eye extrinsic parameters, and straight-line parameters. The feasibility and accuracy of the proposed approach are quantitatively validated through experiments on curved weld scanning. We open-sourced the code, dataset, and simulation report at https://anonymous.4open.science/r/LVS_ST_CALIB-015F/README.md.


Pareto-NRPA: A Novel Monte-Carlo Search Algorithm for Multi-Objective Optimization

arXiv.org Artificial Intelligence

We introduce Pareto-NRPA, a new Monte-Carlo algorithm designed for multi-objective optimization problems over discrete search spaces. Extending the Nested Rollout Policy Adaptation (NRPA) algorithm originally formulated for single-objective problems, Pareto-NRPA generalizes the nested search and policy update mechanism to multi-objective optimization. The algorithm uses a set of policies to concurrently explore different regions of the solution space and maintains non-dominated fronts at each level of search. Policy adaptation is performed with respect to the diversity and isolation of sequences within the Pareto front. We benchmark Pareto-NRPA on two classes of problems: a novel bi-objective variant of the Traveling Salesman Problem with Time Windows problem (MO-TSPTW), and a neural architecture search task on well-known benchmarks. Results demonstrate that Pareto-NRPA achieves competitive performance against state-of-the-art multi-objective algorithms, both in terms of convergence and diversity of solutions. Particularly, Pareto-NRPA strongly outperforms state-of-the-art evolutionary multi-objective algorithms on constrained search spaces. To our knowledge, this work constitutes the first adaptation of NRPA to the multi-objective setting.


Distributionally Robust Optimization with Adversarial Data Contamination

arXiv.org Artificial Intelligence

Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training data. This paper introduces a principled approach to simultaneously address both challenges. We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions, where an $ε$-fraction of the training data is adversarially corrupted. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts, alongside an efficient algorithm inspired by robust statistics to solve the resulting optimization problem. We prove that our method achieves an estimation error of $O(\sqrtε)$ for the true DRO objective value using only the contaminated data under the bounded covariance assumption. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.


Tight analyses of first-order methods with error feedback

arXiv.org Artificial Intelligence

Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged, thereby reducing communication overhead. To counteract the degradation in convergence associated with compressed communication, error feedback schemes -- most notably $\mathrm{EF}$ and $\mathrm{EF}^{21}$ -- were introduced. In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method -- with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent. Our analysis is carried out in the simplified single-agent setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.