Optimization
Human-aligned Safe Reinforcement Learning for Highway On-Ramp Merging in Dense Traffic
Li, Yang, Yuan, Shijie, Chang, Yuan, Chen, Xiaolong, Yang, Qisong, Yang, Zhiyuan, Qin, Hongmao
Most reinforcement learning (RL) approaches for the decision-making of autonomous driving consider safety as a reward instead of a cost, which makes it hard to balance the tradeoff between safety and other objectives. Human risk preference has also rarely been incorporated, and the trained policy might be either too conservative or too aggressive for users. To this end, this study proposes a human-aligned safe RL approach for autonomous merging, in which the high-level decision problem is formulated as a constrained Markov decision process (CMDP) that incorporates users' risk preference into the safety constraints, followed by a model predictive control (MPC)-based low-level control. The safety level of the RL policy can be adjusted by computing the cost limits of the CMDP's constraints based on risk preferences and traffic density using a fuzzy control method. To filter out unsafe or invalid actions, we design an action shielding mechanism that pre-executes RL actions using an MPC method and performs collision checks with surrounding agents. We also provide a theoretical proof to validate the effectiveness of the shielding mechanism in enhancing RL's safety and sample efficiency. Simulation experiments across multiple traffic density levels show that our method can significantly reduce safety violations without sacrificing traffic efficiency. Furthermore, due to the use of risk preference-aware constraints in the CMDP and action shielding, we can not only adjust the safety level of the final policy but also reduce safety violations during the training stage, providing a promising solution for online learning in real-world environments.
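The action shielding idea described above (pre-execute a candidate action, then check for collisions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 1-D lane model, the constant-acceleration rollout that stands in for the MPC pre-execution, and the names `rollout`, `shield`, and `min_gap` are hypothetical and are not taken from the paper.

```python
def rollout(pos, vel, accel, horizon, dt):
    """Predict 1-D positions under constant acceleration.

    Hypothetical stand-in for the MPC pre-execution step.
    """
    positions = []
    for _ in range(horizon):
        vel = max(0.0, vel + accel * dt)  # vehicles do not reverse
        pos += vel * dt
        positions.append(pos)
    return positions


def shield(actions, ego, others, horizon=10, dt=0.2, min_gap=5.0):
    """Keep only candidate accelerations whose predicted ego path stays
    at least min_gap metres from every surrounding vehicle."""
    safe = []
    for accel in actions:
        ego_path = rollout(ego[0], ego[1], accel, horizon, dt)
        collision = False
        for other_pos, other_vel in others:
            # Assumption: surrounding vehicles keep their current speed.
            other_path = rollout(other_pos, other_vel, 0.0, horizon, dt)
            if any(abs(e - o) < min_gap for e, o in zip(ego_path, other_path)):
                collision = True
                break
        if not collision:
            safe.append(accel)
    return safe


# Ego at 0 m driving 10 m/s, lead vehicle 8 m ahead at the same speed.
safe_actions = shield([-2.0, 0.0, 2.0], ego=(0.0, 10.0), others=[(8.0, 10.0)])
```

In this toy setting the accelerating action closes the gap below `min_gap` within the horizon and is filtered out, while braking and holding speed remain available to the policy.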
Impact of Temporal Delay on Radar-Inertial Odometry
Štironja, Vlaho-Josip, Petrović, Luka, Peršić, Juraj, Marković, Ivan, Petrović, Ivan
Accurate ego-motion estimation is a critical component of any autonomous system. Conventional ego-motion sensors, such as cameras and LiDARs, may be compromised in adverse environmental conditions, such as fog, heavy rain, or dust. Automotive radars, known for their robustness to such conditions, present themselves as complementary sensors or a promising alternative within ego-motion estimation frameworks. In this paper we propose a novel Radar-Inertial Odometry (RIO) system that integrates an automotive radar and an inertial measurement unit (IMU). The key contribution is the integration of online temporal delay calibration within the factor graph optimization framework, which compensates for potential time offsets between radar and IMU measurements. To validate the proposed approach, we have conducted a thorough experimental analysis on real-world radar and IMU data. The results show that, even without scan matching or target tracking, integration of online temporal calibration significantly reduces localization error compared to systems that disregard time synchronization, thus highlighting the important and often neglected role of accurate temporal alignment in radar-based sensor fusion systems for autonomous navigation.
An Accelerated Alternating Partial Bregman Algorithm for ReLU-based Matrix Decomposition
Wang, Qingsong, Qu, Yunfei, Cui, Chunfeng, Han, Deren
Despite the remarkable success of low-rank estimation in data mining, its effectiveness diminishes when applied to data that inherently lacks low-rank structure. To address this limitation, in this paper, we focus on non-negative sparse matrices and aim to investigate the intrinsic low-rank characteristics of the rectified linear unit (ReLU) activation function. We first propose a novel nonlinear matrix decomposition framework incorporating a comprehensive regularization term designed to simultaneously promote useful structures in clustering and compression tasks, such as low-rankness, sparsity, and non-negativity in the resulting factors. This formulation presents significant computational challenges due to its multi-block structure, non-convexity, non-smoothness, and the absence of global gradient Lipschitz continuity. To address these challenges, we develop an accelerated alternating partial Bregman proximal gradient method (AAPB), whose distinctive feature lies in its capability to enable simultaneous updates of multiple variables. Under mild and theoretically justified assumptions, we establish both sublinear and global convergence properties of the proposed algorithm. Through careful selection of kernel generating distances tailored to various regularization terms, we derive corresponding closed-form solutions while ensuring that the $L$-smooth adaptable property holds for any $L\ge 1$. Numerical experiments on graph regularized clustering and sparse NMF basis compression confirm the effectiveness of our model and algorithm.
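The motivation above, that a sparse non-negative matrix without low-rank structure can still be the ReLU of a low-rank matrix, can be checked on a tiny example. The sketch below is only an illustration of that observation, not the AAPB algorithm; the helper names `matmul` and `relu` and the chosen factors are assumptions made for the example. The 2x2 identity matrix has rank 2, yet it equals the elementwise ReLU of a rank-1 product.

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(M):
    """Elementwise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in M]


# Rank-1 factors (illustrative choice): U is 2x1, V is 1x2.
U = [[1.0], [-1.0]]
V = [[1.0, -1.0]]

W = matmul(U, V)  # [[1, -1], [-1, 1]], a rank-1 matrix
X = relu(W)       # the 2x2 identity, which has rank 2
```

The negative entries of `W` carry the information that ReLU zeroes out, which is why the latent matrix can have lower rank than the observed one.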
Unifying Model Predictive Path Integral Control, Reinforcement Learning, and Diffusion Models for Optimal Control and Planning
Model Predictive Path Integral (MPPI) control, Reinforcement Learning (RL), and Diffusion Models have each demonstrated strong performance in trajectory optimization, decision-making, and motion planning. However, these approaches have traditionally been treated as distinct methodologies with separate optimization frameworks. In this work, we establish a unified perspective that connects MPPI, RL, and Diffusion Models through gradient-based optimization on the Gibbs measure. We first show that MPPI can be interpreted as performing gradient ascent on a smoothed energy function. We then demonstrate that Policy Gradient methods reduce to MPPI by applying an exponential transformation to the objective function. Additionally, we establish that the reverse sampling process in diffusion models follows the same update rule as MPPI.
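For readers unfamiliar with MPPI, the following is a minimal sketch of the commonly used MPPI update: sample perturbations, weight them by a Gibbs factor of their cost, and shift the control by the weighted average. It assumes a scalar control and a hypothetical quadratic cost; the names `mppi_step` and `toy_cost` and the parameter values are illustrative and are not taken from the paper.

```python
import math
import random


def mppi_step(u, cost, rng, num_samples=256, sigma=0.5, lam=0.1):
    """One MPPI update of a scalar control u for a user-supplied cost."""
    # Sample Gaussian perturbations around the current control.
    noise = [rng.gauss(0.0, sigma) for _ in range(num_samples)]
    costs = [cost(u + eps) for eps in noise]
    # Gibbs weights exp(-cost / lam); subtracting the minimum cost keeps
    # the best sample at weight 1 and avoids a zero normalizer.
    best = min(costs)
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    # Shift the control by the weighted average perturbation.
    return u + sum(w * eps for w, eps in zip(weights, noise)) / total


def toy_cost(u):
    """Hypothetical quadratic cost with its minimum at u = 3."""
    return (u - 3.0) ** 2


rng = random.Random(0)
u = 0.0
for _ in range(30):
    u = mppi_step(u, toy_cost, rng)
```

The temperature `lam` controls how sharply the weights concentrate on low-cost samples; as it grows, the update approaches a plain average of the perturbations and the control barely moves.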
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
Huang, Yaxuan, Dai, Xili, Wang, Jianan, Qi, Xianbiao, Yuan, Yixing, Yue, Xiangyu
Room layout estimation from multiple-perspective images is poorly investigated due to the complexities that emerge from multi-view geometry, which requires multi-step solutions such as camera intrinsic and extrinsic estimation, image matching, and triangulation. However, in 3D reconstruction, the advancement of recent 3D foundation models such as DUSt3R has shifted the paradigm from the traditional multi-step structure-from-motion process to an end-to-end single-step approach. To this end, we introduce Plane-DUSt3R, a novel method for multi-view room layout estimation leveraging the 3D foundation model DUSt3R. Plane-DUSt3R incorporates the DUSt3R framework and fine-tunes on a room layout dataset (Structure3D) with a modified objective to estimate structural planes. By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results. Unlike previous methods that rely on a single-perspective or panorama image, Plane-DUSt3R extends the setting to handle multiple-perspective images. Moreover, it offers a streamlined, end-to-end solution that simplifies the process and reduces error accumulation. Experimental results demonstrate that Plane-DUSt3R not only outperforms state-of-the-art methods on the synthetic dataset but also proves robust and effective on in-the-wild data with different image styles such as cartoon. Our code is available at: https://github.com/justacar/Plane-DUSt3R
Spike-and-Slab Posterior Sampling in High Dimensions
Kumar, Syamantak, Sarkar, Purnamrita, Tian, Kevin, Zhu, Yusong
Posterior sampling with the spike-and-slab prior [MB88], a popular multimodal distribution used to model uncertainty in variable selection, is considered the theoretical gold standard method for Bayesian sparse linear regression [CPS09, Roc18]. However, designing provable algorithms for performing this sampling task is notoriously challenging. Existing posterior samplers for Bayesian sparse variable selection tasks either require strong assumptions about the signal-to-noise ratio (SNR) [YWJ16], only work when the measurement count grows at least linearly in the dimension [MW24], or rely on heuristic approximations to the posterior. We give the first provable algorithms for spike-and-slab posterior sampling that apply for any SNR, and use a measurement count sublinear in the problem dimension. Concretely, assume we are given a measurement matrix $\mathbf{X} \in \mathbb{R}^{n\times d}$ and noisy observations $\mathbf{y} = \mathbf{X}\mathbf{\theta}^\star + \mathbf{\xi}$ of a signal $\mathbf{\theta}^\star$ drawn from a spike-and-slab prior $\pi$ with a Gaussian diffuse density and expected sparsity $k$, where $\mathbf{\xi} \sim \mathcal{N}(\mathbb{0}_n, \sigma^2\mathbf{I}_n)$. We give a polynomial-time high-accuracy sampler for the posterior $\pi(\cdot \mid \mathbf{X}, \mathbf{y})$, for any SNR $\sigma^{-1} > 0$, as long as $n \geq k^3 \cdot \text{polylog}(d)$ and $\mathbf{X}$ is drawn from a matrix ensemble satisfying the restricted isometry property. We further give a sampler that runs in near-linear time $\approx nd$ in the same setting, as long as $n \geq k^5 \cdot \text{polylog}(d)$. To demonstrate the flexibility of our framework, we extend our result to spike-and-slab posterior sampling with Laplace diffuse densities, achieving similar guarantees when $\sigma = O(\frac{1}{k})$ is bounded.
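The observation model in this abstract can be made concrete with a small data-generation sketch. It only simulates the forward model $\mathbf{y} = \mathbf{X}\mathbf{\theta}^\star + \mathbf{\xi}$ and does not implement the posterior samplers. Several choices are assumptions made for illustration: each coordinate is active independently with probability $k/d$ (giving expected sparsity $k$), the slab is a standard Gaussian, and $\mathbf{X}$ has i.i.d. $\mathcal{N}(0, 1/n)$ entries, a standard example of an ensemble known to satisfy the restricted isometry property with high probability for suitable $n$. All function names are hypothetical.

```python
import random


def sample_spike_and_slab(d, k, rng, slab_std=1.0):
    """Draw theta: each coordinate is Gaussian with probability k/d, else 0."""
    return [rng.gauss(0.0, slab_std) if rng.random() < k / d else 0.0
            for _ in range(d)]


def gaussian_matrix(n, d, rng):
    """n x d matrix with i.i.d. N(0, 1/n) entries (assumed ensemble)."""
    scale = n ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(n)]


def observe(X, theta, sigma, rng):
    """Noisy linear measurements y = X theta + xi, with xi ~ N(0, sigma^2 I)."""
    return [sum(x * t for x, t in zip(row, theta)) + rng.gauss(0.0, sigma)
            for row in X]


rng = random.Random(0)
d, k, n, sigma = 200, 5, 60, 0.1  # n < d: fewer measurements than dimensions
theta_star = sample_spike_and_slab(d, k, rng)
X = gaussian_matrix(n, d, rng)
y = observe(X, theta_star, sigma, rng)
```

A posterior sampler would take `X` and `y` as input and return draws of the sparse vector; the multimodality mentioned in the abstract comes from the exponentially many possible supports of `theta_star`.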
Weighted Euclidean Distance Matrices over Mixed Continuous and Categorical Inputs for Gaussian Process Models
Pu, Mingyu, Wang, Songhao, Wang, Haowei, Ng, Szu Hui
Gaussian Process (GP) models are widely utilized as surrogate models in scientific and engineering fields. However, standard GP models are limited to continuous variables due to the difficulties in establishing correlation structures for categorical variables. To overcome this limitation, we introduce WEighted Euclidean distance matrices Gaussian Process (WEGP). WEGP constructs the kernel function for each categorical input by estimating the Euclidean distance matrix (EDM) among all categorical choices of this input. The EDM is represented as a linear combination of several predefined base EDMs, each scaled by a positive weight. The weights, along with other kernel hyperparameters, are inferred using a fully Bayesian framework. We analyze the predictive performance of WEGP theoretically. Numerical experiments validate the accuracy of our GP model, and by integrating WEGP into Bayesian Optimization (BO), we achieve superior performance on both synthetic and real-world optimization problems.
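The construction described above, an EDM formed as a positively weighted combination of base EDMs and then turned into a kernel, can be sketched as follows. The two base EDMs, the fixed weights, and the exponential kernel form `exp(-D)` are all assumptions chosen for illustration; the abstract does not specify them, and in WEGP the weights are inferred in a Bayesian framework rather than fixed by hand.

```python
import math


def combine_edms(base_edms, weights):
    """Positive linear combination of base EDMs (entries are squared distances)."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    size = len(base_edms[0])
    return [[sum(w * B[i][j] for w, B in zip(weights, base_edms))
             for j in range(size)] for i in range(size)]


def kernel_from_edm(D):
    """Assumed kernel form: k(i, j) = exp(-D[i][j])."""
    return [[math.exp(-d) for d in row] for row in D]


# Hypothetical base EDMs for one categorical input with three choices.
# base_line: choices embedded at 0, 1, 2 on a line (squared distances).
# base_split: choice 0 separated from choices 1 and 2.
base_line = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
base_split = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

D = combine_edms([base_line, base_split], weights=[0.5, 2.0])
K = kernel_from_edm(D)
```

Because a positive combination of EDMs is again an EDM, the resulting `D` still corresponds to squared distances between points in some Euclidean embedding of the categories.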
Robust Multi-Source Domain Adaptation under Label Shift
Xu, Congbin, Qian, Chengde, Wang, Zhaojun, Zou, Changliang
As the volume of data continues to expand, it becomes increasingly common for data to be aggregated from multiple sources. Leveraging multiple sources for model training typically achieves better predictive performance on test datasets. Unsupervised multi-source domain adaptation aims to predict labels of unlabeled samples in the target domain by using labeled samples from source domains. This work focuses on robust multi-source domain adaptation for multi-category classification problems against the heterogeneity of label shift and data contamination. We investigate a domain-weighted empirical risk minimization framework for robust estimation of the target domain's class proportion. Inspired by outlier detection techniques, we propose a refinement procedure within this framework. With the estimated class proportion, robust classifiers for the target domain can be constructed. Theoretically, we study the finite-sample error bounds of the domain-weighted empirical risk minimization and highlight the improvement of the refinement step. Numerical simulations and real-data applications demonstrate the superiority of the proposed method.
The Distributionally Robust Optimization Model of Sparse Principal Component Analysis
Wang, Lei, Liu, Xin, Chen, Xiaojun
We consider sparse principal component analysis (PCA) under a stochastic setting where the underlying probability distribution of the random parameter is uncertain. This problem is formulated as a distributionally robust optimization (DRO) model based on a constructive approach to capturing uncertainty in the covariance matrix, which constitutes a nonsmooth constrained min-max optimization problem. We further prove that the inner maximization problem admits a closed-form solution, reformulating the original DRO model into an equivalent minimization problem on the Stiefel manifold. This transformation leads to a Riemannian optimization problem with intricate nonsmooth terms, a challenging formulation beyond the reach of existing algorithms. To address this issue, we devise an efficient smoothing manifold proximal gradient algorithm. We prove the Riemannian gradient consistency and global convergence of our algorithm to a stationary point of the nonsmooth minimization problem. Moreover, we establish the iteration complexity of our algorithm. Finally, numerical experiments are conducted to validate the effectiveness and scalability of our algorithm, as well as to highlight the necessity and rationality of adopting the DRO model for sparse PCA.
CorrA: Leveraging Large Language Models for Dynamic Obstacle Avoidance of Autonomous Vehicles
Wang, Shanting, Typaldos, Panagiotis, Malikopoulos, Andreas A.
In this paper, we present Corridor-Agent (CorrA), a framework that integrates large language models (LLMs) with model predictive control (MPC) to address the challenges of dynamic obstacle avoidance in autonomous vehicles. Our approach leverages LLM reasoning ability to generate appropriate parameters for sigmoid-based boundary functions that define safe corridors around obstacles, effectively reducing the state-space of the controlled vehicle. The proposed framework adjusts these boundaries dynamically based on real-time vehicle data that guarantees collision-free trajectories while also ensuring both computational efficiency and trajectory optimality. The problem is formulated as an optimal control problem and solved with differential dynamic programming (DDP) for constrained optimization, and the proposed approach is embedded within an MPC framework. Extensive simulation and real-world experiments demonstrate that the proposed framework achieves superior performance in maintaining safety and efficiency in complex, dynamic environments compared to a baseline MPC approach.
INTRODUCTION
The rapid development of advanced sensing, computation, and artificial intelligence technologies has made autonomous vehicles (AVs) more realistic and made related studies unprecedented. However, the complexity, dynamics, and unpredictability of real-world environments have impeded the deployment of AV applications. Until AVs dominate the transportation market, we face the challenge of mixed autonomy systems where AVs and human-driven vehicles (HDVs) must coexist safely.