Wang, Yue
SMART: Advancing Scalable Map Priors for Driving Topology Reasoning
Ye, Junjie, Paz, David, Zhang, Hengyuan, Guo, Yuliang, Huang, Xinyu, Christensen, Henrik I., Wang, Yue, Ren, Liu
Topology reasoning is crucial for autonomous driving as it enables comprehensive understanding of connectivity and relationships between lanes and traffic elements. While recent approaches have shown success in perceiving driving topology using vehicle-mounted sensors, their scalability is hindered by the reliance on training data captured with consistent sensor configurations. We identify that the key to scalable lane perception and topology reasoning is the elimination of this sensor dependence. To address this, we propose SMART, a scalable solution that leverages easily available standard-definition (SD) and satellite maps to learn a map prior model, supervised by large-scale geo-referenced high-definition (HD) maps independent of sensor settings. Owing to scaled training, SMART alone achieves superior offline lane topology understanding using only SD and satellite inputs. Extensive experiments further demonstrate that SMART can be seamlessly integrated into any online topology reasoning method, yielding significant improvements of up to 28% on the OpenLane-V2 benchmark.
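The abstract does not give implementation details, but the core idea of fusing a sensor-independent map prior into an online model can be illustrated with a minimal sketch. Everything here (module names, feature shapes, concatenation-based fusion) is an assumption for illustration, not SMART's actual architecture:

    import torch
    import torch.nn as nn

    class MapPriorFusion(nn.Module):
        # Hypothetical fusion: concatenate BEV features from an online
        # sensor branch with features from an SD/satellite map prior,
        # then project back to the sensor feature width.
        def __init__(self, sensor_dim=256, prior_dim=128):
            super().__init__()
            self.proj = nn.Conv2d(sensor_dim + prior_dim, sensor_dim, 1)

        def forward(self, sensor_bev, prior_bev):
            # sensor_bev: [B, sensor_dim, H, W]; prior_bev: [B, prior_dim, H, W]
            return self.proj(torch.cat([sensor_bev, prior_bev], dim=1))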
Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms
Su, Xuerui, Wang, Yue, Zhu, Jinhua, Yi, Mingyang, Xu, Feng, Ma, Zhiming, Liu, Yuting
With the rapid development of Large Language Models (LLMs), numerous Reinforcement Learning from Human Feedback (RLHF) algorithms have been introduced to improve model safety and alignment with human preferences. These algorithms can be divided into two main frameworks based on whether they require an explicit reward (or value) function for training: actor-critic-based Proximal Policy Optimization (PPO) and alignment-based Direct Preference Optimization (DPO). Mismatches between DPO and PPO, such as DPO's use of a classification loss over human-preference data, have raised confusion about whether DPO should be classified as a Reinforcement Learning (RL) algorithm. To address these ambiguities, we focus on three key aspects related to DPO, RL, and other RLHF algorithms: (1) the construction of the loss function; (2) the target distribution to which the algorithm converges; (3) the impact of key components within the loss function. Specifically, we first establish a unified framework named UDRRA connecting these algorithms based on the construction of their loss functions. Next, we uncover their target policy distributions within this framework. Finally, we investigate the critical components of DPO to understand their impact on the convergence rate. Our work provides a deeper understanding of the relationship between DPO, RL, and other RLHF algorithms, offering new insights for improving existing algorithms.
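For reference, the classification-style loss the abstract alludes to is the standard DPO objective over preference pairs; the sketch below shows that well-known loss (variable names are our own), not the UDRRA framework introduced in the paper:

    import torch.nn.functional as F

    def dpo_loss(pi_logp_chosen, pi_logp_rejected,
                 ref_logp_chosen, ref_logp_rejected, beta=0.1):
        # Implicit rewards are beta-scaled log-probability ratios between
        # the trained policy and the frozen reference policy.
        r_chosen = beta * (pi_logp_chosen - ref_logp_chosen)
        r_rejected = beta * (pi_logp_rejected - ref_logp_rejected)
        # Bradley-Terry classification loss on the reward margin.
        return -F.logsigmoid(r_chosen - r_rejected).mean()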
From Uncertain to Safe: Conformal Fine-Tuning of Diffusion Models for Safe PDE Control
Hu, Peiyan, Qian, Xiaowei, Deng, Wenhao, Wang, Rui, Feng, Haodong, Feng, Ruiqi, Zhang, Tao, Wei, Long, Wang, Yue, Ma, Zhi-Ming, Wu, Tailin
The application of deep learning for partial differential equation (PDE)-constrained control is gaining increasing attention. However, existing methods rarely consider safety requirements crucial in real-world applications. To address this limitation, we propose Safe Diffusion Models for PDE Control (SafeDiffCon), which introduces the uncertainty quantile as a measure of model uncertainty to achieve optimal control under safety constraints through both post-training and inference phases. First, our approach post-trains a pre-trained diffusion model to generate control sequences that better satisfy safety constraints while achieving improved control objectives via a reweighted diffusion loss, which incorporates the uncertainty quantile estimated using conformal prediction. Second, during inference, the diffusion model dynamically adjusts both its generation process and parameters through iterative guidance and fine-tuning, conditioned on control targets while simultaneously integrating the estimated uncertainty quantile. We evaluate SafeDiffCon on three control tasks: the 1D Burgers' equation, 2D incompressible fluid flow, and a controlled nuclear fusion problem. Results demonstrate that SafeDiffCon is the only method that satisfies all safety constraints, whereas other classical and deep learning baselines fail. Furthermore, while adhering to safety constraints, SafeDiffCon achieves the best control performance.
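The uncertainty quantile mentioned above comes from conformal prediction. A generic split-conformal estimate looks like the sketch below; the paper's exact nonconformity score and calibration procedure are not specified in the abstract, so this is illustrative only:

    import numpy as np

    def conformal_quantile(calibration_scores, alpha=0.1):
        # Split conformal prediction: the finite-sample-corrected
        # (1 - alpha) quantile of nonconformity scores on a held-out
        # calibration set bounds the test error with probability 1 - alpha.
        n = len(calibration_scores)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        return np.quantile(calibration_scores, level, method="higher")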
Compliance while resisting: a shear-thickening fluid controller for physical human-robot interaction
Chen, Lu, Chen, Lipeng, Chen, Xiangchi, Lu, Haojian, Zheng, Yu, Wu, Jun, Wang, Yue, Zhang, Zhengyou, Xiong, Rong
Physical human-robot interaction (pHRI) is widely needed in many fields, such as industrial manipulation, home services, and medical rehabilitation, and places higher demands on robot safety. Due to the uncertainty of the working environment, pHRI may be subject to unexpected impact disturbances, which affect the safety and smoothness of task execution. The commonly used linear admittance control (L-AC) copes well with high-frequency, small-amplitude noise, but handles medium-frequency, high-intensity impacts poorly. Inspired by the solid-liquid phase-change nature of shear-thickening fluids, we propose a Shear-thickening Fluid Control (SFC) that achieves both easy human-robot collaboration and resistance to impact disturbances. We analyze the SFC's stability, passivity, and phase trajectory in detail, quantify its frequency- and time-domain properties, and provide parameter constraints for discrete control and coupled stability conditions. We conducted simulations to compare the frequency- and time-domain characteristics of L-AC, nonlinear admittance control (N-AC), and SFC, and validated their dynamic properties. In real-world experiments, we compared the performance of L-AC, N-AC, and SFC on both fixed and mobile manipulators. L-AC exhibits weak resistance to impact; N-AC can resist moderate impacts but not high-intensity ones, and may exhibit self-excited oscillations. In contrast, SFC demonstrates superior impact resistance and maintains stable collaboration, enhancing comfort in cooperative water-delivery tasks. Additionally, a case study conducted in a factory setting further affirms the SFC's capability to facilitate human-robot collaborative manipulation and underscores its potential in industrial applications.
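As a point of reference for the controllers compared above, a discrete linear admittance law maps external force to a commanded velocity, and shear-thickening behavior can be crudely mimicked by letting the damping grow with speed. The sketch below is our own illustration; the damping law and parameter values are assumptions, not the paper's SFC:

    def admittance_step(v, force, M=1.0, D0=5.0, k=0.0, dt=0.001):
        # M * dv/dt + D(v) * v = F, integrated with explicit Euler.
        # k = 0 recovers linear admittance (L-AC); k > 0 makes damping
        # grow with speed, a crude stand-in for shear-thickening behavior.
        D = D0 + k * abs(v) ** 2
        dv = (force - D * v) / M
        return v + dt * dv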
Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling
Cai, Jianfeng, Zhu, Jinhua, Sun, Ruopei, Wang, Yue, Li, Li, Zhou, Wengang, Li, Houqiang
Reinforcement Learning from Human Feedback (RLHF) has achieved considerable success in aligning large language models (LLMs) by modeling human preferences with a learnable reward model and employing a reinforcement learning algorithm to maximize the reward model's scores. However, these reward models are susceptible to exploitation through various superficial confounding factors, with length bias emerging as a particularly significant concern. Moreover, while the pronounced impact of length bias on preference modeling suggests that LLMs possess an inherent sensitivity to length perception, our preliminary investigations reveal that fine-tuned LLMs consistently struggle to adhere to explicit length instructions. To address these two limitations, we propose a novel framework wherein the reward model explicitly differentiates between human semantic preferences and response length requirements. Specifically, we introduce a Response-conditioned Bradley-Terry (Rc-BT) model that, through training on our augmented dataset, enhances the reward model's ability to mitigate length bias and follow length instructions. Furthermore, we propose the Rc-DPO algorithm, which leverages the Rc-BT model for Direct Preference Optimization (DPO) of LLMs, simultaneously mitigating length bias and promoting adherence to length instructions. Extensive evaluations demonstrate that our approach substantially improves both preference modeling and length instruction compliance, with its effectiveness validated across various foundational models and preference datasets.
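The Bradley-Terry model underlying Rc-BT scores pairwise preferences; a minimal version of the standard BT reward-model loss is sketched below. How Rc-BT conditions on the length requirement is not detailed in the abstract, so the conditioning mentioned in the comment is only an assumption:

    import torch.nn.functional as F

    def bradley_terry_loss(r_chosen, r_rejected):
        # Standard pairwise preference loss: maximize the probability that
        # the chosen response outscores the rejected one. In Rc-BT the
        # reward is presumably also conditioned on a response-length
        # requirement (e.g., appended to the prompt) -- an assumption here.
        return -F.logsigmoid(r_chosen - r_rejected).mean()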
Physics-Inspired Distributed Radio Map Estimation
Yang, Dong, Wang, Yue, Zhang, Songyang, Li, Yingshu, Cai, Zhipeng
To gain panoramic awareness of spectrum coverage in complex wireless environments, data-driven learning approaches have recently been introduced for radio map estimation (RME). While existing deep learning based methods perform RME from spectrum measurements gathered by dispersed sensors in the region of interest, they rely on centralized data at a fusion center, which raises critical concerns about data privacy leakage and high communication overhead. Federated learning (FL) enhances data security and communication efficiency in RME by allowing multiple clients to collaborate in model training without directly sharing local data. However, the performance of FL-based RME can be hindered by task heterogeneity across clients due to their unavailable or inaccurate landscaping information. To fill this gap, in this paper we propose a physics-inspired distributed RME solution that operates in the absence of landscaping information. The main idea is a novel distributed RME framework that leverages the domain knowledge of radio propagation models and a new distributed learning approach that splits the entire RME model into two modules. A global autoencoder module is shared among clients to capture the common pathloss influence on radio propagation patterns, while a client-specific autoencoder module focuses on learning the individual features produced by local shadowing effects from the unique building distribution in each local environment. Simulation results show that our proposed method outperforms benchmark methods.
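The split described above, one shared module and one local module, maps onto a FedAvg-style round in which only the shared weights are aggregated. The sketch below is a generic illustration of that pattern; the client API and all names are our assumptions:

    def federated_round(shared_weights, clients):
        # Each client trains the full model (shared pathloss autoencoder +
        # local shadowing autoencoder) but uploads only the shared part.
        # train_and_export_shared is a hypothetical client method.
        uploads = [client.train_and_export_shared(shared_weights)
                   for client in clients]
        # FedAvg over the shared module; client-specific modules never
        # leave their owners, preserving data privacy.
        return {name: sum(u[name] for u in uploads) / len(uploads)
                for name in shared_weights}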
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
Chow, Wei, Mao, Jiageng, Li, Boyi, Seita, Daniel, Guizilini, Vitor, Wang, Yue
Understanding the physical world is a fundamental challenge in embodied AI, critical for enabling agents to perform complex tasks and operate safely in real-world environments. While Vision-Language Models (VLMs) have shown great promise in reasoning and task planning for embodied agents, their ability to comprehend physical phenomena remains extremely limited. To close this gap, we introduce PhysBench, a comprehensive benchmark designed to evaluate VLMs' physical world understanding capability across a diverse set of tasks. PhysBench contains 10,002 entries of interleaved video-image-text data, categorized into four major domains: physical object properties, physical object relationships, physical scene understanding, and physics-based dynamics, further divided into 19 subclasses and 8 distinct capability dimensions. Our extensive experiments, conducted on 75 representative VLMs, reveal that while these models excel in common-sense reasoning, they struggle with understanding the physical world -- likely due to the absence of physical knowledge in their training data and the lack of embedded physical priors. To tackle this shortfall, we introduce PhysAgent, a novel framework that combines the generalization strengths of VLMs with the specialized expertise of vision models, significantly enhancing VLMs' physical understanding across a variety of tasks, including an 18.4% improvement on GPT-4o. Furthermore, our results demonstrate that enhancing VLMs' physical world understanding capabilities can help embodied agents such as MOKA. We believe that PhysBench and PhysAgent offer valuable insights and contribute to bridging the gap between VLMs and physical world understanding.
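An interleaved video-image-text benchmark entry can be represented by a simple record type. The schema below is only our guess at the minimal fields implied by the abstract; the field names are assumptions, not PhysBench's actual data format:

    from dataclasses import dataclass, field

    @dataclass
    class PhysBenchEntry:
        # One of 10,002 interleaved video-image-text entries (hypothetical schema).
        question: str
        answer: str
        videos: list[str] = field(default_factory=list)   # media file paths
        images: list[str] = field(default_factory=list)   # media file paths
        domain: str = ""      # properties / relationships / scene / dynamics
        subclass: str = ""    # one of the 19 subclasses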
Revisit Mixture Models for Multi-Agent Simulation: Experimental Study within a Unified Framework
Lin, Longzhong, Lin, Xuewu, Xu, Kechun, Lu, Haojian, Huang, Lichao, Xiong, Rong, Wang, Yue
Simulation plays a crucial role in assessing autonomous driving systems, where the generation of realistic multi-agent behaviors is a key aspect. In multi-agent simulation, the primary challenges are behavioral multimodality and closed-loop distributional shifts. In this study, we revisit mixture models for generating multimodal agent behaviors, which cover the mainstream methods, including continuous mixture models and GPT-like discrete models. Furthermore, we introduce a closed-loop sample generation approach tailored to mixture models to mitigate distributional shifts. Within the unified mixture model (UniMM) framework, we identify critical configurations from both the model and data perspectives. We conduct a systematic examination of various model configurations, including positive component matching, continuous regression, prediction horizon, and the number of components. Moreover, our investigation into the data configuration highlights the pivotal role of closed-loop samples in achieving realistic simulations. To extend the benefits of closed-loop samples to a broader range of mixture models, we further address the shortcut learning and off-policy learning issues. Leveraging insights from our exploration, the distinct variants proposed within the UniMM framework, including discrete, anchor-free, and anchor-based models, all achieve state-of-the-art performance on the WOSAC benchmark.
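Positive component matching, one of the configurations examined above, is commonly implemented as a winner-takes-all mixture loss: only the component closest to the ground truth is regressed, and a classification head learns to pick it. The sketch below illustrates that common pattern; shapes and details are our assumptions, not UniMM's exact loss:

    import torch
    import torch.nn.functional as F

    def wta_mixture_loss(pred_trajs, logits, gt_traj):
        # pred_trajs: [K, T, 2] candidate trajectories from K mixture
        # components; logits: [K] component scores; gt_traj: [T, 2].
        dists = (pred_trajs - gt_traj).pow(2).sum(-1).mean(-1)  # [K]
        positive = dists.argmin()              # closest component "wins"
        reg_loss = dists[positive]             # regress only the winner
        cls_loss = F.cross_entropy(logits.unsqueeze(0), positive.view(1))
        return reg_loss + cls_loss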
Scalable and Explainable Verification of Image-based Neural Network Controllers for Autonomous Vehicles
Parameshwaran, Aditya, Wang, Yue
Existing formal verification methods for image-based neural network controllers in autonomous vehicles often struggle with high-dimensional inputs, computational inefficiency, and a lack of explainability. These challenges make it difficult to ensure safety and reliability, as processing high-dimensional image data is computationally intensive and neural networks are typically treated as black boxes. To address these issues, we propose SEVIN (Scalable and Explainable Verification of Image-Based Neural Network Controllers), a framework that leverages a Variational Autoencoder (VAE) to encode high-dimensional images into a lower-dimensional, explainable latent space. By annotating latent variables with corresponding control actions, we generate convex polytopes that serve as structured input spaces for verification, significantly reducing computational complexity and enhancing scalability. Integrating the VAE's decoder with the neural network controller allows for formal and robustness verification over these explainable polytopes. Our approach also incorporates robustness verification under real-world perturbations by augmenting the dataset and retraining the VAE to capture environmental variations. Experimental results demonstrate that SEVIN achieves efficient and scalable verification while providing explainable insights into controller behavior, bridging the gap between formal verification techniques and practical applications in safety-critical systems.
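The key construction, verifying the decoder composed with the controller so that input sets live in the latent space, can be written as a single module handed to a neural network verifier. A minimal sketch, with module names assumed:

    import torch.nn as nn

    class DecoderController(nn.Module):
        # Verification target: latent vector z -> decoded image -> action.
        # Constraining z to a convex polytope in latent space is far
        # cheaper than constraining the raw high-dimensional image, which
        # is the scalability argument sketched in the abstract.
        def __init__(self, decoder, controller):
            super().__init__()
            self.decoder = decoder
            self.controller = controller

        def forward(self, z):
            return self.controller(self.decoder(z))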
Temporal Logic Guided Safe Navigation for Autonomous Vehicles
Parameshwaran, Aditya, Wang, Yue
Safety verification for autonomous vehicles (AVs) and ground robots is crucial for ensuring reliable operation in their uncertain environments. Formal language tools provide a robust and sound method for verifying safety rules for such complex cyber-physical systems. In this paper, we propose a hybrid approach that combines the strengths of formal specification languages, namely Linear Temporal Logic (LTL) and Signal Temporal Logic (STL), to generate safe trajectories and optimal control inputs for autonomous vehicle navigation. We implement a symbolic path planning approach using LTL to generate a formally safe reference trajectory. A mixed-integer linear programming (MILP) solver is then applied along this reference trajectory to compute control inputs that satisfy the state, control, and safety constraints described by STL. We test our proposed solution in two environments and compare the results with popular path planning algorithms. In contrast to conventional path planning algorithms, our formally safe solution excels at handling complex specification scenarios while ensuring safety at comparable computation times.
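STL constraints like those handed to the MILP solver are evaluated via quantitative robustness: a trajectory satisfies "always phi" exactly when the worst-case margin of phi over the horizon is nonnegative. A minimal sketch of that semantics, with the predicate encoding being our own illustration rather than the paper's formulation:

    def robustness_always(trajectory, predicate):
        # rho(G phi, x) = min over time of rho(phi, x_t): the trajectory
        # satisfies "always phi" iff this worst-case margin is >= 0.
        return min(predicate(state) for state in trajectory)

    # Example predicate: keep at least 0.5 m clearance from an obstacle
    # at (3.0, 2.0); states are (x, y, ...) tuples. Values are illustrative.
    def clearance(state, obstacle=(3.0, 2.0), d_min=0.5):
        dx, dy = state[0] - obstacle[0], state[1] - obstacle[1]
        return (dx ** 2 + dy ** 2) ** 0.5 - d_min

    # robustness_always(traj, clearance) >= 0  <=>  the constraint holds.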