Goto

Collaborating Authors

 right turn


World Model Robustness via Surprise Recognition

Zollicoffer, Geigh, Chopra, Tanush, Yan, Mingkuan, Ma, Xiaoxu, Eaton, Kenneth, Riedl, Mark

arXiv.org Artificial Intelligence

AI systems deployed in the real world must contend with distractions and out-of-distribution (OOD) noise that can destabilize their policies and lead to unsafe behavior . While robust training can reduce sensitivity to some forms of noise, it is infeasible to anticipate all possible OOD conditions. T o mitigate this issue, we develop an algorithm that leverages a world model's inherent measure of surprise to reduce the impact of noise in world model-based reinforcement learning agents. W e introduce both multi-representation and single-representation rejection sampling, enabling robustness to settings with multiple faulty sensors or a single faulty sensor . While the introduction of noise typically degrades agent performance, we show that our techniques preserve performance relative to baselines under varying types and levels of noise across multiple environments within self-driving simulation domains (CARLA and Safety Gymnasium). Furthermore, we demonstrate that our methods enhance the stability of two state-of-the-art world models with markedly different underlying architectures: Cosmos and DreamerV3.


DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models

Hao, Yuhan, Li, Zhengning, Sun, Lei, Wang, Weilong, Yi, Naixin, Song, Sheng, Qin, Caihong, Zhou, Mofan, Zhan, Yifei, Lang, Xianpeng

arXiv.org Artificial Intelligence

Vision-Language-Action (VLA) models have advanced autonomous driving, but existing benchmarks still lack scenario diversity, reliable action-level annotation, and evaluation protocols aligned with human preferences. To address these limitations, we introduce DriveAction, the first action-driven benchmark specifically designed for VLA models, comprising 16,185 QA pairs generated from 2,610 driving scenarios. DriveAction leverages real-world driving data proactively collected by drivers of autonomous vehicles to ensure broad and representative scenario coverage, offers high-level discrete action labels collected directly from drivers' actual driving operations, and implements an action-rooted tree-structured evaluation framework that explicitly links vision, language, and action tasks, supporting both comprehensive and task-specific assessment. Our experiments demonstrate that state-of-the-art vision-language models (VLMs) require both vision and language guidance for accurate action prediction: on average, accuracy drops by 3.3% without vision input, by 4.1% without language input, and by 8.0% without either. Our evaluation supports precise identification of model bottlenecks with robust and consistent results, thus providing new insights and a rigorous foundation for advancing human-like decisions in autonomous driving.


Applying Extended Object Tracking for Self-Localization of Roadside Radar Sensors

Han, Longfei, Xu, Qiuyu, Kefferpütz, Klaus, Elger, Gordon, Beyerer, Jürgen

arXiv.org Artificial Intelligence

Intelligent Transportation Systems (ITS) can benefit from roadside 4D mmWave radar sensors for large-scale traffic monitoring due to their weatherproof functionality, long sensing range and low manufacturing cost. However, the localization method using external measurement devices has limitations in urban environments. Furthermore, if the sensor mount exhibits changes due to environmental influences, they cannot be corrected when the measurement is performed only during the installation. In this paper, we propose self-localization of roadside radar data using Extended Object Tracking (EOT). The method analyses both the tracked trajectories of the vehicles observed by the sensor and the aerial laser scan of city streets, assigns labels of driving behaviors such as "straight ahead", "left turn", "right turn" to trajectory sections and road segments, and performs Semantic Iterative Closest Points (SICP) algorithm to register the point cloud. The method exploits the result from a down stream task -- object tracking -- for localization. We demonstrate high accuracy in the sub-meter range along with very low orientation error. The method also shows good data efficiency. The evaluation is done in both simulation and real-world tests.


CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving

Gao, Dechen, Cai, Shuangyu, Zhou, Hanchu, Wang, Hang, Soltani, Iman, Zhang, Junshan

arXiv.org Artificial Intelligence

To safely navigate intricate real-world scenarios, autonomous vehicles must be able to adapt to diverse road conditions and anticipate future events. World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform for training and testing such algorithms in sophisticated driving environments. To fill this void, we introduce CarDreamer, the first open-source learning platform designed specifically for developing WM based autonomous driving algorithms. It comprises three key components: 1) World model backbone: CarDreamer has integrated some state-of-the-art WMs, which simplifies the reproduction of RL algorithms. The backbone is decoupled from the rest and communicates using the standard Gym interface, so that users can easily integrate and test their own algorithms. 2) Built-in tasks: CarDreamer offers a comprehensive set of highly configurable driving tasks which are compatible with Gym interfaces and are equipped with empirically optimized reward functions. 3) Task development suite: This suite streamlines the creation of driving tasks, enabling easy definition of traffic flows and vehicle routes, along with automatic collection of multi-modal observation data. A visualization server allows users to trace real-time agent driving videos and performance metrics through a browser. Furthermore, we conduct extensive experiments using built-in tasks to evaluate the performance and potential of WMs in autonomous driving. Thanks to the richness and flexibility of CarDreamer, we also systematically study the impact of observation modality, observability, and sharing of vehicle intentions on AV safety and efficiency. All code and documents are accessible on https://github.com/ucd-dare/CarDreamer.


Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles

Atakishiyev, Shahin, Salameh, Mohammad, Goebel, Randy

arXiv.org Artificial Intelligence

Autonomous vehicles often make complex decisions via machine learning-based predictive models applied to collected sensor data. While this combination of methods provides a foundation for real-time actions, self-driving behavior primarily remains opaque to end users. In this sense, explainability of real-time decisions is a crucial and natural requirement for building trust in autonomous vehicles. Moreover, as autonomous vehicles still cause serious traffic accidents for various reasons, timely conveyance of upcoming hazards to road users can help improve scene understanding and prevent potential risks. Hence, there is also a need to supply autonomous vehicles with user-friendly interfaces for effective human-machine teaming. Motivated by this problem, we study the role of explainable AI and human-machine interface jointly in building trust in vehicle autonomy. We first present a broad context of the explanatory human-machine systems with the "3W1H" (what, whom, when, how) approach. Based on these findings, we present a situation awareness framework for calibrating users' trust in self-driving behavior. Finally, we perform an experiment on our framework, conduct a user study on it, and validate the empirical findings with hypothesis testing.


Beyond Text: Improving LLM's Decision Making for Robot Navigation via Vocal Cues

Sun, Xingpeng, Meng, Haoming, Chakraborty, Souradip, Bedi, Amrit Singh, Bera, Aniket

arXiv.org Artificial Intelligence

This work highlights a critical shortcoming in text-based Large Language Models (LLMs) used for human-robot interaction, demonstrating that text alone as a conversation modality falls short in such applications. While LLMs excel in processing text in these human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present "Beyond Text"; an approach that improves LLM decision-making by integrating audio transcription along with a subsection of these features, which focus on the affect and more relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 48.30%, but also enhances robustness against token manipulation adversarial attacks, highlighted by a 22.44% less decrease ratio than the text-only language model in winning rate. "Beyond Text" marks an advancement in social robot navigation and broader Human-Robot interactions, seamlessly integrating text-based guidance with human-audio-informed language models.


Experimental Validation of Sensor Fusion-based GNSS Spoofing Attack Detection Framework for Autonomous Vehicles

Dasgupta, Sagar, Shakib, Kazi Hassan, Rahman, Mizanur

arXiv.org Artificial Intelligence

To collect data, a vehicle equipped with a GNSS receiver, along with Inertial Measurement Unit (IMU) is used. The detection framework incorporates two strategies: The first strategy involves comparing the predicted location shift, which is the distance traveled between two consecutive timestamps, with the inertial sensor-based location shift. For this purpose, data from low-cost in-vehicle inertial sensors such as the accelerometer and gyroscope sensor are fused and fed into a long short-term memory (LSTM) neural network. The second strategy employs a Random-Forest supervised machine learning model to detect and classify turns, distinguishing between left and right turns using the output from the steering angle sensor. In experiments, two types of spoofing attack models: turn-by-turn and wrong turn are simulated. These spoofing attacks are modeled as SQL injection attacks, where, upon successful implementation, the navigation system perceives injected spoofed location information as legitimate while being unable to detect legitimate GNSS signals. Importantly, the IMU data remains uncompromised throughout the spoofing attack. To test the effectiveness of the detection framework, experiments are conducted in Tuscaloosa, AL, mimicking urban road structures. The results demonstrate the framework's ability to detect various sophisticated GNSS spoofing attacks, even including slow position drifting attacks.


LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Shao, Hao, Hu, Yuxuan, Wang, Letian, Waslander, Steven L., Liu, Yu, Li, Hongsheng

arXiv.org Artificial Intelligence

Despite significant recent progress in the field of autonomous driving, modern methods still struggle and can incur serious accidents when encountering long-tail unforeseen events and challenging urban scenarios. On the one hand, large language models (LLM) have shown impressive reasoning capabilities that approach "Artificial General Intelligence". On the other hand, previous autonomous driving methods tend to rely on limited-format inputs (e.g. sensor data and navigation waypoints), restricting the vehicle's ability to understand language information and interact with humans. To this end, this paper introduces LMDrive, a novel language-guided, end-to-end, closed-loop autonomous driving framework. LMDrive uniquely processes and integrates multi-modal sensor data with natural language instructions, enabling interaction with humans and navigation software in realistic instructional settings. To facilitate further research in language-based closed-loop autonomous driving, we also publicly release the corresponding dataset which includes approximately 64K instruction-following data clips, and the LangAuto benchmark that tests the system's ability to handle complex instructions and challenging driving scenarios. Extensive closed-loop experiments are conducted to demonstrate LMDrive's effectiveness. To the best of our knowledge, we're the very first work to leverage LLMs for closed-loop end-to-end autonomous driving. Codes, models, and datasets can be found at https://github.com/opendilab/LMDrive


Cruise updates software for autonomous vehicles after crash

Associated Press

An autonomous robotaxi run by Cruise LLC got into a crash while making a left turn, causing the company to recall 80 vehicles with a software update. The San Francisco-based unit of General Motors says the crash happened about 11 p.m. on June 3. The company says it filed recall paperwork at the request of federal safety regulators and to be transparent with the public. In documents posted Thursday by the National Highway Traffic Safety Administration, Cruise said one of its vehicles was making an unprotected left turn at an intersection when it was hit by an oncoming vehicle. The oncoming vehicle was in a right turn and bus lane when it switched lanes and went straight, hitting the right rear of the Cruise car.


The Ethical AI Question Of Whether Self-Driving Cars Ought To Be A Good Samaritan And Forewarn When Human-Driven Cars Are Going To Crash Into Each Other

#artificialintelligence

When driving, mind your own business or help other drivers, that's the question to be pondered. That's also a common refrain and refers to the notion of being helpful to others, even though they might be complete strangers and you do not know them at all. Which of those two catchphrases or words of wisdom would you choose? You probably make daily decisions about those two possibilities. There are situations and settings wherein you opt to mind your own business. At times, it might be quite tempting to step into the middle of something, but you weigh the pros and cons of doing so, and then at times move along and do not get into the fray. On the other hand, there are times that you decide it is best to jump into the swimming pool, as it were, and get engaged. Let's turn this somewhat conceptual or philosophical discussion into something very grounded and real. I was driving my car the other day and had come up to an intersection to make a left turn.