Chen, Yaobin
On-Board Vision-Language Models for Personalized Autonomous Vehicle Motion Control: System Design and Real-World Validation
Cui, Can, Yang, Zichong, Zhou, Yupeng, Peng, Juntong, Park, Sung-Yeon, Zhang, Cong, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Feng, Yiheng, Panchal, Jitesh, Li, Lingxi, Chen, Yaobin, Wang, Ziran
Personalized driving refers to an autonomous vehicle's ability to adapt its driving behavior or control strategies to match individual users' preferences and driving styles while maintaining safety and comfort standards. However, existing works either fail to capture every individual preference precisely or become computationally inefficient as the user base expands. Vision-Language Models (VLMs) offer a promising solution to this problem through their natural language understanding and scene reasoning capabilities. In this work, we propose a lightweight yet effective on-board VLM framework that provides low-latency personalized driving performance while maintaining strong reasoning capabilities. Our solution incorporates a Retrieval-Augmented Generation (RAG)-based memory module that enables continuous learning of individual driving preferences through human feedback. Through comprehensive real-world vehicle deployment and experiments, our system has demonstrated the ability to provide safe, comfortable, and personalized driving experiences across various scenarios and to reduce takeover rates by up to 76.9%. To the best of our knowledge, this work represents the first end-to-end VLM-based motion control system in real-world autonomous vehicles.
Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment
Cui, Can, Ma, Yunsheng, Yang, Zichong, Zhou, Yupeng, Liu, Peiran, Lu, Juanwu, Li, Lingxi, Chen, Yaobin, Panchal, Jitesh H., Abdelraouf, Amr, Gupta, Rohit, Han, Kyungtae, Wang, Ziran
With the broader usage and highly successful development of Large Language Models (LLMs), interest in and demand for applying LLMs to autonomous driving technology have grown. Driven by their natural language understanding and reasoning ability, LLMs have the potential to enhance various aspects of autonomous driving systems, from perception and scene understanding to language interaction and decision-making. In this paper, we first introduce novel concepts and approaches to designing LLMs for autonomous driving (LLM4AD). Then, we propose a comprehensive benchmark for evaluating the instruction-following abilities of LLMs within the autonomous driving domain. Furthermore, we conduct a series of experiments on both simulation and real-world vehicle platforms, thoroughly evaluating the performance and potential of our LLM4AD systems. Our research highlights the significant potential of LLMs to enhance these aspects of autonomous vehicle technology.
Large Language Models for Autonomous Driving: Real-World Experiments
Cui, Can, Yang, Zichong, Zhou, Yupeng, Ma, Yunsheng, Lu, Juanwu, Li, Lingxi, Chen, Yaobin, Panchal, Jitesh, Wang, Ziran
Autonomous driving systems are increasingly popular in today's technological landscape: vehicles with partial automation are already widely available on the market, and the full automation era with "driverless" capabilities is on the horizon. However, accurately understanding humans' commands, particularly for autonomous vehicles that have only passengers instead of drivers, and achieving a high level of personalization remain challenging tasks in the development of autonomous driving systems. In this paper, we introduce a Large Language Model (LLM)-based framework, Talk-to-Drive (Talk2Drive), to process verbal commands from humans and make autonomous driving decisions with contextual information, satisfying their personalized preferences for safety, efficiency, and comfort. First, a speech recognition module is developed for Talk2Drive to convert verbal inputs from humans into textual instructions, which are then sent to LLMs for reasoning. Then, appropriate commands for the Electronic Control Unit (ECU) are generated, achieving a 100% success rate in executing the generated code. Real-world experiments show that our framework can reduce the takeover rate for a diverse range of drivers by up to 90.1%. To the best of our knowledge, Talk2Drive marks the first instance of employing an LLM-based system in a real-world autonomous driving environment.
An Efficient Probabilistic Solution to Mapping Errors in LiDAR-Camera Fusion for Autonomous Vehicles
Shen, Dan, Zhang, Zhengming, Tian, Renran, Chen, Yaobin, Sherony, Rini
LiDAR-camera fusion is one of the core processes in the perception system of current automated driving systems. The typical sensor fusion process includes a list of coordinate transformation operations following system calibration. Although a significant amount of research has been done to improve the fusion accuracy, there are still inherent data mapping errors in practice related to system synchronization offsets, vehicle vibrations, the small size of the target, and fast relative moving speeds. Moreover, increasingly complicated algorithms for improving fusion accuracy can overwhelm the onboard computational resources, limiting the actual implementation. This study proposes a novel and low-cost probabilistic LiDAR-camera fusion method to alleviate these inherent mapping errors in scene reconstruction. By calculating shape similarity using KL-divergence and applying a RANSAC-regression-based trajectory smoother, the effects of LiDAR-camera mapping errors are minimized in object localization and distance estimation. Designed experiments are conducted to demonstrate the robustness and effectiveness of the proposed strategy.
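The two building blocks named in the abstract above, KL-divergence as a similarity score and RANSAC regression as a trajectory smoother, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the histogram representation of a shape, the straight-line motion model, and the tolerance values are all assumptions made for illustration.

```python
import math
import random


def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) for two histograms of equal length (normalized here).

    Assumption: each object shape is summarized as a histogram; a lower
    value means the two shapes are more similar.
    """
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for p_raw, q_raw in zip(p, q):
        pi, qi = p_raw / p_sum, q_raw / q_sum
        if pi > 0:
            # eps guards against division by zero when Q has an empty bin.
            total += pi * math.log(pi / max(qi, eps))
    return total


def ransac_line(ts, xs, n_iter=200, tol=0.5, seed=0):
    """Fit x = a * t + b with RANSAC and return (a, b, inlier_indices).

    Assumption: over a short window, the distance to a tracked object
    changes roughly linearly with time, so mapping errors show up as
    outliers from a line.
    """
    rng = random.Random(seed)
    n = len(ts)
    best = []
    for _ in range(n_iter):
        i, j = rng.sample(range(n), 2)
        if ts[i] == ts[j]:
            continue  # a vertical candidate line is not a valid model
        a = (xs[j] - xs[i]) / (ts[j] - ts[i])
        b = xs[i] - a * ts[i]
        inliers = [k for k in range(n) if abs(xs[k] - (a * ts[k] + b)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    # Refine the model with ordinary least squares on the inliers only.
    m = len(best)
    mean_t = sum(ts[k] for k in best) / m
    mean_x = sum(xs[k] for k in best) / m
    var_t = sum((ts[k] - mean_t) ** 2 for k in best)
    a = sum((ts[k] - mean_t) * (xs[k] - mean_x) for k in best) / var_t
    b = mean_x - a * mean_t
    return a, b, best
```

Given a sequence of distance estimates in which one frame is corrupted by a mapping error, `ransac_line` would be expected to exclude that frame from the inlier set, and the fitted line can then stand in for the corrupted value.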
SceNDD: A Scenario-based Naturalistic Driving Dataset
Prabu, Avinash, Ranjan, Nitya, Li, Lingxi, Tian, Renran, Chien, Stanley, Chen, Yaobin, Sherony, Rini
In this paper, we propose SceNDD: a scenario-based naturalistic driving dataset that is built upon data collected from an instrumented vehicle in downtown Indianapolis. The data collection was completed in 68 driving sessions with different drivers, where each session lasted about 20-40 minutes. The main goal of creating this dataset is to provide the research community with real driving scenarios that have diverse trajectories and driving behaviors. The dataset contains the ego vehicle's waypoints, velocity, and yaw angle, as well as non-ego actors' waypoints, velocity, yaw angle, entry time, and exit time. Users are given the flexibility to add actors, sensors, lanes, roads, and obstacles to the existing scenarios. We used a Joint Probabilistic Data Association (JPDA) tracker to detect non-ego vehicles on the road. We present some preliminary results of the proposed dataset and a few applications associated with it. The complete dataset is expected to be released by early 2023.
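The fields listed in the abstract above suggest a simple record layout. The Python sketch below is one hypothetical way to hold such a scenario in memory; the class names, field names, and units are assumptions and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Waypoint = Tuple[float, float]  # assumed (x, y) position in meters


@dataclass
class Track:
    """Per-sample state shared by the ego vehicle and non-ego actors."""
    waypoints: List[Waypoint]
    velocity: List[float]  # assumed m/s, one value per waypoint
    yaw: List[float]       # assumed degrees, one value per waypoint


@dataclass
class Actor(Track):
    """A non-ego actor, which also has a time window in the scenario."""
    entry_time: float = 0.0  # assumed seconds from scenario start
    exit_time: float = 0.0


@dataclass
class Scenario:
    ego: Track
    actors: List[Actor] = field(default_factory=list)

    def active_actors(self, t: float) -> List[Actor]:
        """Return the actors present in the scene at time t."""
        return [a for a in self.actors if a.entry_time <= t <= a.exit_time]
```

With this layout, adding an actor to an existing scenario, as the abstract describes, amounts to appending another `Actor` to `Scenario.actors`.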
PSI: A Pedestrian Behavior Dataset for Socially Intelligent Autonomous Car
Chen, Tina, Tian, Renran, Chen, Yaobin, Domeyer, Joshua, Toyoda, Heishiro, Sherony, Rini, Jing, Taotao, Ding, Zhengming
Prediction of pedestrian behavior is critical for fully autonomous vehicles to drive in busy city streets safely and efficiently. Future autonomous cars need to fit into mixed conditions with not only technical but also social capabilities. Although more algorithms and datasets have been developed to predict pedestrian behaviors, these efforts lack the benchmark labels and the capability to estimate the temporal-dynamic intent changes of the pedestrians, provide explanations of the interaction scenes, and support algorithms with social intelligence. This paper proposes and shares another benchmark dataset called the IUPUI-CSRC Pedestrian Situated Intent (PSI) data with two innovative labels besides comprehensive computer vision labels. The first novel label is the dynamic intent changes for the pedestrians to cross in front of the ego-vehicle, obtained from 24 drivers with diverse backgrounds. The second is the text-based explanations of the driver reasoning process when estimating pedestrian intents and predicting their behaviors during the interaction period. These innovative labels can enable several computer vision tasks, including pedestrian intent/behavior prediction, vehicle-pedestrian interaction segmentation, and video-to-language mapping for explainable algorithms. The released dataset can fundamentally improve the development of pedestrian behavior prediction models and help develop socially intelligent autonomous cars that interact with pedestrians efficiently. The dataset has been evaluated with different tasks and is publicly available.