vehicle and pedestrian
ParkDiffusion: Heterogeneous Multi-Agent Multi-Modal Trajectory Prediction for Automated Parking using Diffusion Models
Wei, Jiarong, Vödisch, Niclas, Rehr, Anna, Feist, Christian, Valada, Abhinav
Automated parking is a critical feature of Advanced Driver Assistance Systems (ADAS), where accurate trajectory prediction is essential to bridge perception and planning modules. Despite its significance, research in this domain remains relatively limited, with most existing studies concentrating on single-modal trajectory prediction of vehicles. In this work, we propose ParkDiffusion, a novel approach that predicts the trajectories of both vehicles and pedestrians in automated parking scenarios. ParkDiffusion employs diffusion models to capture the inherent uncertainty and multi-modality of future trajectories, incorporating several key innovations. First, we propose a dual map encoder that processes soft semantic cues and hard geometric constraints using a two-step cross-attention mechanism. Second, we introduce an adaptive agent type embedding module, which dynamically conditions the prediction process on the distinct characteristics of vehicles and pedestrians. Third, to ensure kinematic feasibility, our model outputs control signals that are subsequently used within a kinematic framework to generate physically feasible trajectories. We evaluate ParkDiffusion on the Dragon Lake Parking (DLP) dataset and the Intersections Drone (inD) dataset. Our work establishes a new baseline for heterogeneous trajectory prediction in parking scenarios, outperforming existing methods by a considerable margin.
Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning
Poudel, Bibek, Wang, Xuan, Li, Weizi, Zhu, Lei, Heaslip, Kevin
-- Reinforcement learning (RL) holds significant promise for adaptive traffic signal control. While existing RL-based methods demonstrate effectiveness in reducing vehicular congestion, their predominant focus on vehicle-centric optimization leaves pedestrian mobility needs and safety challenges unaddressed. In this paper, we present a deep RL framework for adaptive control of eight traffic signals along a real-world urban corridor, jointly optimizing both pedestrian and vehicular efficiency. Our single-agent policy is trained using real-world pedestrian and vehicle demand data derived from Wi-Fi logs and video analysis. The results demonstrate significant performance improvements over traditional fixed-time signals, reducing average wait times per pedestrian and per vehicle by up to 67% and 52% respectively, while simultaneously decreasing total wait times for both groups by up to 67 % and 53% . Additionally, our results demonstrate generalization capabilities across varying traffic demands, including conditions entirely unseen during training, validating RL's potential for developing transportation systems that serve all road users.
DEEGITS: Deep Learning based Framework for Measuring Heterogenous Traffic State in Challenging Traffic Scenarios
Islam, Muttahirul, Haque, Nazmul, Hadiuzzaman, Md.
This paper presents DEEGITS (Deep Learning Based Heterogeneous Traffic State Measurement), a comprehensive framework that leverages state-of-the-art convolutional neural network (CNN) techniques to accurately and rapidly detect vehicles and pedestrians, as well as to measure traffic states in challenging scenarios (i.e., congestion, occlusion). In this study, we enhance the training dataset through data fusion, enabling simultaneous detection of vehicles and pedestrians. Image preprocessing and augmentation are subsequently performed to improve the quality and quantity of the dataset. Transfer learning is applied on the YOLOv8 pretrained model to increase the model's capability to identify a diverse array of vehicles. Optimal hyperparameters are obtained using the Grid Search algorithm, with the Stochastic Gradient Descent (SGD) optimizer outperforming other optimizers under these settings. Extensive experimentation and evaluation demonstrate substantial accuracy within the detection framework, with the model achieving 0.794 mAP@0.5 on the validation set and 0.786 mAP@0.5 on the test set, surpassing previous benchmarks on similar datasets. The DeepSORT multi-object tracking algorithm is incorporated to track detected vehicles and pedestrians in this study. Finally, the framework is tested to measure heterogeneous traffic states in mixed traffic conditions. Two locations with differing traffic compositions and congestion levels are selected: one motorized-dominant location with moderate density and one non-motorized-dominant location with higher density. Errors are statistically insignificant for both cases, showing correlations from 0.99 to 0.88 and 0.91 to 0.97 for heterogeneous traffic flow and speed measurements, respectively.
Spatial Reasoning and Planning for Deep Embodied Agents
Humans can perform complex tasks with long-term objectives by planning, reasoning, and forecasting outcomes of actions. For embodied agents to achieve similar capabilities, they must gain knowledge of the environment transferable to novel scenarios with a limited budget of additional trial and error. Learning-based approaches, such as deep RL, can discover and take advantage of inherent regularities and characteristics of the application domain from data, and continuously improve their performances, however at a cost of large amounts of training data. This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks, focusing on enhancing learning efficiency, interpretability, and transferability across novel scenarios. Four key contributions are made. 1) CALVIN, a differential planner that learns interpretable models of the world for long-term planning. It successfully navigated partially observable 3D environments, such as mazes and indoor rooms, by learning the rewards and state transitions from expert demonstrations. 2) SOAP, an RL algorithm that discovers options unsupervised for long-horizon tasks. Options segment a task into subtasks and enable consistent execution of the subtask. SOAP showed robust performances on history-conditional corridor tasks as well as classical benchmarks such as Atari. 3) LangProp, a code optimisation framework using LLMs to solve embodied agent problems that require reasoning by treating code as learnable policies. The framework successfully generated interpretable code with comparable or superior performance to human-written experts in the CARLA autonomous driving benchmark. 4) Voggite, an embodied agent with a vision-to-action transformer backend that solves complex tasks in Minecraft. It achieved third place in the MineRL BASALT Competition by identifying action triggers to segment tasks into multiple stages.
Empowering Urban Traffic Management: Elevated 3D LiDAR for Data Collection and Advanced Object Detection Analysis
Guefrachi, Nawfal, Ghazzai, Hakim, Alsharoa, Ahmad
The 3D object detection capabilities in urban environments have been enormously improved by recent developments in Light Detection and Range (LiDAR) technology. This paper presents a novel framework that transforms the detection and analysis of 3D objects in traffic scenarios by utilizing the power of elevated LiDAR sensors. We are presenting our methodology's remarkable capacity to collect complex 3D point cloud data, which allows us to accurately and in detail capture the dynamics of urban traffic. Due to the limitation in obtaining real-world traffic datasets, we utilize the simulator to generate 3D point cloud for specific scenarios. To support our experimental analysis, we firstly simulate various 3D point cloud traffic-related objects. Then, we use this dataset as a basis for training and evaluating our 3D object detection models, in identifying and monitoring both vehicles and pedestrians in simulated urban traffic environments. Next, we fine tune the Point Voxel-Region-based Convolutional Neural Network (PV-RCNN) architecture, making it more suited to handle and understand the massive volumes of point cloud data generated by our urban traffic simulations. Our results show the effectiveness of the proposed solution in accurately detecting objects in traffic scenes and highlight the role of LiDAR in improving urban safety and advancing intelligent transportation systems.
3D Object Detection and High-Resolution Traffic Parameters Extraction Using Low-Resolution LiDAR Data
Zhang, Linlin, Yu, Xiang, Aboah, Armstrong, Adu-Gyamfi, Yaw
Traffic volume data collection is a crucial aspect of transportation engineering and urban planning, as it provides vital insights into traffic patterns, congestion, and infrastructure efficiency. Traditional manual methods of traffic data collection are both time-consuming and costly. However, the emergence of modern technologies, particularly Light Detection and Ranging (LiDAR), has revolutionized the process by enabling efficient and accurate data collection. Despite the benefits of using LiDAR for traffic data collection, previous studies have identified two major limitations that have impeded its widespread adoption. These are the need for multiple LiDAR systems to obtain complete point cloud information of objects of interest, as well as the labor-intensive process of annotating 3D bounding boxes for object detection tasks. In response to these challenges, the current study proposes an innovative framework that alleviates the need for multiple LiDAR systems and simplifies the laborious 3D annotation process. To achieve this goal, the study employed a single LiDAR system, that aims at reducing the data acquisition cost and addressed its accompanying limitation of missing point cloud information by developing a Point Cloud Completion (PCC) framework to fill in missing point cloud information using point density. Furthermore, we also used zero-shot learning techniques to detect vehicles and pedestrians, as well as proposed a unique framework for extracting low to high features from the object of interest, such as height, acceleration, and speed. Using the 2D bounding box detection and extracted height information, this study is able to generate 3D bounding boxes automatically without human intervention.
I see you: A Vehicle-Pedestrian Interaction Dataset from Traffic Surveillance Cameras
Quispe, Hanan, Sumire, Jorshinno, Condori, Patricia, Alvarez, Edwin, Vera, Harley
The development of autonomous vehicles arises new challenges in urban traffic scenarios where vehicle-pedestrian interactions are frequent e.g. vehicle yields to pedestrians, pedestrian slows down due approaching to the vehicle. Over the last years, several datasets have been developed to model these interactions. However, available datasets do not cover near-accident scenarios that our dataset covers. We introduce I see you, a new vehicle-pedestrian interaction dataset that tackles the lack of trajectory data in near-accident scenarios using YOLOv5 and camera calibration methods. I see you consist of 170 near-accident occurrences in seven intersections in Cusco-Peru. This new dataset and pipeline code are available on Github.
Active Learning at Scale -- Building a Robust Data Unification Framework
Nauto is a leading provider of advanced driver assistance systems that improve the safety of commercial fleets today and the autonomous vehicles of tomorrow. To that end, we process terabytes of driving data a month, collected by windshield-mounted devices from vehicles around the world. This data is used to continuously improve the models that power our vehicle safety stack, from the real-time predictive collision alerts deployed to our devices, to the safety analytics that run on the cloud. Beyond providing immediate safety value to the drivers, our features play the important role of shaping their own evolution. If we want to improve the vehicle detection powering Forward Collision Warning (FCW), the first place we will look is the false positives triggered by FCW.
A novel method of predictive collision risk area estimation for proactive pedestrian accident prevention system in urban surveillance infrastructure
Road traffic accidents, especially vehicle pedestrian collisions in crosswalk, globally pose a severe threat to human lives and have become a leading cause of premature deaths. In order to protect such vulnerable road users from collisions, it is necessary to recognize possible conflict in advance and warn to road users, not post facto. A breakthrough for proactively preventing pedestrian collisions is to recognize pedestrian's potential risks based on vision sensors such as CCTVs. In this study, we propose a predictive collision risk area estimation system at unsignalized crosswalks. The proposed system applied trajectories of vehicles and pedestrians from video footage after preprocessing, and then predicted their trajectories by using deep LSTM networks. With use of predicted trajectories, this system can infer collision risk areas statistically, further severity of levels is divided as danger, warning, and relative safe. In order to validate the feasibility and applicability of the proposed system, we applied it and assess the severity of potential risks in two unsignalized spots in Osan city, Korea.
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Sun, Pei, Kretzschmar, Henrik, Dotiwalla, Xerxes, Chouard, Aurelien, Patnaik, Vijaysai, Tsui, Paul, Guo, James, Zhou, Yin, Chai, Yuning, Caine, Benjamin, Vasudevan, Vijay, Han, Wei, Ngiam, Jiquan, Zhao, Hang, Timofeev, Aleksei, Ettinger, Scott, Krivokon, Maxim, Gao, Amy, Joshi, Aditya, Zhang, Yu, Shlens, Jonathon, Chen, Zhifeng, Anguelov, Dragomir
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.