Goto

Collaborating Authors

 intelligent transport


Robustness and Adaptability of Reinforcement Learning based Cooperative Autonomous Driving in Mixed-autonomy Traffic

Valiente, Rodolfo, Toghi, Behrad, Pedarsani, Ramtin, Fallah, Yaser P.

arXiv.org Artificial Intelligence

HE development of autonomous vehicles (A Vs) is on the verge of passing beyond the laboratory and simulation tests and is shifting towards addressing the challenges that limit their practicality in today's society. While there is still need for further technological improvements to enable safe and smooth operation of a single A V, a great deal of research attention is being focused on the emerging challenge of operating multiple A Vs and the co-existence of A Vs and human-driven vehicles (HVs) [1], [2]. A realistic outlook for the adoption of autonomous vehicles on the roads is a mixed-traffic scenario in which human drivers with different driving styles and social preferences share the road with A Vs that are perhaps built by different manufacturers and hence follow different policies [3], [4]. In this work, we seek a solution that can ensure the safety and robustness of A Vs in the presence of human drivers with heterogeneous behavioral traits. Connected & autonomous vehicles (CA Vs) via vehicle-to-vehicle (V2V) communication allow vehicles to directly communicate with their neighbors, creating an extended perception that enables explicit coordination among vehicles to overcome the limitations of an isolated agent [5]-[11]. While planning in a fully A V scenario is relatively easy to achieve, coordination in the presence of HVs is a significantly more challenging task, as the A Vs not only need to react to road objects but also need to consider the behaviors of HVs [3], [4], [12]. We start by identifying the major challenges in the domain of behavior planning and prediction for A Vs in mixed-autonomy traffic.


Controllable risk scenario generation from human crash data for autonomous vehicle testing

Lu, Qiujing, Wang, Xuanhan, Yuan, Runze, Lu, Wei, Gong, Xinyi, Feng, Shuo

arXiv.org Artificial Intelligence

Ensuring the safety of autonomous vehicles (AV) requires rigorous testing under both everyday driving and rare, safety-critical conditions. A key challenge lies in simulating environment agents, including background vehicles (BVs) and vulnerable road users (VRUs), that behave realistically in nominal traffic while also exhibiting risk-prone behaviors consistent with real-world accidents. We introduce Controllable Risk Agent Generation (CRAG), a framework designed to unify the modeling of dominant nominal behaviors and rare safety-critical behaviors. CRAG constructs a structured latent space that disentangles normal and risk-related behaviors, enabling efficient use of limited crash data. By combining risk-aware latent representations with optimization-based mode-transition mechanisms, the framework allows agents to shift smoothly and plausibly from safe to risk states over extended horizons, while maintaining high fidelity in both regimes. Extensive experiments show that CRAG improves diversity compared to existing baselines, while also enabling controllable generation of risk scenarios for targeted and efficient evaluation of AV robustness.


Intelligent Optimization of Wireless Access Point Deployment for Communication-Based Train Control Systems Using Deep Reinforcement Learning

Wu, Kunyu, Zhao, Qiushi, Feng, Zihan, Mu, Yunxi, Qin, Hao, Zhang, Xinyu, Zhang, Xingqi

arXiv.org Artificial Intelligence

Urban railway systems increasingly rely on communication based train control (CBTC) systems, where optimal deployment of access points (APs) in tunnels is critical for robust wireless coverage. Traditional methods, such as empirical model-based optimization algorithms, are hindered by excessive measurement requirements and suboptimal solutions, while machine learning (ML) approaches often struggle with complex tunnel environments. This paper proposes a deep reinforcement learning (DRL) driven framework that integrates parabolic wave equation (PWE) channel modeling, conditional generative adversarial network (cGAN) based data augmentation, and a dueling deep Q network (Dueling DQN) for AP placement optimization. The PWE method generates high-fidelity path loss distributions for a subset of AP positions, which are then expanded by the cGAN to create high resolution path loss maps for all candidate positions, significantly reducing simulation costs while maintaining physical accuracy. In the DRL framework, the state space captures AP positions and coverage, the action space defines AP adjustments, and the reward function encourages signal improvement while penalizing deployment costs. The dueling DQN enhances convergence speed and exploration exploitation balance, increasing the likelihood of reaching optimal configurations. Comparative experiments show that the proposed method outperforms a conventional Hooke Jeeves optimizer and traditional DQN, delivering AP configurations with higher average received power, better worst-case coverage, and improved computational efficiency. This work integrates high-fidelity electromagnetic simulation, generative modeling, and AI-driven optimization, offering a scalable and data-efficient solution for next-generation CBTC systems in complex tunnel environments.


MAIFormer: Multi-Agent Inverted Transformer for Flight Trajectory Prediction

Yoon, Seokbin, Lee, Keumjin

arXiv.org Artificial Intelligence

Flight trajectory prediction for multiple aircraft is essential and provides critical insights into how aircraft navigate within current air traffic flows. However, predicting multi-agent flight trajectories is inherently challenging. One of the major difficulties is modeling both the individual aircraft behaviors over time and the complex interactions between flights. Generating explainable prediction outcomes is also a challenge. Therefore, we propose a Multi-Agent Inverted Transformer, MAIFormer, as a novel neural architecture that predicts multi-agent flight trajectories. The proposed framework features two key attention modules: (i) masked multivariate attention, which captures spatio-temporal patterns of individual aircraft, and (ii) agent attention, which models the social patterns among multiple agents in complex air traffic scenes. We evaluated MAIFormer using a real-world automatic dependent surveillance-broadcast flight trajectory dataset from the terminal airspace of Incheon International Airport in South Korea. The experimental results show that MAIFormer achieves the best performance across multiple metrics and outperforms other methods. In addition, MAIFormer produces prediction outcomes that are interpretable from a human perspective, which improves both the transparency of the model and its practical utility in air traffic control.


Enhancing Traffic Incident Response through Sub-Second Temporal Localization with HybridMamba

Shihab, Ibne Farabi, Akter, Sanjeda, Sharma, Anuj

arXiv.org Artificial Intelligence

Traffic crash detection in long-form surveillance videos is essential for improving emergency response and infrastructure planning, yet remains difficult due to the brief and infrequent nature of crash events. We present \textbf{HybridMamba}, a novel architecture integrating visual transformers with state-space temporal modeling to achieve high-precision crash time localization. Our approach introduces multi-level token compression and hierarchical temporal processing to maintain computational efficiency without sacrificing temporal resolution. Evaluated on a large-scale dataset from the Iowa Department of Transportation, HybridMamba achieves a mean absolute error of \textbf{1.50 seconds} for 2-minute videos ($p<0.01$ compared to baselines), with \textbf{65.2%} of predictions falling within one second of the ground truth. It outperforms recent video-language models (e.g., TimeChat, VideoLLaMA-2) by up to 3.95 seconds while using significantly fewer parameters (3B vs. 13--72B). Our results demonstrate effective temporal localization across various video durations (2--40 minutes) and diverse environmental conditions, highlighting HybridMamba's potential for fine-grained temporal localization in traffic surveillance while identifying challenges that remain for extended deployment.


Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges

Akter, Sanjeda, Shihab, Ibne Farabi, Sharma, Anuj

arXiv.org Artificial Intelligence

Crash detection from video feeds is a critical problem in intelligent transportation systems. Recent developments in large language models (LLMs) and vision-language models (VLMs) have transformed how we process, reason about, and summarize multimodal information. This paper surveys recent methods leveraging LLMs for crash detection from video data. We present a structured taxonomy of fusion strategies, summarize key datasets, analyze model architectures, compare performance benchmarks, and discuss ongoing challenges and opportunities. Our review provides a foundation for future research in this fast-growing intersection of video understanding and foundation models.


Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems

Akter, Sanjeda, Shihab, Ibne Farabi, Sharma, Anuj

arXiv.org Artificial Intelligence

The integration of Large Language Models (LLMs) with computer vision is profoundly transforming perception tasks like image segmentation. For intelligent transportation systems (ITS), where accurate scene understanding is critical for safety and efficiency, this new paradigm offers unprecedented capabilities. This survey systematically reviews the emerging field of LLM-augmented image segmentation, focusing on its applications, challenges, and future directions within ITS. We provide a taxonomy of current approaches based on their prompting mechanisms and core architectures, and we highlight how these innovations can enhance road scene understanding for autonomous driving, traffic monitoring, and infrastructure maintenance. Finally, we identify key challenges, including real-time performance and safety-critical reliability, and outline a perspective centered on explainable, human-centric AI as a prerequisite for the successful deployment of this technology in next-generation transportation systems.


Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios

Yao, Shengyue, Guo, Runqing, Qin, Yangyang, Meng, Miangbing, Cao, Jipeng, Lin, Yilun, Lv, Yisheng, Wang, Fei-Yue

arXiv.org Artificial Intelligence

With the deep penetration of Artificial Intelligence (AI) in the transportation sector, intelligent cockpits, autonomous driving, and intelligent road networks are developing at an unprecedented pace. However, the data ecosystems of these three key areas are increasingly fragmented and incompatible. Especially, existing testing methods rely on data stacking, fail to cover all edge cases, and lack flexibility. To address this issue, this paper introduces the concept of "Query as Test" (QaT). This concept shifts the focus from rigid, prescripted test cases to flexible, on-demand logical queries against a unified data representation. Specifically, we identify the need for a fundamental improvement in data storage and representation, leading to our proposal of "Extensible Scenarios Notations" (ESN). ESN is a novel declarative data framework based on Answer Set Programming (ASP), which uniformly represents heterogeneous multimodal data from the cockpit, vehicle, and road as a collection of logical facts and rules. This approach not only achieves deep semantic fusion of data, but also brings three core advantages: (1) supports complex and flexible semantic querying through logical reasoning; (2) provides natural interpretability for decision-making processes; (3) allows for on-demand data abstraction through logical rules, enabling fine-grained privacy protection. We further elaborate on the QaT paradigm, transforming the functional validation and safety compliance checks of autonomous driving systems into logical queries against the ESN database, significantly enhancing the expressiveness and formal rigor of the testing. Finally, we introduce the concept of "Validation-Driven Development" (VDD), which suggests to guide developments by logical validation rather than quantitative testing in the era of Large Language Models, in order to accelerating the iteration and development process.


PPTNet: A Hybrid Periodic Pattern-Transformer Architecture for Traffic Flow Prediction and Congestion Identification

Kou, Hongrui, Li, Jingkai, Wang, Ziyu, Lv, Zhouhang, Zhang, Yuxin, Wang, Cheng

arXiv.org Artificial Intelligence

--Accurate prediction of traffic flow parameters and real-time identification of congestion states are essential for the efficient operation of intelligent transportation systems. This paper proposes a Periodic Pattern-Transformer Network (PPTNet) for traffic flow prediction, integrating periodic pattern extraction with the Transformer architecture, coupled with a fuzzy inference method for real-time congestion identification. Firstly, a high-precision traffic flow dataset (Traffic Flow Dataset for China's Congested Highways & Expressways, TF4CHE) suitable for congested highway scenarios in China is constructed based on drone aerial imagery data. Subsequently, the proposed PPTNet employs Fast Fourier Transform to capture multi-scale periodic patterns and utilizes two-dimensional Inception convolutions to efficiently extract intra and inter periodic features. Finally, congestion probabilities are calculated in real-time using the predicted outcomes via a Mamdani fuzzy inference-based congestion identification module. Experimental results demonstrate that the proposed PPTNet significantly outperforms mainstream traffic prediction methods in prediction accuracy, and the congestion identification module effectively identifies real-time road congestion states, verifying the superiority and practicality of the proposed method in real-world traffic scenarios. ITH the rapid advancement of Intelligent Transportation Systems (ITS), traffic flow prediction has become a core technology to optimize traffic management and improve operational efficiency [1]. As a critical component of national transportation infrastructure, expressways are particularly susceptible to traffic congestion, which not only directly reduces throughput but also indirectly contributes to a higher incidence of traffic accidents.


Decision Making in Urban Traffic: A Game Theoretic Approach for Autonomous Vehicles Adhering to Traffic Rules

Shu, Keqi, Ning, Minghao, Alghooneh, Ahmad, Li, Shen, Pirani, Mohammad, Khajepour, Amir

arXiv.org Artificial Intelligence

One of the primary challenges in urban autonomous vehicle decision-making and planning lies in effectively managing intricate interactions with diverse traffic participants characterized by unpredictable movement patterns. Additionally, interpreting and adhering to traffic regulations within rapidly evolving traffic scenarios pose significant hurdles. This paper proposed a rule-based autonomous vehicle decision-making and planning framework which extracts right-of-way from traffic rules to generate behavioural parameters, integrating them to effectively adhere to and navigate through traffic regulations. The framework considers the strong interaction between traffic participants mathematically by formulating the decision-making and planning problem into a differential game. By finding the Nash equilibrium of the problem, the autonomous vehicle is able to find optimal decisions. The proposed framework was tested under simulation as well as full-size vehicle platform, the results show that the ego vehicle is able to safely interact with surrounding traffic participants while adhering to traffic rules.