
Collaborating Authors

 Hassan, Bilal


Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions

arXiv.org Artificial Intelligence

As the potential for autonomous vehicles to be integrated on a large scale into modern traffic systems continues to grow, ensuring safe navigation in dynamic environments is crucial for smooth integration. To guarantee safety and prevent collisions, autonomous vehicles must be capable of accurately predicting the trajectories of surrounding traffic agents. Over the past decade, significant efforts from both academia and industry have been dedicated to designing solutions for precise trajectory forecasting. These efforts have produced a diverse range of approaches, raising questions about the differences between these methods and whether trajectory prediction challenges have been fully addressed. This paper reviews a substantial portion of recent trajectory prediction methods and devises a taxonomy to classify existing solutions. A general overview of the prediction pipeline is also provided, covering input and output modalities, modeling features, and prediction paradigms discussed in the literature. In addition, the paper discusses active research areas within trajectory prediction, addresses the posed research questions, and highlights the remaining research gaps and challenges.
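As a concrete anchor for the prediction paradigms such surveys cover, the sketch below implements the constant-velocity baseline commonly used as a reference point in trajectory forecasting. It is an illustration, not a method from the paper; the function name and the two-point velocity estimate are our own assumptions.

```python
import numpy as np

def constant_velocity_forecast(history, horizon):
    """Forecast future positions by extrapolating the last observed velocity.

    history: (T, 2) array-like of past (x, y) positions, oldest first (T >= 2).
    horizon: number of future time steps to predict.
    Returns a (horizon, 2) array of predicted positions.
    """
    history = np.asarray(history, dtype=float)
    velocity = history[-1] - history[-2]            # displacement per step
    steps = np.arange(1, horizon + 1).reshape(-1, 1)
    return history[-1] + steps * velocity

# An agent moving diagonally at one unit per step keeps doing so:
past = [[0, 0], [1, 1], [2, 2]]
print(constant_velocity_forecast(past, 3))
```

Despite its simplicity, this baseline is a standard sanity check: learned models are expected to beat it, especially on maneuvers (turns, lane changes) where constant velocity fails.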


EMT: A Visual Multi-Task Benchmark Dataset for Autonomous Driving in the Arab Gulf Region

arXiv.org Artificial Intelligence

This paper introduces the Emirates Multi-Task (EMT) dataset, the first publicly available dataset for autonomous driving collected in the Arab Gulf region. It contains over 30,000 frames from a dash-camera perspective, along with 570,000 annotated bounding boxes, covering approximately 150 kilometers of driving routes. The EMT dataset supports three primary tasks: tracking, trajectory forecasting, and intention prediction. Each benchmark dataset is complemented with corresponding evaluations: (1) multi-agent tracking experiments, focusing on multi-class scenarios and occlusion handling; (2) trajectory forecasting evaluation using deep sequential and interaction-aware models; and (3) intention benchmark experiments conducted for predicting agents' intentions from observed trajectories. The dataset is publicly available at avlab.io/emt-dataset, and pre-processing scripts along with evaluation models can be accessed at github.com/.

As autonomous driving technology advances, the ability of data-driven models to generalize across diverse road environments and conditions is essential for safe operation, but remains a significant challenge. To achieve robust generalization, it is critical to train models on datasets that capture a wide range of traffic scenes and characteristics. Current autonomous driving datasets provide extensive coverage of regions like the USA [1-5], Europe [6, 7], and parts of Asia, including China and Singapore [1, 8]. However, the Arab Gulf region, with its unique driving conditions, remains underrepresented. To address this gap, we introduce the Emirates Multi-Task (EMT) dataset, collected in the United Arab Emirates (UAE) to capture the region's distinct traffic conditions. This region offers diverse driving challenges due to its range of road layouts, including expansive highways, urban areas, and complex city junctions. Additionally, driving behavior in the UAE reflects a blend of modern regulations and traditional practices.
This work was supported by Khalifa University of Science and Technology under Award No. RIG-2023-117. The annotated dataset supports multiple benchmarks, including tracking, trajectory prediction, and intention prediction, aimed at advancing model robustness in complex driving environments. The tracking benchmark dataset is designed to evaluate the ability of algorithms to accurately identify and maintain consistent object tracks over time in a complex driving environment. Similar to current state-of-the-art (SOTA) tracking benchmarks [1, 9, 10], it focuses on the motion of vehicles, pedestrians, cyclists, and motorbikes, captured from a frontal camera perspective. The benchmark is designed to test tracking models under varying levels of traffic congestion and frequent lane changes. The dataset contains 8,806 unique tracking IDs, including 8,076 vehicles, 568 pedestrians, 158 motorbikes, and 14 cyclists, with a mean tracking duration of 6.5 seconds.
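Trajectory forecasting benchmarks of this kind are typically scored with Average and Final Displacement Error (ADE/FDE). A minimal sketch of those two metrics, assuming trajectories are given as equal-length lists of (x, y) points (the function name is our own), could look like:

```python
import math

def ade_fde(pred, gt):
    """Average and Final Displacement Error between a predicted and a
    ground-truth trajectory, each a sequence of (x, y) points of equal length."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    ade = sum(dists) / len(dists)   # mean Euclidean error over the horizon
    fde = dists[-1]                 # error at the final predicted step
    return ade, fde

pred = [(0, 0), (1, 0), (2, 0)]     # predicted straight-line motion
gt   = [(0, 0), (1, 1), (2, 2)]     # ground truth veers away
print(ade_fde(pred, gt))            # → (1.0, 2.0)
```

ADE summarizes accuracy over the whole horizon, while FDE isolates the endpoint; reporting both is the convention in forecasting benchmarks because a model can trade one against the other.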


RobMOT: Robust 3D Multi-Object Tracking by Observational Noise and State Estimation Drift Mitigation on LiDAR PointCloud

arXiv.org Artificial Intelligence

This work addresses limitations in recent 3D tracking-by-detection methods, focusing on identifying legitimate trajectories and addressing state estimation drift in Kalman filters. Current methods rely heavily on threshold-based filtering of false positive detections using detection scores to prevent ghost trajectories. However, this approach is inadequate for distant and partially occluded objects, where detection scores tend to drop, potentially leading to false positives exceeding the threshold. Additionally, the literature generally treats detections as precise localizations of objects. Our research reveals that noise in detections impacts localization information, causing trajectory drift for occluded objects and hindering recovery. To this end, we propose a novel online track validity mechanism that temporally distinguishes between legitimate and ghost tracks, along with a multi-stage observational gating process for incoming observations. This mechanism significantly improves tracking performance, with a $6.28\%$ increase in HOTA and a $17.87\%$ increase in MOTA. We also introduce a refinement to the Kalman filter that enhances noise mitigation in trajectory drift, leading to more robust state estimation for occluded objects. Our framework, RobMOT, outperforms state-of-the-art methods, including deep learning approaches, across various detectors, achieving up to a $4\%$ margin in HOTA and $6\%$ in MOTA. RobMOT excels under challenging conditions, such as prolonged occlusions and tracking distant objects, with up to a 59\% improvement in processing latency.
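For readers unfamiliar with the state estimation underlying tracking-by-detection, the sketch below is a generic one-dimensional Kalman predict/update cycle with a missed-detection (occlusion) branch. It illustrates the mechanism RobMOT refines, not RobMOT's actual refinement; the noise variances `q` and `r` are assumed tuning values.

```python
def kalman_step(x, P, z, q=0.01, r=1.0):
    """One predict/update cycle of a 1-D constant-position Kalman filter.

    x, P : prior state estimate and its variance
    z    : new (noisy) observation; pass None when the object is occluded
    q, r : process and observation noise variances (tuning assumptions)
    """
    # Predict: the state carries over, uncertainty grows by the process noise.
    x_pred, P_pred = x, P + q
    if z is None:                      # occlusion: no measurement to fuse,
        return x_pred, P_pred          # so uncertainty keeps accumulating
    # Update: blend prediction and observation weighted by the Kalman gain.
    K = P_pred / (P_pred + r)
    x_new = x_pred + K * (z - x_pred)
    P_new = (1 - K) * P_pred
    return x_new, P_new

x, P = 0.0, 1.0
for z in [1.2, 0.8, None, 1.1]:        # None simulates a missed detection
    x, P = kalman_step(x, P, z)
print(x, P)
```

Note how uncertainty `P` grows during the occluded step and shrinks again once a measurement arrives; noisy measurements during that recovery are exactly where drift, as described above, can take hold.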


Media Forensics and Deepfake Systematic Survey

arXiv.org Artificial Intelligence

Deepfake is a generative deep learning technique that creates or alters facial features in a highly realistic way, making it hard to differentiate real features from fake ones. It can be used to enhance movies as well as to spread false information by imitating famous people. In this paper, many different ways to create a Deepfake are explained, analyzed, and categorized. Using Deepfake datasets, models are trained and tested for reliability through experiments. Deepfakes are a type of facial manipulation that allows people to change entire faces, identities, attributes, and expressions. The trends in the available Deepfake datasets are also discussed, with a focus on how they have changed. Using deep learning, a general Deepfake detection model is built. Moreover, the problems in creating and detecting Deepfakes are also discussed. As a result of this survey, it is expected that the development of new Deepfake-based imaging tools will accelerate in the future. This survey gives an in-depth review of methods for manipulating face images and various techniques to spot altered face images. Four types of facial manipulation are specifically discussed: attribute manipulation, expression swap, entire face synthesis, and identity swap. For every manipulation category, we provide information on manipulation techniques, significant benchmarks for technical evaluation of counterfeit detection methods, available public databases, and a summary of the outcomes of all such analyses. Among all the topics in the survey, we focus on the most recent developments in Deepfakes, showing their advances and the obstacles in detecting fake images.
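As a toy stand-in for the kind of detection model the survey describes (not the survey's architecture), the sketch below trains a logistic-regression classifier on synthetic two-dimensional "artifact" features. The assumption that real and fake faces separate along some artifact statistic, plus the distributions and hyperparameters, are ours for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in features: real faces and Deepfakes are assumed to
# differ slightly in some artifact statistic (a deliberate simplification).
real = rng.normal(0.0, 1.0, size=(200, 2))
fake = rng.normal(1.5, 1.0, size=(200, 2))
X = np.vstack([real, fake])
y = np.array([0] * 200 + [1] * 200)   # 0 = real, 1 = fake

# Logistic regression trained by plain batch gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid scores in (0, 1)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

pred = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print("train accuracy:", np.mean(pred == y))
```

Real detectors replace the hand-made features with learned convolutional ones, but the decision layer and training loop follow the same supervised pattern.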