Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection

Masri, Sari; Ashqar, Huthaifa I.; Elhenawy, Mohammed

arXiv.org Artificial Intelligence 

Traffic control at unsignalized urban intersections presents significant challenges due to their complexity, frequent conflicts, and blind spots. This study explores the capability of Multimodal Large Language Models (MLLMs), such as GPT-4o, to provide logical and visual reasoning directly from bird's-eye-view videos of four-legged intersections. In the proposed method, GPT-4o acts as an intelligent system that detects conflicts and provides explanations and recommendations for drivers. The fine-tuned model achieved an accuracy of 77.14%, while manual evaluation of the fine-tuned GPT-4o's correct predictions showed 89.9% accuracy for the model-generated explanations and 92.3% for the recommended next actions.

Urban intersections are highly challenging due to their unpredictability and dynamism, especially when they are unsignalized. Interactions often occur among motor vehicles and other road users in such areas.