ALIGN: A Vision-Language Framework for High-Accuracy Accident Location Inference through Geo-Spatial Neural Reasoning

Chowdhury, MD Thamed Bin Zaman, Hossain, Moazzem

arXiv.org Artificial Intelligence 

ABSTRACT

Reliable geospatial information on road accidents is vital for safety analysis and infrastructure planning, yet most low- and middle-income countries continue to face a critical shortage of accurate, location-specific crash data. Existing text-based geocoding tools perform poorly in multilingual and unstructured news environments, where place descriptions are incomplete and languages are mixed. To address these limitations, this study introduces ALIGN (Accident Location Inference through Geo-Spatial Neural Reasoning), a vision-language framework that emulates human spatial reasoning to infer accident location coordinates directly from available textual and map-based cues. ALIGN integrates large language and vision-language model mechanisms within a multi-stage pipeline that performs optical character recognition, linguistic reasoning, and map-level verification through grid-based spatial scanning. The framework systematically evaluates each predicted location against contextual and visual evidence, ensuring interpretable, fine-grained geolocation outcomes without requiring model retraining. Applied to a Bangla-language news data source, ALIGN demonstrates consistent improvements over traditional geoparsing methods, accurately identifying district- and sub-district-level crash sites. Beyond its technical contribution, the framework establishes a high-accuracy foundation for automated crash mapping in data-scarce regions, supporting evidence-driven road-safety policymaking and the broader integration of multimodal artificial intelligence in transportation analytics.

1. Introduction

Accurate, fine-grained geospatial data is the bedrock of effective public safety policy, urban planning, and strategic response. For road safety, knowing the precise location of traffic crashes is essential for diagnosing high-risk black spots, deploying emergency services, and evaluating the impact of engineering interventions.
While high-income nations increasingly rely on robust, integrated crash databases and vehicle telematics (Guo, Qian, & Shi, 2022; Szpytko & Nasan Agha, 2020), using advanced methods such as deep learning on multi-vehicle trajectories (Yang et al., 2021), ensemble models integrating connected vehicle data (Yang et al., 2026), and probe vehicle speed contour analysis (Wang et al., 2021), a significant 'geospatial data desert' persists in most Low- and Middle-Income Countries (LMICs) (Mitra & Bhalla, 2023; Chang et al., 2020). This gap is particularly tragic given that these regions bear the overwhelming brunt of global road traffic fatalities. This research focuses on Bangladesh, a low-resource country that exemplifies this data-sparse challenge. The World Bank has estimated that the costs associated with traffic crashes can amount to as much as 5.1% of the country's Gross Domestic Product (World Bank, 2022).