Goto

Collaborating Authors

 place name


ALIGN: A Vision-Language Framework for High-Accuracy Accident Location Inference through Geo-Spatial Neural Reasoning

Chowdhury, MD Thamed Bin Zaman, Hossain, Moazzem

arXiv.org Artificial Intelligence

ABSTRACT Reliable geospatial information on road accidents is vital for safety analysis and infrastructure planning, yet most low-and middle-income countries continue to face a critical shortage of accurate, location-specific crash data. Existing text-based geocoding tools perform poorly in multilingual and unstructured news environments, where incomplete place descriptions and mixed language (e.g. To address these limitations, this study introduces ALIGN (Accident Location Inference through Geo-Spatial Neural Reasoning) -- a vision-language framework that emulates human spatial reasoning to infer accident location coordinates directly from available textual and map-based cues. ALIGN integrates large language and vision-language model mechanisms within a multi-stage pipeline that performs optical character recognition, linguistic reasoning, and map-level verification through grid-based spatial scanning. The framework systematically evaluates each predicted location against contextual and visual evidence, ensuring interpretable, fine-grained geolocation outcomes without requiring model retraining. Applied to Bangla-language news data source, ALIGN demonstrates consistent improvements over traditional geoparsing methods, accurately identifying district-and sub-district-level crash sites. Beyond its technical contribution, the framework establishes a high accuracy foundation for automated crash mapping in data-scarce regions, supporting evidence-driven road-safety policymaking and the broader integration of multimodal artificial intelligence in transportation analytics. Hossain) 1. Introduction Accurate, fine-grained geospatial data is the bedrock of effective public safety policy, urban planning, and strategic response. For road safety, knowing the precise location of traffic crashes is essential for diagnosing high-risk black spots, deploying emergency services, and evaluating the impact of engineering interventions. While high-income nations increasingly rely on robust, integrated crash databases and vehicle telematics (Guo, Qian, & Shi, 2022; Szpytko & Nasan Agha, 2020), utilizing advanced methods such as deep learning on multi-vehicle trajectories (Yang et al., 2021), ensemble models integrating connected vehicle data (Yang et al., 2026), and 2 probe vehicle speed contour analysis (Wang et al., 2021), a significant'geospatial data desert' persists in most Low-and Middle-Income Countries (LMICs) (Mitra & Bhalla, 2023; Chang et al., 2020). This gap is particularly tragic given that these regions bear the overwhelming brunt of global road traffic fatalities. This research focuses on a low-resource country-Bangladesh, a nation that exemplifies this critical data-sparse challenge. The World Bank has estimated that the costs associated with traffic crashes can amount to as much as 5.1% of the country's Gross Domestic Product (World Bank, 2022).


PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration

Wang, Xin, Cui, Zhiyao, Li, Hao, Zeng, Ya, Wang, Chenxu, Song, Ruiqi, Chen, Yihang, Shao, Kun, Zhang, Qiaosheng, Liu, Jinzhuo, Ren, Siyue, Hu, Shuyue, Wang, Zhen

arXiv.org Artificial Intelligence

Vision language model (VLM)-based mobile agents show great potential for assisting users in performing instruction-driven tasks. However, these agents typically struggle with personalized instructions -- those containing ambiguous, user-specific context -- a challenge that has been largely overlooked in previous research. In this paper, we define personalized instructions and introduce PerInstruct, a novel human-annotated dataset covering diverse personalized instructions across various mobile scenarios. Furthermore, given the limited personalization capabilities of existing mobile agents, we propose PerPilot, a plug-and-play framework powered by large language models (LLMs) that enables mobile agents to autonomously perceive, understand, and execute personalized user instructions. PerPilot identifies personalized elements and autonomously completes instructions via two complementary approaches: memory-based retrieval and reasoning-based exploration. Experimental results demonstrate that PerPilot effectively handles personalized tasks with minimal user intervention and progressively improves its performance with continued use, underscoring the importance of personalization-aware reasoning for next-generation mobile agents. The dataset and code are available at: https://github.com/xinwang-nwpu/PerPilot


Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing

Wijegunarathna, Kalana, Stock, Kristin, Jones, Christopher B.

arXiv.org Artificial Intelligence

Millions of biological sample records collected in the last few centuries archived in natural history collections are un-georeferenced. Georeferencing complex locality descriptions associated with these collection samples is a highly labour-intensive task collection agencies struggle with. None of the existing automated methods exploit maps that are an essential tool for georeferencing complex relations. We present preliminary experiments and results of a novel method that exploits multi-modal capabilities of recent Large Multi-Modal Models (LMM). This method enables the model to visually contextualize spatial relations it reads in the locality description. We use a grid-based approach to adapt these auto-regressive models for this task in a zero-shot setting. Our experiments conducted on a small manually annotated dataset show impressive results for our approach ($\sim$1 km Average distance error) compared to uni-modal georeferencing with Large Language Models and existing georeferencing tools. The paper also discusses the findings of the experiments in light of an LMM's ability to comprehend fine-grained maps. Motivated by these results, a practical framework is proposed to integrate this method into a georeferencing workflow.


DistRAG: Towards Distance-Based Spatial Reasoning in LLMs

Schneider, Nicole R, Ramachandran, Nandini, O'Sullivan, Kent, Samet, Hanan

arXiv.org Artificial Intelligence

Many real world tasks where Large Language Models (LLMs) can be used require spatial reasoning, like Point of Interest (POI) recommendation and itinerary planning. However, on their own LLMs lack reliable spatial reasoning capabilities, especially about distances. To address this problem, we develop a novel approach, DistRAG, that enables an LLM to retrieve relevant spatial information not explicitly learned during training. Our method encodes the geodesic distances between cities and towns in a graph and retrieves a context subgraph relevant to the question. Using this technique, our method enables an LLM to answer distance-based reasoning questions that it otherwise cannot answer. Given the vast array of possible places an LLM could be asked about, DistRAG offers a flexible first step towards providing a rudimentary `world model' to complement the linguistic knowledge held in LLMs.


Killing Two Flies with One Stone: An Attempt to Break LLMs Using English->Icelandic Idioms and Proper Names

Ármannsson, Bjarki, Hafsteinsson, Hinrik, Jasonarson, Atli, Steingrímsson, Steinþór

arXiv.org Artificial Intelligence

This paper presents the submission of the \'Arni Magn\'usson Institute's team to the WMT24 test suite subtask, focusing on idiomatic expressions and proper names for the English->Icelandic translation direction. Intuitively and empirically, idioms and proper names are known to be a significant challenge for modern translation models. We create two different test suites. The first evaluates the competency of MT systems in translating common English idiomatic expressions, as well as testing whether systems can distinguish between those expressions and the same phrases when used in a literal context. The second test suite consists of place names that should be translated into their Icelandic exonyms (and correctly inflected) and pairs of Icelandic names that share a surface form between the male and female variants, so that incorrect translations impact meaning as well as readability. The scores reported are relatively low, especially for idiomatic expressions and place names, and indicate considerable room for improvement.


Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction

Haris, Erum, Cohn, Anthony G., Stell, John G.

arXiv.org Artificial Intelligence

Navigating historical narratives poses a challenge in unveiling the spatial intricacies of past landscapes. The proposed work addresses this challenge within the context of the English Lake District, employing the Corpus of the Lake District Writing. The method utilizes a generative pre-trained transformer model to extract spatial relations from the textual descriptions in the corpus. The study applies this large language model to understand the spatial dimensions inherent in historical narratives comprehensively. The outcomes are presented as semantic triples, capturing the nuanced connections between entities and locations, and visualized as a network, offering a graphical representation of the spatial narrative. The study contributes to a deeper comprehension of the English Lake District's spatial tapestry and provides an approach to uncovering spatial relations within diverse historical contexts.


GLOCON Database: Design Decisions and User Manual (v1.0)

Hürriyetoğlu, Ali, Mutlu, Osman, Duruşan, Fırat, Yörük, Erdem

arXiv.org Artificial Intelligence

GLOCON is a database of contentious events automatically extracted from national news sources from various countries in multiple languages. National news sources are utilized, and complete news archives are processed to create an event list for each source. Automation is achieved using a gold standard corpus sampled randomly from complete news archives (Yörük et al. 2022) and all annotated by at least two domain experts based on the event definition provided in Duruşan et al. (2022). The database consists of the following countries and sources provided in Table 1 as of May 2024.


How English is YOUR hometown? Scientists reveal the place names that are the most 'archetypically English' - so, is yours on the list?

Daily Mail - Science & tech

England is famous for its eccentric place names, from'Matching Tye' to'Fingringhoe' and'Upton Snodsbury'. But a new AI study now reveals the most English-sounding locations in the country – and they certainly conjure up images of cricket and afternoon tea. The study shows that'Harlington', a district of London, is the most archetypal English place name, along with'Widdington' in Essex and'Colworth' in West Sussex. It contrast, 'Anna', a settlement in Hampshire, is the least English-sounding, along with'Belgravia' in London and'Moira' in Leicestershire. Although AI was used to determine the language basis of English place names, not the meaning, the results could reveal more about the history of the locations.


AI sheds light on the ancient origins of England's place names

New Scientist

Places with "ton" in their name, such as Shitterton in Dorset, are said to be the most archetypically English Attempting to uncover the ancient origins of English place names has long kept academics and hobbyists busy, with each name often attracting multiple conflicting ideas. Now, artificial intelligence offers a way to analyse the names' similarities to a host of other European languages, revealing hints about their roots.


A Stochastic Analysis of the Linguistic Provenance of English Place Names

Dalvean, Michael

arXiv.org Artificial Intelligence

In English place name analysis, meanings are often derived from the resemblance of roots in place names to topographical features, proper names and/or habitation terms in one of the languages that have had an influence on English place names. The problem here is that it is sometimes difficult to determine the base language to use to interpret the roots. The purpose of this paper is to stochastically determine the resemblance between 18799 English place names and 84687 place names from Ireland, Scotland, Wales, Denmark, Norway, Sweden, France, Germany, the Netherlands and Ancient Rome. Each English place name is ranked according to the extent to which it resembles place names from the other countries, and this provides a basis for determining the likely language to use to interpret the place name. A number of observations can be made using the ranking provided. In particular, it is found that `Harlington' is the most archetypically English place name in the English sample, and `Anna' is the least. Furthermore, it is found that the place names in the non-English datasets are most similar to Norwegian place names and least similar to Welsh place names.