Subnational Geocoding of Global Disasters Using Large Language Models

Ronco, Michele, Delforge, Damien, Jäger, Wiebke S., Corbane, Christina

Nov-20-2025–arXiv.org Artificial Intelligence

Subnational location data of disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations in unstructured textual form, with inconsistent granularity or spelling, that make it difficult to integrate with spatial datasets. We present a fully automated LLM-assisted workflow that processes and cleans textual location information using GPT-4o, and assigns geometries by cross-checking three independent geoinformation repositories: GADM, OpenStreetMap and Wikidata. Based on the agreement and availability of these sources, we assign a reliability score to each location while generating subnational geometries. Applied to the EM-DAT dataset from 2000 to 2024, the workflow geocodes 14,215 events across 17,948 unique locations. Unlike previous methods, our approach requires no manual intervention, covers all disaster types, enables cross-verification across multiple sources, and allows flexible remapping to preferred frameworks. Beyond the dataset, we demonstrate the potential of LLMs to extract and structure geographic information from unstructured text, offering a scalable and reliable method for related analyses.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

Nov-20-2025

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- North America > United States (0.46)
- Asia > India (0.28)

Genre:
- Workflow (0.88)
- Research Report (0.64)

Industry:
- Information Technology (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found