Leave no Place Behind: Improved Geolocation in Humanitarian Documents
Belliardo, Enrico M., Kalimeri, Kyriaki, Mejova, Yelena
arXiv.org Artificial Intelligence
Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Recent developments in Natural Language Processing may help extract vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools spaCy and RoBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method, FeatureRank, which links the candidate locations to the GeoNames database. We find that humanitarian-domain data not only improves the performance of the classifiers (up to F1 = 0.92), but also alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.
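The geocoding step described above (linking a tagged place name to one gazetteer entry) can be illustrated with a minimal sketch. This is not the FeatureRank method, whose details are not given in the abstract. It is a generic ranking heuristic under stated assumptions: the `Place` fields loosely mirror GeoNames columns, the gazetteer rows, place names, and country codes are invented placeholders, and `CLASS_WEIGHT` and `rank_candidates` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """One gazetteer row, loosely modeled on GeoNames-style fields."""
    geoname_id: int      # placeholder IDs, not real GeoNames identifiers
    name: str
    country_code: str    # "XA"/"XB" are placeholder codes, not real countries
    feature_class: str   # GeoNames-style class: "A" = admin area, "P" = populated place
    population: int      # invented values, for illustration only


# Toy gazetteer with fictitious entries; a real system would query GeoNames.
GAZETTEER = [
    Place(1, "Rivertown", "XA", "P", 900_000),
    Place(2, "Rivertown", "XB", "P", 40_000),
    Place(3, "Hillford", "XA", "P", 250_000),
    Place(4, "Hillford", "XA", "A", 1_200_000),
]

# Hypothetical preference: administrative areas outrank populated places.
CLASS_WEIGHT = {"A": 2, "P": 1}


def rank_candidates(mention, context_countries=frozenset()):
    """Return gazetteer entries matching `mention`, best candidate first.

    Candidates are ordered by three features, in priority order:
    1. whether the entry's country appears elsewhere in the document,
    2. the assumed feature-class weight,
    3. population.
    """
    candidates = [p for p in GAZETTEER if p.name.lower() == mention.lower()]

    def sort_key(place):
        return (
            place.country_code in context_countries,
            CLASS_WEIGHT.get(place.feature_class, 0),
            place.population,
        )

    return sorted(candidates, key=sort_key, reverse=True)


if __name__ == "__main__":
    # Without context, the larger Rivertown wins on population.
    print(rank_candidates("Rivertown")[0])
    # If the document also mentions country "XB", that entry is preferred.
    print(rank_candidates("Rivertown", {"XB"})[0])
```

Ranking on population alone would tend to favor large, well-documented places, which is one way a geocoder can drift toward the kind of bias the abstract describes. Putting document context first in the sort key is one simple way to counter that in this sketch.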
Sep-6-2023
- Country:
- Atlantic Ocean > Mediterranean Sea (0.04)
- Oceania > Australia (0.04)
- South America
- North America
- Panama (0.04)
- Trinidad and Tobago (0.04)
- The Bahamas (0.04)
- Mexico (0.04)
- United States > New York
- New York County > New York City (0.04)
- Europe
- Spain > Canary Islands
- Gran Canaria > Las Palmas de Gran Canaria (0.04)
- Portugal > Lisbon
- Lisbon (0.05)
- Italy > Piedmont
- Turin Province > Turin (0.04)
- Asia
- Afghanistan (0.04)
- South Korea (0.04)
- North Korea (0.04)
- Myanmar (0.04)
- Indonesia (0.04)
- Middle East
- Syria (0.14)
- Yemen > Al Hudaydah Governorate
- Al Hudaydah (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Iraq > Baghdad Governorate
- Baghdad (0.04)
- Africa
- Sudan (0.04)
- Niger (0.04)
- Mozambique (0.04)
- Democratic Republic of the Congo (0.04)
- Nigeria > Kaduna State
- Kaduna (0.04)
- Chad > Salamat
- Am Timan (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine (0.94)
- Technology: