Goto

Collaborating Authors

 toponym


A Novel End-To-End Event Geolocation Method Leveraging Hyperbolic Space and Toponym Hierarchies

Qiao, Yaqiong, Huang, Guojun

arXiv.org Artificial Intelligence

Abstract: Timely detection and geolocation of events based on social data can provide critical information for applications such as crisis response and resource allocation. However, most existing methods are greatly affected by event detection errors, leading to insufficient geolocation accuracy. To this end, this paper proposes a novel end-to-end event geolocation method (GTOP) leveraging Hyperbolic space and toponym hierarchies. Specifically, the proposed method contains one event detection module and one geolocation module. The event detection module constructs a heterogeneous information networks based on social data, and then constructs a homogeneous message graph and combines it with the text and time feature of the message to learning initial features of nodes. Node features are updated in Hyperbolic space and then fed into a classifier for event detection. To reduce the geolocation error, this paper proposes a noise toponym filtering algorithm (HIST) based on the hierarchical structure of toponyms. HIST analyzes the hierarchical structure of toponyms mentioned in the event cluster, taking the highly frequent city-level locations as the coarsegrained locations for events. To further improve the geolocation accuracy, we propose a fine-grained pseudo toponyms generation algorithm (FIT) based on the output of HIST, and combine generated pseudo toponyms with filtered toponyms to locate events based on the geographic center points of the combined toponyms. Extensive experiments are conducted on the Chinese dataset constructed in this paper and another public English dataset. The experimental results show that the proposed method is superior to the state-of-the-art baselines.


ArabicNLU 2024: The First Arabic Natural Language Understanding Shared Task

Khalilia, Mohammed, Malaysha, Sanad, Suwaileh, Reem, Jarrar, Mustafa, Aljabari, Alaa, Elsayed, Tamer, Zitouni, Imed

arXiv.org Artificial Intelligence

This paper presents an overview of the Arabic Natural Language Understanding (ArabicNLU 2024) shared task, focusing on two subtasks: Word Sense Disambiguation (WSD) and Location Mention Disambiguation (LMD). The task aimed to evaluate the ability of automated systems to resolve word ambiguity and identify locations mentioned in Arabic text. We provided participants with novel datasets, including a sense-annotated corpus for WSD, called SALMA with approximately 34k annotated tokens, and the IDRISI-DA dataset with 3,893 annotations and 763 unique location mentions. These are challenging tasks. Out of the 38 registered teams, only three teams participated in the final evaluation phase, with the highest accuracy being 77.8% for WSD and the highest MRR@1 being 95.0% for LMD. The shared task not only facilitated the evaluation and comparison of different techniques, but also provided valuable insights and resources for the continued advancement of Arabic NLU technologies.


Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City

Brunila, Mikael, LaViolette, Jack, CH-Wang, Sky, Verma, Priyanka, Féré, Clara, McKenzie, Grant

arXiv.org Artificial Intelligence

Critical toponymy examines the dynamics of power, capital, and resistance through place names and the sites to which they refer. Studies here have traditionally focused on the semantic content of toponyms and the top-down institutional processes that produce them. However, they have generally ignored the ways in which toponyms are used by ordinary people in everyday discourse, as well as the other strategies of geospatial description that accompany and contextualize toponymic reference. Here, we develop computational methods to measure how cultural and economic capital shape the ways in which people refer to places, through a novel annotated dataset of 47,440 New York City Airbnb listings from the 2010s. Building on this dataset, we introduce a new named entity recognition (NER) model able to identify important discourse categories integral to the characterization of place. Our findings point toward new directions for critical toponymy and to a range of previously understudied linguistic signals relevant to research on neighborhood status, housing and tourism markets, and gentrification.


Leave no Place Behind: Improved Geolocation in Humanitarian Documents

Belliardo, Enrico M., Kalimeri, Kyriaki, Mejova, Yelena

arXiv.org Artificial Intelligence

Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Latest developments in Natural Language Processing may help in extracting vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools Spacy and roBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method FeatureRank which links the candidate locations to the GeoNames database. We find that not only does the humanitarian-domain data improves the performance of the classifiers (up to F1 = 0.92), but it also alleviates some of the bias of the existing tools, which erroneously favor locations in the Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for the deployment in the humanitarian sector.


Gazetteer-Independent Toponym Resolution Using Geographic Word Profiles

DeLozier, Grant (The University of Texas at Austin) | Baldridge, Jason (The University of Texas at Austin) | London, Loretta (The University of Texas at Austin)

AAAI Conferences

Toponym resolution, or grounding names of places to their actual locations, is an important problem in analysis of both historical corpora and present-day news and web content. Recent approaches have shifted from rule-based spatial minimization methods to machine learned classifiers that use features of the text surrounding a toponym. Such methods have been shown to be highly effective, but they crucially rely on gazetteers and are unable to handle unknown place names or locations. We address this limitation by modeling the geographic distributions of words over the earth's surface: we calculate the geographic profile of each word based on local spatial statistics over a set of geo-referenced language models. These geo-profiles can be further refined by combining in-domain data with background statistics from Wikipedia. Our resolver computes the overlap of all geo-profiles in a given text span; without using a gazetteer, it performs on par with existing classifiers. When combined with a gazetteer, it achieves state-of-the-art performance for two standard toponym resolution corpora (TR-CoNLL and Civil War). Furthermore, it dramatically improves recall when toponyms are identified by named entity recognizers, which often (correctly) find non-standard variants of toponyms.