Novel Benchmark for NER in the Wastewater and Stormwater Domain
Cardillo, Franco Alberto; Debole, Franca; Frontini, Francesca; Aelami, Mitra; Chahinian, Nanée; Conrad, Serge
arXiv.org Artificial Intelligence
The effective management of wastewater and stormwater systems is crucial for urban sustainability and environmental protection. These systems, which form an integral part of public infrastructure, require structured information for monitoring, planning, and maintenance. However, much of the relevant information exists in unstructured textual formats, such as technical reports, regulatory documents, and maintenance logs. Extracting information from these sources is a key challenge because of domain-specific terminology and the multilingual nature of regulatory and operational contexts. Typically, an information extraction application for wastewater management requires domain-specific entity recognition, followed by the extraction of relations between entities to support decision-making, automated reasoning, and linking to existing knowledge bases. Recent progress in domain-specific Named Entity Recognition (NER) has the potential to greatly facilitate the development of such applications. However, to effectively evaluate this first and crucial step of the extraction pipeline, it is essential to establish a clearly defined set of extractable entities and to construct a multilingual benchmark corpus. Building on previous work, carried out within the framework of a national project on just one language, we propose the following contributions: The starwars corpus, an aligned French-Italian corpus containing domain-specific texts.
Jun-3-2025
- Country:
  - Europe
    - France > Occitanie
      - Hérault > Montpellier (0.05)
    - Italy
      - Piedmont > Turin Province
        - Turin (0.04)
      - Tuscany > Pisa Province
        - Pisa (0.05)
- Genre:
  - Research Report (0.50)
- Industry:
  - Water & Waste Management > Water Management (1.00)
- Technology: