qualifier
Seven games, 20 goals, none conceded - England close in on perfection
England have cruised through World Cup qualifying, winning all seven of their games and scoring 20 unanswered goals, setting several records and leaving them on the cusp of another. If Thomas Tuchel's team beat Albania in Sunday's final qualifier (17:00 GMT) and keep a clean sheet, they will become the first European side to play at least six qualifiers and win them all without conceding. A clean sweep of victories - regardless of goals conceded - is also a rare achievement. Excluding the early years of the World Cup, when teams often played just a handful of preliminary matches, only four European countries have finished with a 100% winning record. Germany were the last side to do so on the way to the 2018 tournament, though they went on to suffer a shock early exit in Russia.
- Europe > United Kingdom > England (0.75)
- Europe > Albania (0.26)
- Europe > Germany (0.25)
- (16 more...)
Text-to-SQL Oriented to the Process Mining Domain: A PT-EN Dataset for Query Translation
Yamate, Bruno Yui, Neubauer, Thais Rodrigues, Fantinato, Marcelo, Peres, Sarajane Marques
This paper introduces text-2-SQL-4-PM, a bilingual (Portuguese-English) benchmark dataset designed for the text-to-SQL task in the process mining domain. Text-to-SQL conversion facilitates natural language querying of databases, increasing accessibility for users without SQL expertise and productivity for those who are experts. The text-2-SQL-4-PM dataset is customized to address the unique challenges of process mining, including specialized vocabularies and the single-table relational structures derived from event logs. The dataset comprises 1,655 natural language utterances, including human-generated paraphrases, 205 SQL statements, and ten qualifiers. The construction methods include manual curation by experts, professional translation, and a detailed annotation process that enables nuanced analyses of task complexity. Additionally, a baseline study using GPT-3.5 Turbo demonstrates the feasibility and utility of the dataset for text-to-SQL applications. The results show that text-2-SQL-4-PM supports the evaluation of text-to-SQL implementations and offers broader applicability for semantic parsing and other natural language processing tasks.
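The GPT-3.5 Turbo baseline mentioned above amounts to prompting an LLM with the event-log schema and a natural language question. Below is a minimal sketch of that framing, assuming an illustrative single-table schema and a hypothetical `ask_llm` stub; neither is taken from the dataset itself.

```python
# Hypothetical sketch: framing a process-mining question as a zero-shot
# text-to-SQL prompt over a single-table event log. The schema, column
# names, and `ask_llm` stub are illustrative assumptions.

EVENT_LOG_SCHEMA = (
    "CREATE TABLE event_log ("
    "case_id TEXT, activity TEXT, resource TEXT, timestamp TEXT)"
)

def build_prompt(question: str) -> str:
    """Compose a zero-shot text-to-SQL prompt for a single-table event log."""
    return (
        f"Given the table:\n{EVENT_LOG_SCHEMA}\n"
        f"Write one SQL query answering: {question}\nSQL:"
    )

def ask_llm(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g., a chat-completion request).
    raise NotImplementedError

if __name__ == "__main__":
    print(build_prompt("How many cases contain the activity 'Approve'?"))
```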
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey (0.04)
- (4 more...)
ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs
Khorshidi, Samira, Nikfarjam, Azadeh, Shankar, Suprita, Sang, Yisi, Govind, Yash, Jang, Hyun, Kasgari, Ali, McClimans, Alexis, Soliman, Mohamed, Konda, Vishnu, Fakhry, Ahmed, Qi, Xiaoguang
Knowledge graphs (KGs) are foundational to many AI applications, but maintaining their freshness and completeness remains costly. We present ODKE+, a production-grade system that automatically extracts and ingests millions of open-domain facts from web sources with high precision. ODKE+ combines modular components into a scalable pipeline: (1) the Extraction Initiator detects missing or stale facts, (2) the Evidence Retriever collects supporting documents, (3) hybrid Knowledge Extractors apply both pattern-based rules and ontology-guided prompting for large language models (LLMs), (4) a lightweight Grounder validates extracted facts using a second LLM, and (5) the Corroborator ranks and normalizes candidate facts for ingestion. ODKE+ dynamically generates ontology snippets tailored to each entity type to align extractions with schema constraints, enabling scalable, type-consistent fact extraction across 195 predicates. The system supports batch and streaming modes, processing over 9 million Wikipedia pages and ingesting 19 million high-confidence facts with 98.8% precision. ODKE+ significantly improves coverage over traditional methods, achieving up to 48% overlap with third-party KGs and reducing update lag by 50 days on average. Our deployment demonstrates that LLM-based extraction, grounded in ontological structure and verification workflows, can deliver trustworthiness, production-scale knowledge ingestion with broad real-world applicability. A recording of the system demonstration is included with the submission and is also available at https://youtu.be/UcnE3_GsTWs.
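A minimal sketch of the five-stage flow described above; all names, signatures, and data shapes are assumptions for illustration, since the production system's interfaces are not public.

```python
# Hedged sketch of the ODKE+-style pipeline: initiate -> retrieve ->
# extract -> ground -> corroborate. Every function body is a stub.
from dataclasses import dataclass

@dataclass
class CandidateFact:
    subject: str
    predicate: str
    value: str
    confidence: float = 0.0

def initiate(entity: str) -> list[str]:
    """Extraction Initiator: decide which predicates look missing or stale."""
    return ["date_of_birth"]  # stubbed decision

def retrieve_evidence(entity: str, predicate: str) -> list[str]:
    """Evidence Retriever: collect supporting documents (stubbed)."""
    return [f"Snippet about {entity}'s {predicate}."]

def extract(entity: str, predicate: str, docs: list[str]) -> list[CandidateFact]:
    """Knowledge Extractor: rules plus ontology-guided LLM prompting (stubbed)."""
    return [CandidateFact(entity, predicate, "1961-08-04", 0.9)]

def ground(fact: CandidateFact, docs: list[str]) -> bool:
    """Grounder: a second LLM checks the fact against the evidence (stubbed)."""
    return True

def corroborate(facts: list[CandidateFact]) -> list[CandidateFact]:
    """Corroborator: rank and normalize candidates before ingestion."""
    return sorted(facts, key=lambda f: f.confidence, reverse=True)

def pipeline(entity: str) -> list[CandidateFact]:
    out = []
    for pred in initiate(entity):
        docs = retrieve_evidence(entity, pred)
        out += [f for f in extract(entity, pred, docs) if ground(f, docs)]
    return corroborate(out)

print(pipeline("Barack Obama"))
```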
- North America > United States > Indiana (0.04)
- Europe > United Kingdom > England (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (2 more...)
Understanding the Embedding Models on Hyper-relational Knowledge Graph
Wang, Yubo, Di, Shimin, Wang, Zhili, Li, Haoyang, Teng, Fei, Xin, Hao, Chen, Lei
Recently, Hyper-relational Knowledge Graphs (HKGs) have been proposed as an extension of traditional Knowledge Graphs (KGs) to better represent real-world facts with additional qualifiers. As a result, researchers have attempted to adapt classical Knowledge Graph Embedding (KGE) models for HKGs by designing extra qualifier processing modules. However, it remains unclear whether the superior performance of Hyper-relational KGE (HKGE) models arises from their base KGE model or the specially designed extension module. Hence, in this paper, we convert HKGs into plain KG format at the data level using three decomposition methods and then evaluate the performance of several classical KGE models on HKGs. Our results show that some KGE models achieve performance comparable to that of HKGE models. Upon further analysis, we find that the decomposition methods alter the original HKG topology and fail to fully preserve HKG information. Moreover, we observe that current HKGE models either fail to capture the graph's long-range dependencies or struggle to integrate main-triple and qualifier information due to an information compression issue. To further justify our findings and offer a potential direction for future HKGE research, we propose the FormerGNN framework. This framework employs a qualifier integrator to preserve the original HKG topology and a GNN-based graph encoder to capture the graph's long-range dependencies, followed by an improved approach for integrating main-triple and qualifier information to mitigate compression issues. Our experimental results demonstrate that FormerGNN outperforms existing HKGE models.
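To make the data-level conversion concrete: one plausible, reification-style decomposition of a hyper-relational fact (main triple plus qualifiers) into plain triples might look like the sketch below. The paper evaluates three specific decomposition methods, none of which this toy example claims to reproduce.

```python
# One plausible decomposition via an auxiliary fact node; relation
# names ("head", "relation", "tail") are invented for illustration.

def decompose(head, relation, tail, qualifiers, fact_id):
    """Turn (h, r, t, {q_rel: q_val}) into plain KG triples."""
    triples = [
        (fact_id, "head", head),
        (fact_id, "relation", relation),
        (fact_id, "tail", tail),
    ]
    for q_rel, q_val in qualifiers.items():
        triples.append((fact_id, q_rel, q_val))
    return triples

print(decompose("Einstein", "educated_at", "ETH Zurich",
                {"degree": "BSc", "end_time": "1900"}, "fact_0"))
```

Note how the qualifier triples hang off the auxiliary node rather than the entities, which is precisely the kind of topology change the paper observes the decompositions introduce.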
- Asia > China > Hong Kong (0.41)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (2 more...)
VITA: Versatile Time Representation Learning for Temporal Hyper-Relational Knowledge Graphs
Un, ChongIn, Lu, Yuhuan, Yang, Tianyue, Yang, Dingqi
Knowledge graphs (KGs) have become an effective paradigm for managing real-world facts, which are not only complex but also dynamically evolve over time. The temporal validity of facts often serves as a strong clue in downstream link prediction tasks, which predict a missing element of a fact. Traditional link prediction techniques on temporal KGs either consider a sequence of temporal snapshots of KGs with an ad-hoc defined time interval or expand a temporal fact over its validity period under a predefined time granularity; these approaches not only suffer from sensitivity to the chosen time interval/granularity, but also face computational challenges when handling facts with long (even infinite) validity. Although recent hyper-relational KGs represent the temporal validity of a fact as qualifiers describing the fact, this is still suboptimal because it ignores the infinite validity of some facts and encodes insufficient information about temporal validity from the qualifiers. Against this background, we propose VITA, a $\underline{V}$ersatile t$\underline{I}$me represen$\underline{TA}$tion learning method for temporal hyper-relational knowledge graphs. We first propose a versatile time representation that can flexibly accommodate all four types of temporal validity of facts (i.e., since, until, period, time-invariant), and then design VITA to effectively learn the time information in both aspects of time value and timespan to boost link prediction performance. We conduct a thorough evaluation of VITA compared to a sizable collection of baselines on real-world KG datasets. Results show that VITA outperforms the best-performing baselines in various link prediction tasks (predicting missing entities, relations, time, and other numeric literals) by up to 75.3%. Ablation studies and a case study also support our key design choices.
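The four validity types can be illustrated with a single record whose open ends signal "since", "until", or time-invariance. The field names below are assumptions for illustration, not VITA's actual learned representation.

```python
# Illustrative encoding of the four temporal-validity types named in
# the abstract (since, until, period, time-invariant).
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TemporalValidity:
    start: Optional[int] = None  # year; None = open on the left
    end: Optional[int] = None    # year; None = open on the right

    @property
    def kind(self) -> str:
        if self.start is None and self.end is None:
            return "time-invariant"
        if self.end is None:
            return "since"   # valid from `start`, possibly forever
        if self.start is None:
            return "until"   # valid up to `end`
        return "period"

for v in (TemporalValidity(), TemporalValidity(2010, None),
          TemporalValidity(None, 1999), TemporalValidity(2010, 2015)):
    print(v.kind, v)
```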
- Asia > Macao (1.00)
- Asia > China (0.76)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (4 more...)
- Information Technology > Information Management (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.86)
Wiki-Quantities and Wiki-Measurements: Datasets of Quantities and their Measurement Context from Wikipedia
Göpfert, Jan, Kuckertz, Patrick, Weinand, Jann M., Stolten, Detlef
To cope with the large number of publications, more and more researchers automatically extract data of interest using natural language processing methods based on supervised learning. Much data, especially in the natural and engineering sciences, is quantitative, but datasets for identifying quantities and their context in text are lacking. To address this issue, we present two large datasets based on Wikipedia and Wikidata: Wiki-Quantities is a dataset of over 1.2 million annotated quantities in the English-language Wikipedia. Wiki-Measurements is a dataset of 38,738 annotated quantities in the English-language Wikipedia along with their respective measured entity, property, and optional qualifiers. Manual validation of 100 samples each of Wiki-Quantities and Wiki-Measurements found them to be 100% and 84-94% correct, respectively. The datasets can be used in pipeline approaches to measurement extraction, where quantities are identified first and their measurement context is extracted afterwards. To allow reproduction of this work using newer or different versions of Wikipedia and Wikidata, we publish the code used to create the datasets along with the data.
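A toy sketch of the two-stage pipeline the datasets target: identify quantity spans first, then attach measurement context. The regex and record layout are simplifications assumed for illustration and do not reproduce the datasets' actual annotation scheme.

```python
# Stage 1 finds value+unit spans; stage 2 is a stub that would link
# each quantity to its measured entity and property.
import re

QUANTITY = re.compile(r"\b\d+(?:\.\d+)?\s*(?:km|kg|m|s|GW|°C)\b")

def find_quantities(text: str):
    """Stage 1: identify quantity spans (value + unit)."""
    return [(m.start(), m.end(), m.group()) for m in QUANTITY.finditer(text)]

def attach_context(text: str, span):
    """Stage 2 (stub): attach the measured entity/property/qualifiers."""
    return {"quantity": span[2], "sentence": text}

text = "The reactor delivers 1.4 GW over a 12 km grid link."
for span in find_quantities(text):
    print(attach_context(text, span))
```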
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Alabama (0.04)
- (19 more...)
- Energy > Power Industry (0.46)
- Transportation > Air (0.46)
- Media > Television (0.46)
ORIGAMI: A generative transformer architecture for predictions from semi-structured data
Rückstieß, Thomas, Huang, Alana, Vujanic, Robin
Despite the popularity and widespread use of semi-structured data formats such as JSON, end-to-end supervised learning applied directly to such data remains underexplored. We present ORIGAMI (Object RepresentatIon via Generative Autoregressive ModellIng), a transformer-based architecture that directly processes nested key/value pairs while preserving their hierarchical semantics. Our key technical contributions include: (1) a structure-preserving tokenizer, (2) a novel key/value position encoding scheme, and (3) a grammar-constrained training and inference framework that ensures valid outputs and accelerates training convergence. These enhancements enable efficient end-to-end modeling of semi-structured data. By reformulating classification as next-token prediction, ORIGAMI naturally handles both single-label and multi-label tasks without architectural modifications. Empirical evaluation across diverse domains demonstrates ORIGAMI's effectiveness: On standard tabular benchmarks converted to JSON, ORIGAMI remains competitive with classical and state-of-the-art approaches. On native JSON datasets, we outperform baselines on multi-label classification and specialized models such as convolutional and graph neural networks on a code classification task. Through extensive ablation studies, we validate the impact of each architectural component and establish ORIGAMI as a robust framework for end-to-end learning on semi-structured data.
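A rough sketch of what structure-preserving tokenization of nested key/value pairs could look like. The token names below are invented for illustration; ORIGAMI's actual vocabulary and key/value position-encoding scheme differ.

```python
# Flatten nested JSON-like data into a token stream that keeps the
# hierarchy explicit, so an autoregressive model can learn structure.
def tokenize(obj):
    """Recursively emit tokens for dicts, lists, and scalar values."""
    if isinstance(obj, dict):
        toks = ["OBJ_START"]
        for k, v in obj.items():
            toks += [f"KEY:{k}"] + tokenize(v)
        return toks + ["OBJ_END"]
    if isinstance(obj, list):
        return ["ARR_START"] + [t for v in obj for t in tokenize(v)] + ["ARR_END"]
    return [f"VAL:{obj}"]

doc = {"user": {"name": "ada", "tags": ["admin", 7]}}
print(tokenize(doc))
```

Under this framing, classification reduces to appending a query key (e.g., `KEY:label`) and letting the model generate the value token, which is how next-token prediction can cover both single-label and multi-label tasks.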
- North America > United States (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Asia > China > Beijing > Beijing (0.04)
Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs
Mihindukulasooriya, Nandana, Tiwari, Sanju, Dobriy, Daniil, Nielsen, Finn Årup, Chhetri, Tek Raj, Polleres, Axel
Several initiatives have been undertaken to conceptually model the domain of scholarly data using ontologies and to create respective Knowledge Graphs. Yet the full potential remains untapped: automated means for populating these ontologies are lacking, and the respective initiatives from the Semantic Web community are not necessarily connected. We propose to make scholarly data more sustainably accessible by leveraging Wikidata's infrastructure and automating its population through LLMs, tapping into unstructured sources like conference websites and proceedings texts as well as already existing structured conference datasets. While an initial analysis shows that Semantic Web conferences are only minimally represented in Wikidata, we argue that our methodology can help to populate, evolve, and maintain scholarly data as a community within Wikidata. Our main contributions include (a) an analysis of ontologies for representing scholarly data to identify gaps and relevant entities/properties in Wikidata, (b) semi-automated extraction -- requiring (minimal) manual validation -- of conference metadata (e.g., acceptance rates, organizer roles, programme committee members, best paper awards, keynotes, and sponsors) from websites and proceedings texts using LLMs, and (c) extensions to visualization tools in the Wikidata context for exploring the generated scholarly data. Our study focuses on data from 105 Semantic Web-related conferences and extends or adds more than 6,000 entities in Wikidata. Notably, the method is more generally applicable beyond Semantic Web-related conferences, enhancing Wikidata's utility as a comprehensive scholarly resource. Source Repository: https://github.com/scholarly-wikidata/ DOI: https://doi.org/10.5281/zenodo.10989709 License: Creative Commons CC0 (Data), MIT (Code)
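As a hedged illustration of the kind of gap analysis step (a) implies, one can query Wikidata's public SPARQL endpoint for existing conference items before deciding what still needs to be populated. The class QID below is an assumption and should be verified against Wikidata before use.

```python
# Sketch: list a few conference items via the Wikidata Query Service.
# P31 is "instance of"; the target class QID is assumed, not verified.
import requests

QUERY = """
SELECT ?conf ?confLabel WHERE {
  ?conf wdt:P31 wd:Q2020153 .   # assumed QID for 'academic conference'
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "scholarly-kg-sketch/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["confLabel"]["value"])
```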
- Europe > Austria > Vienna (0.14)
- Asia > Nepal (0.04)
- North America > United States > New York (0.04)
- (8 more...)
- Research Report (0.82)
- Instructional Material > Course Syllabus & Notes (0.46)
- Personal > Honors (0.34)
Process Trace Querying using Knowledge Graphs and Notation3
In process mining, a log exploration step helps make sense of event traces, e.g., identifying event patterns and illogical traces and gaining insight into their variability. To support expressive log exploration, the event log can be converted into a Knowledge Graph (KG), which can then be queried using general-purpose languages. We explore the creation of a semantic KG using the Resource Description Framework (RDF) as a data model, combined with the general-purpose Notation3 (N3) rule language for querying. We show how typical trace querying constraints, inspired by the state of the art, can be implemented in N3. We convert case- and object-centric event logs into a trace-based semantic KG; OCEL2 logs are hereby "flattened" into traces based on object paths through the KG. This solution offers (a) expressivity, as queries can instantiate constraints in multiple ways and arbitrarily constrain attributes and relations (e.g., actors, resources); (b) flexibility, as OCEL2 event logs can be serialized as traces in arbitrary ways based on the KG; and (c) extensibility, as others can extend our library by leveraging the same implementation patterns.
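A minimal sketch of converting events into a trace-based semantic KG and checking a simple ordering constraint. The paper queries with N3 rules; the sketch below substitutes SPARQL over rdflib as an accessible stand-in, and the vocabulary (`ex:activity`, `ex:index`) is invented for illustration.

```python
# Build a tiny RDF event KG and query an eventually-follows constraint.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

events = [("c1", 0, "register"), ("c1", 1, "approve"), ("c1", 2, "pay")]
for case, idx, act in events:
    ev = EX[f"{case}_e{idx}"]
    g.add((ev, RDF.type, EX.Event))
    g.add((ev, EX.case, Literal(case)))
    g.add((ev, EX.index, Literal(idx)))
    g.add((ev, EX.activity, Literal(act)))

# Constraint: 'approve' eventually followed by 'pay' in the same case.
q = """
PREFIX ex: <http://example.org/>
SELECT ?case WHERE {
  ?a ex:case ?case ; ex:activity "approve" ; ex:index ?i .
  ?b ex:case ?case ; ex:activity "pay" ; ex:index ?j .
  FILTER(?j > ?i)
}
"""
for row in g.query(q):
    print("constraint satisfied in case:", row.case)
```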
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- Europe > Switzerland (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.47)
Inferring Pluggable Types with Machine Learning
Siddiqui, Kazi Amanul Islam, Kellogg, Martin
Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including a Graph Transformer Network (GTN), a Graph Convolutional Network, and a Large Language Model. We further validate these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering the number of warnings in all but one unannotated project. We find that the GTN performs best, with a recall of 0.89 and a precision of 0.60. Furthermore, we conduct a study to estimate the number of Java classes needed for good performance of the trained model. In our feasibility study, performance improved up to around 16k classes and deteriorated due to overfitting beyond around 22k classes.
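A hedged sketch of the kind of graph sample such an approach might construct from code: nodes for program elements, edges carrying minimal dataflow hints, and a per-node qualifier as the training target. The encoding below is invented for illustration and is not NaP-AST itself.

```python
# Toy graph sample for qualifier inference; a GNN (e.g., a Graph
# Transformer) would classify each node's qualifier label.
from dataclasses import dataclass, field

@dataclass
class Node:
    nid: int
    kind: str            # e.g., "param", "deref", "return"
    qualifier: str = "?" # training target, e.g., "@Nullable" / "@NonNull"

@dataclass
class GraphSample:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst, label)

g = GraphSample()
g.nodes += [Node(0, "param"), Node(1, "deref"), Node(2, "return")]
g.edges += [(0, 1, "flows_to"), (1, 2, "flows_to")]
print(len(g.nodes), "nodes,", len(g.edges), "edges")
```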
- North America > United States > New Jersey (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (10 more...)
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)