qualifier
Seven games, 20 goals, none conceded - England close in on perfection
England have cruised through World Cup qualifying, winning all seven of their games and scoring 20 unanswered goals, setting several records and leaving them on the cusp of another. If Thomas Tuchel's team beat Albania in Sunday's final qualifier (17:00 GMT) and keep a clean sheet, they will become the first European side to play at least six qualifiers and win them all without conceding. A clean sweep of victories - regardless of goals conceded - is also a rare achievement. Excluding the early years of the World Cup, when teams often played just a handful of preliminary matches, only four European countries have finished with a 100% winning record. Germany were the last side to do so on the way to the 2018 tournament, though they went on to suffer a shock early exit in Russia.
- Europe > United Kingdom > England (0.75)
- Europe > Albania (0.26)
- Europe > Germany (0.25)
- (16 more...)
Text-to-SQL Oriented to the Process Mining Domain: A PT-EN Dataset for Query Translation
Yamate, Bruno Yui, Neubauer, Thais Rodrigues, Fantinato, Marcelo, Peres, Sarajane Marques
This paper introduces text-2-SQL-4-PM, a bilingual (Portuguese-English) benchmark dataset designed for the text-to-SQL task in the process mining domain. Text-to-SQL conversion facilitates natural language querying of databases, increasing accessibility for users without SQL expertise and productivity for those who are experts. The text-2-SQL-4-PM dataset is customized to address the unique challenges of process mining, including specialized vocabularies and the single-table relational structures derived from event logs. The dataset comprises 1,655 natural language utterances, including human-generated paraphrases, 205 SQL statements, and ten qualifiers. The construction methods include manual curation by experts, professional translation, and a detailed annotation process that enables nuanced analyses of task complexity. Additionally, a baseline study using GPT-3.5 Turbo demonstrates the feasibility and utility of the dataset for text-to-SQL applications. The results show that text-2-SQL-4-PM supports the evaluation of text-to-SQL implementations and offers broader applicability for semantic parsing and other natural language processing tasks.
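The GPT-3.5 Turbo baseline mentioned above amounts to prompting an LLM with the event-log schema and a natural language question. Below is a minimal sketch of that framing, assuming an illustrative single-table schema and a hypothetical `ask_llm` stub; neither is taken from the dataset itself.

```python
# Hypothetical sketch: framing a process-mining question as a zero-shot
# text-to-SQL prompt over a single-table event log. The schema, column
# names, and `ask_llm` stub are illustrative assumptions.

EVENT_LOG_SCHEMA = (
    "CREATE TABLE event_log ("
    "case_id TEXT, activity TEXT, resource TEXT, timestamp TEXT)"
)

def build_prompt(question: str) -> str:
    """Compose a zero-shot text-to-SQL prompt for a single-table event log."""
    return (
        f"Given the table:\n{EVENT_LOG_SCHEMA}\n"
        f"Write one SQL query answering: {question}\nSQL:"
    )

def ask_llm(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g., a chat-completion request).
    raise NotImplementedError

if __name__ == "__main__":
    print(build_prompt("How many cases contain the activity 'Approve'?"))
```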
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey (0.04)
- (4 more...)
ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs
Khorshidi, Samira, Nikfarjam, Azadeh, Shankar, Suprita, Sang, Yisi, Govind, Yash, Jang, Hyun, Kasgari, Ali, McClimans, Alexis, Soliman, Mohamed, Konda, Vishnu, Fakhry, Ahmed, Qi, Xiaoguang
Knowledge graphs (KGs) are foundational to many AI applications, but maintaining their freshness and completeness remains costly. We present ODKE+, a production-grade system that automatically extracts and ingests millions of open-domain facts from web sources with high precision. ODKE+ combines modular components into a scalable pipeline: (1) the Extraction Initiator detects missing or stale facts, (2) the Evidence Retriever collects supporting documents, (3) hybrid Knowledge Extractors apply both pattern-based rules and ontology-guided prompting for large language models (LLMs), (4) a lightweight Grounder validates extracted facts using a second LLM, and (5) the Corroborator ranks and normalizes candidate facts for ingestion. ODKE+ dynamically generates ontology snippets tailored to each entity type to align extractions with schema constraints, enabling scalable, type-consistent fact extraction across 195 predicates. The system supports batch and streaming modes, processing over 9 million Wikipedia pages and ingesting 19 million high-confidence facts with 98.8% precision. ODKE+ significantly improves coverage over traditional methods, achieving up to 48% overlap with third-party KGs and reducing update lag by 50 days on average. Our deployment demonstrates that LLM-based extraction, grounded in ontological structure and verification workflows, can deliver trustworthiness, production-scale knowledge ingestion with broad real-world applicability. A recording of the system demonstration is included with the submission and is also available at https://youtu.be/UcnE3_GsTWs.
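A minimal sketch of the five-stage flow described above; all names, signatures, and data shapes are assumptions for illustration, since the production system's interfaces are not public.

```python
# Hedged sketch of the ODKE+-style pipeline: initiate -> retrieve ->
# extract -> ground -> corroborate. Every function body is a stub.
from dataclasses import dataclass

@dataclass
class CandidateFact:
    subject: str
    predicate: str
    value: str
    confidence: float = 0.0

def initiate(entity: str) -> list[str]:
    """Extraction Initiator: decide which predicates look missing or stale."""
    return ["date_of_birth"]  # stubbed decision

def retrieve_evidence(entity: str, predicate: str) -> list[str]:
    """Evidence Retriever: collect supporting documents (stubbed)."""
    return [f"Snippet about {entity}'s {predicate}."]

def extract(entity: str, predicate: str, docs: list[str]) -> list[CandidateFact]:
    """Knowledge Extractor: rules plus ontology-guided LLM prompting (stubbed)."""
    return [CandidateFact(entity, predicate, "1961-08-04", 0.9)]

def ground(fact: CandidateFact, docs: list[str]) -> bool:
    """Grounder: a second LLM checks the fact against the evidence (stubbed)."""
    return True

def corroborate(facts: list[CandidateFact]) -> list[CandidateFact]:
    """Corroborator: rank and normalize candidates before ingestion."""
    return sorted(facts, key=lambda f: f.confidence, reverse=True)

def pipeline(entity: str) -> list[CandidateFact]:
    out = []
    for pred in initiate(entity):
        docs = retrieve_evidence(entity, pred)
        out += [f for f in extract(entity, pred, docs) if ground(f, docs)]
    return corroborate(out)

print(pipeline("Barack Obama"))
```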
- North America > United States > Indiana (0.04)
- Europe > United Kingdom > England (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (2 more...)
Understanding the Embedding Models on Hyper-relational Knowledge Graph
Wang, Yubo, Di, Shimin, Wang, Zhili, Li, Haoyang, Teng, Fei, Xin, Hao, Chen, Lei
Recently, Hyper-relational Knowledge Graphs (HKGs) have been proposed as an extension of traditional Knowledge Graphs (KGs) to better represent real-world facts with additional qualifiers. As a result, researchers have attempted to adapt classical Knowledge Graph Embedding (KGE) models for HKGs by designing extra qualifier processing modules. However, it remains unclear whether the superior performance of Hyper-relational KGE (HKGE) models arises from their base KGE model or the specially designed extension module. Hence, in this paper, we convert HKGs into plain KG format at the data level using three decomposition methods and then evaluate the performance of several classical KGE models on HKGs. Our results show that some KGE models achieve performance comparable to that of HKGE models. Upon further analysis, we find that the decomposition methods alter the original HKG topology and fail to fully preserve HKG information. Moreover, we observe that current HKGE models either fail to capture the graph's long-range dependencies or struggle to integrate main-triple and qualifier information due to an information compression issue. To further justify our findings and offer a potential direction for future HKGE research, we propose the FormerGNN framework. This framework employs a qualifier integrator to preserve the original HKG topology and a GNN-based graph encoder to capture the graph's long-range dependencies, followed by an improved approach for integrating main-triple and qualifier information to mitigate compression issues. Our experimental results demonstrate that FormerGNN outperforms existing HKGE models.
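To make the data-level conversion concrete: one plausible, reification-style decomposition of a hyper-relational fact (main triple plus qualifiers) into plain triples might look like the sketch below. The paper evaluates three specific decomposition methods, none of which this toy example claims to reproduce.

```python
# One plausible decomposition via an auxiliary fact node; relation
# names ("head", "relation", "tail") are invented for illustration.

def decompose(head, relation, tail, qualifiers, fact_id):
    """Turn (h, r, t, {q_rel: q_val}) into plain KG triples."""
    triples = [
        (fact_id, "head", head),
        (fact_id, "relation", relation),
        (fact_id, "tail", tail),
    ]
    for q_rel, q_val in qualifiers.items():
        triples.append((fact_id, q_rel, q_val))
    return triples

print(decompose("Einstein", "educated_at", "ETH Zurich",
                {"degree": "BSc", "end_time": "1900"}, "fact_0"))
```

Note how the qualifier triples hang off the auxiliary node rather than the entities, which is precisely the kind of topology change the paper observes the decompositions introduce.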
- Asia > China > Hong Kong (0.41)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (2 more...)
VITA: Versatile Time Representation Learning for Temporal Hyper-Relational Knowledge Graphs
Un, ChongIn, Lu, Yuhuan, Yang, Tianyue, Yang, Dingqi
Knowledge graphs (KGs) have become an effective paradigm for managing real-world facts, which are not only complex but also dynamically evolve over time. The temporal validity of facts often serves as a strong clue in downstream link prediction tasks, which predict a missing element of a fact. Traditional link prediction techniques on temporal KGs either consider a sequence of temporal snapshots of KGs with an ad-hoc defined time interval or expand a temporal fact over its validity period under a predefined time granularity; these approaches not only suffer from sensitivity to the chosen time interval/granularity, but also face computational challenges when handling facts with long (even infinite) validity. Although recent hyper-relational KGs represent the temporal validity of a fact as qualifiers describing the fact, this is still suboptimal because it ignores the infinite validity of some facts and encodes insufficient information about temporal validity from the qualifiers. Against this background, we propose VITA, a $\underline{V}$ersatile t$\underline{I}$me represen$\underline{TA}$tion learning method for temporal hyper-relational knowledge graphs. We first propose a versatile time representation that can flexibly accommodate all four types of temporal validity of facts (i.e., since, until, period, time-invariant), and then design VITA to effectively learn the time information in both aspects of time value and timespan to boost link prediction performance. We conduct a thorough evaluation of VITA compared to a sizable collection of baselines on real-world KG datasets. Results show that VITA outperforms the best-performing baselines in various link prediction tasks (predicting missing entities, relations, time, and other numeric literals) by up to 75.3%. Ablation studies and a case study also support our key design choices.
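The four validity types can be illustrated with a single record whose open ends signal "since", "until", or time-invariance. The field names below are assumptions for illustration, not VITA's actual learned representation.

```python
# Illustrative encoding of the four temporal-validity types named in
# the abstract (since, until, period, time-invariant).
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TemporalValidity:
    start: Optional[int] = None  # year; None = open on the left
    end: Optional[int] = None    # year; None = open on the right

    @property
    def kind(self) -> str:
        if self.start is None and self.end is None:
            return "time-invariant"
        if self.end is None:
            return "since"   # valid from `start`, possibly forever
        if self.start is None:
            return "until"   # valid up to `end`
        return "period"

for v in (TemporalValidity(), TemporalValidity(2010, None),
          TemporalValidity(None, 1999), TemporalValidity(2010, 2015)):
    print(v.kind, v)
```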
- Asia > Macao (1.00)
- Asia > China (0.76)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (4 more...)
- Information Technology > Information Management (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.86)
Wiki-Quantities and Wiki-Measurements: Datasets of Quantities and their Measurement Context from Wikipedia
Göpfert, Jan, Kuckertz, Patrick, Weinand, Jann M., Stolten, Detlef
To cope with the large number of publications, more and more researchers automatically extract data of interest using natural language processing methods based on supervised learning. Much data, especially in the natural and engineering sciences, is quantitative, but datasets for identifying quantities and their context in text are lacking. To address this issue, we present two large datasets based on Wikipedia and Wikidata: Wiki-Quantities is a dataset of over 1.2 million annotated quantities in the English-language Wikipedia. Wiki-Measurements is a dataset of 38,738 annotated quantities in the English-language Wikipedia along with their respective measured entity, property, and optional qualifiers. Manual validation of 100 samples each of Wiki-Quantities and Wiki-Measurements found them to be 100% and 84-94% correct, respectively. The datasets can be used in pipeline approaches to measurement extraction, where quantities are identified first and their measurement context is extracted afterwards. To allow reproduction of this work using newer or different versions of Wikipedia and Wikidata, we publish the code used to create the datasets along with the data.
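A toy sketch of the two-stage pipeline the datasets target: identify quantity spans first, then attach measurement context. The regex and record layout are simplifications assumed for illustration and do not reproduce the datasets' actual annotation scheme.

```python
# Stage 1 finds value+unit spans; stage 2 is a stub that would link
# each quantity to its measured entity and property.
import re

QUANTITY = re.compile(r"\b\d+(?:\.\d+)?\s*(?:km|kg|m|s|GW|°C)\b")

def find_quantities(text: str):
    """Stage 1: identify quantity spans (value + unit)."""
    return [(m.start(), m.end(), m.group()) for m in QUANTITY.finditer(text)]

def attach_context(text: str, span):
    """Stage 2 (stub): attach the measured entity/property/qualifiers."""
    return {"quantity": span[2], "sentence": text}

text = "The reactor delivers 1.4 GW over a 12 km grid link."
for span in find_quantities(text):
    print(attach_context(text, span))
```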
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Alabama (0.04)
- (19 more...)
- Energy > Power Industry (0.46)
- Transportation > Air (0.46)
- Media > Television (0.46)
ORIGAMI: A generative transformer architecture for predictions from semi-structured data
Rückstieß, Thomas, Huang, Alana, Vujanic, Robin
Despite the popularity and widespread use of semi-structured data formats such as JSON, end-to-end supervised learning applied directly to such data remains underexplored. We present ORIGAMI (Object RepresentatIon via Generative Autoregressive ModellIng), a transformer-based architecture that directly processes nested key/value pairs while preserving their hierarchical semantics. Our key technical contributions include: (1) a structure-preserving tokenizer, (2) a novel key/value position encoding scheme, and (3) a grammar-constrained training and inference framework that ensures valid outputs and accelerates training convergence. These enhancements enable efficient end-to-end modeling of semi-structured data. By reformulating classification as next-token prediction, ORIGAMI naturally handles both single-label and multi-label tasks without architectural modifications. Empirical evaluation across diverse domains demonstrates ORIGAMI's effectiveness: On standard tabular benchmarks converted to JSON, ORIGAMI remains competitive with classical and state-of-the-art approaches. On native JSON datasets, we outperform baselines on multi-label classification and specialized models such as convolutional and graph neural networks on a code classification task. Through extensive ablation studies, we validate the impact of each architectural component and establish ORIGAMI as a robust framework for end-to-end learning on semi-structured data.
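A rough sketch of what structure-preserving tokenization of nested key/value pairs could look like. The token names below are invented for illustration; ORIGAMI's actual vocabulary and key/value position-encoding scheme differ.

```python
# Flatten nested JSON-like data into a token stream that keeps the
# hierarchy explicit, so an autoregressive model can learn structure.
def tokenize(obj):
    """Recursively emit tokens for dicts, lists, and scalar values."""
    if isinstance(obj, dict):
        toks = ["OBJ_START"]
        for k, v in obj.items():
            toks += [f"KEY:{k}"] + tokenize(v)
        return toks + ["OBJ_END"]
    if isinstance(obj, list):
        return ["ARR_START"] + [t for v in obj for t in tokenize(v)] + ["ARR_END"]
    return [f"VAL:{obj}"]

doc = {"user": {"name": "ada", "tags": ["admin", 7]}}
print(tokenize(doc))
```

Under this framing, classification reduces to appending a query key (e.g., `KEY:label`) and letting the model generate the value token, which is how next-token prediction can cover both single-label and multi-label tasks.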
- North America > United States (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Asia > China > Beijing > Beijing (0.04)
Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs
Mihindukulasooriya, Nandana, Tiwari, Sanju, Dobriy, Daniil, Nielsen, Finn Årup, Chhetri, Tek Raj, Polleres, Axel
Several initiatives have been undertaken to conceptually model the domain of scholarly data using ontologies and to create respective Knowledge Graphs. Yet the full potential remains untapped: automated means for populating these ontologies are lacking, and the respective initiatives from the Semantic Web community are not necessarily connected. We propose to make scholarly data more sustainably accessible by leveraging Wikidata's infrastructure and automating its population through LLMs, tapping into unstructured sources like conference websites and proceedings texts as well as already existing structured conference datasets. While an initial analysis shows that Semantic Web conferences are only minimally represented in Wikidata, we argue that our methodology can help to populate, evolve, and maintain scholarly data as a community within Wikidata. Our main contributions include (a) an analysis of ontologies for representing scholarly data to identify gaps and relevant entities/properties in Wikidata, (b) semi-automated extraction -- requiring (minimal) manual validation -- of conference metadata (e.g., acceptance rates, organizer roles, programme committee members, best paper awards, keynotes, and sponsors) from websites and proceedings texts using LLMs, and (c) extensions to visualization tools in the Wikidata context for exploring the generated scholarly data. Our study focuses on data from 105 Semantic Web-related conferences and extends or adds more than 6,000 entities in Wikidata. Notably, the method is more generally applicable beyond Semantic Web-related conferences, enhancing Wikidata's utility as a comprehensive scholarly resource. Source Repository: https://github.com/scholarly-wikidata/ DOI: https://doi.org/10.5281/zenodo.10989709 License: Creative Commons CC0 (Data), MIT (Code)
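As a hedged illustration of the kind of gap analysis step (a) implies, one can query Wikidata's public SPARQL endpoint for existing conference items before deciding what still needs to be populated. The class QID below is an assumption and should be verified against Wikidata before use.

```python
# Sketch: list a few conference items via the Wikidata Query Service.
# P31 is "instance of"; the target class QID is assumed, not verified.
import requests

QUERY = """
SELECT ?conf ?confLabel WHERE {
  ?conf wdt:P31 wd:Q2020153 .   # assumed QID for 'academic conference'
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "scholarly-kg-sketch/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["confLabel"]["value"])
```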
- Europe > Austria > Vienna (0.14)
- Asia > Nepal (0.04)
- North America > United States > New York (0.04)
- (8 more...)
- Research Report (0.82)
- Instructional Material > Course Syllabus & Notes (0.46)
- Personal > Honors (0.34)
Process Trace Querying using Knowledge Graphs and Notation3
In process mining, a log exploration step helps make sense of event traces, e.g., identifying event patterns and illogical traces and gaining insight into their variability. To support expressive log exploration, the event log can be converted into a Knowledge Graph (KG), which can then be queried using general-purpose languages. We explore the creation of a semantic KG using the Resource Description Framework (RDF) as a data model, combined with the general-purpose Notation3 (N3) rule language for querying. We show how typical trace querying constraints, inspired by the state of the art, can be implemented in N3. We convert case- and object-centric event logs into a trace-based semantic KG; OCEL2 logs are hereby "flattened" into traces based on object paths through the KG. This solution offers (a) expressivity, as queries can instantiate constraints in multiple ways and arbitrarily constrain attributes and relations (e.g., actors, resources); (b) flexibility, as OCEL2 event logs can be serialized as traces in arbitrary ways based on the KG; and (c) extensibility, as others can extend our library by leveraging the same implementation patterns.
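A minimal sketch of converting events into a trace-based semantic KG and checking a simple ordering constraint. The paper queries with N3 rules; the sketch below substitutes SPARQL over rdflib as an accessible stand-in, and the vocabulary (`ex:activity`, `ex:index`) is invented for illustration.

```python
# Build a tiny RDF event KG and query an eventually-follows constraint.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

events = [("c1", 0, "register"), ("c1", 1, "approve"), ("c1", 2, "pay")]
for case, idx, act in events:
    ev = EX[f"{case}_e{idx}"]
    g.add((ev, RDF.type, EX.Event))
    g.add((ev, EX.case, Literal(case)))
    g.add((ev, EX.index, Literal(idx)))
    g.add((ev, EX.activity, Literal(act)))

# Constraint: 'approve' eventually followed by 'pay' in the same case.
q = """
PREFIX ex: <http://example.org/>
SELECT ?case WHERE {
  ?a ex:case ?case ; ex:activity "approve" ; ex:index ?i .
  ?b ex:case ?case ; ex:activity "pay" ; ex:index ?j .
  FILTER(?j > ?i)
}
"""
for row in g.query(q):
    print("constraint satisfied in case:", row.case)
```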
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- Europe > Switzerland (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.47)
Inferring Pluggable Types with Machine Learning
Siddiqui, Kazi Amanul Islam, Kellogg, Martin
Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including a Graph Transformer Network (GTN), a Graph Convolutional Network, and a Large Language Model. We further validate these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering the number of warnings in all but one unannotated project. We find that the GTN performs best, with a recall of 0.89 and a precision of 0.60. Furthermore, we conduct a study to estimate the number of Java classes needed for good performance of the trained model. In our feasibility study, performance improved up to around 16k classes and deteriorated due to overfitting beyond around 22k classes.
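A hedged sketch of the kind of graph sample such an approach might construct from code: nodes for program elements, edges carrying minimal dataflow hints, and a per-node qualifier as the training target. The encoding below is invented for illustration and is not NaP-AST itself.

```python
# Toy graph sample for qualifier inference; a GNN (e.g., a Graph
# Transformer) would classify each node's qualifier label.
from dataclasses import dataclass, field

@dataclass
class Node:
    nid: int
    kind: str            # e.g., "param", "deref", "return"
    qualifier: str = "?" # training target, e.g., "@Nullable" / "@NonNull"

@dataclass
class GraphSample:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst, label)

g = GraphSample()
g.nodes += [Node(0, "param"), Node(1, "deref"), Node(2, "return")]
g.edges += [(0, 1, "flows_to"), (1, 2, "flows_to")]
print(len(g.nodes), "nodes,", len(g.edges), "edges")
```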
- North America > United States > New Jersey (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (10 more...)
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)