Wikidata and Wikipedia have been proven useful for reason-ing in natural language applications, like question answering or entitylinking. Yet, no existing work has studied the potential of Wikidata for commonsense reasoning. This paper investigates whether Wikidata con-tains commonsense knowledge which is complementary to existing commonsense sources. Starting from a definition of common sense, we devise three guiding principles, and apply them to generate a commonsense subgraph of Wikidata (Wikidata-CS). Within our approach, we map the relations of Wikidata to ConceptNet, which we also leverage to integrate Wikidata-CS into an existing consolidated commonsense graph. Our experiments reveal that: 1) albeit Wikidata-CS represents a small portion of Wikidata, it is an indicator that Wikidata contains relevant commonsense knowledge, which can be mapped to 15 ConceptNet relations; 2) the overlap between Wikidata-CS and other commonsense sources is low, motivating the value of knowledge integration; 3) Wikidata-CS has been evolving over time at a slightly slower rate compared to the overall Wikidata, indicating a possible lack of focus on commonsense knowledge. Based on these findings, we propose three recommended actions to improve the coverage and quality of Wikidata-CS further.
Wikidata constraints, albeit useful, are represented and processed in an incomplete, ad hoc fashion. Constraint declarations do not fully express their meaning, and thus do not provide a precise, unambiguous basis for constraint specification, or a logical foundation for constraint-checking implementations. In prior work we have proposed a logical framework for Wikidata as a whole, based on multi-attributed relational structures (MARS) and related logical languages. In this paper we explain how constraints are handled in the proposed framework, and show that nearly all of Wikidata's existing property constraints can be completely characterized in it, in a natural and economical fashion. We also give characterizations for several proposed property constraints, and show that a variety of non-property constraints can be handled in the same framework.
Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper, we present KGTK, a data science-centric toolkit to represent, create, transform, enhance and analyze KGs. KGTK represents graphs in tables and leverages popular libraries developed for data science applications, enabling a wide audience of developers to easily construct knowledge graph pipelines for their applications. We illustrate KGTK with real-world scenarios in which we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet, in our own work.
Multi-attributed relational structures (MARSs) have been proposed as a formal data model for generalized property graphs, along with multi-attributed rule-based predicate logic (MARPL) as a useful rule-based logic in which to write inference rules over property graphs. Wikidata can be modelled in an extended MARS that adds the (imprecise) datatypes of Wikidata. The rules of inference for the Wikidata ontology can be modelled as a MARPL ontology, with extensions to handle the Wikidata datatypes and functions over these datatypes. Because many Wikidata qualifiers should participate in most inference rules in Wikidata a method of implicitly handling qualifier values on a per-qualifier basis is needed to make this modelling useful. The meaning of Wikidata is then the extended MARS that is the closure of running these rules on the Wikidata data model. Wikidata constraints can be modelled as multi-attributed predicate logic (MAPL) formulae, again extended with datatypes, that are evaluated over this extended MARS.
Wikidata is the largest general-interest knowledge base that is openly available. It is collaboratively edited by thousands of volunteer editors and has thus evolved considerably since its inception in 2012. In this paper, we present Wikidated 1.0, a dataset of Wikidata's full revision history, which encodes changes between Wikidata revisions as sets of deletions and additions of RDF triples. To the best of our knowledge, it constitutes the first large dataset of an evolving knowledge graph, a recently emerging research subject in the Semantic Web community. We introduce the methodology for generating Wikidated 1.0 from dumps of Wikidata, discuss its implementation and limitations, and present statistical characteristics of the dataset.