Many autonomous and heterogeneous information sources are becoming increasingly available to users through the Internet, especially through the World Wide Web. In order to make the information available in a consolidated, uniform, and efficient manner, it is necessary to integrate the different information sources. The integration of Internet sources poses several challenges that have not been sufficiently addressed by work on the integration of corporate databases residing on an Intranet [LMR90]. We believe that the most important ones are heterogeneity, large number of sources, redundancy, availability, source autonomy, and diverse access methods and querying interfaces.
Logic-based systems for query answering over knowledge graphs return only answers that rely on information explicitly represented in the graph. To improve recall, recent works have proposed the use of embeddings to predict additional information like missing links, or labels. These embeddings enable scoring entities in the graph as the answer a query, without being fully dependent on the graph structure. In its simplest case, answering a query in such a setting requires predicting a link between two entities. However, link prediction is not sufficient to address complex queries that involve multiple entities and variables. To solve this task, we propose to apply a message passing mechanism to a graph representation of the query, where nodes correspond to variables and entities. This results in an embedding of the query, such that answering entities are close to it in the embedding space. The general formulation of our method allows it to encode a more diverse set of query types in comparison to previous work. We evaluate our method by answering queries that rely on edges not seen during training, obtaining competitive performance. In contrast with previous work, we show that our method can generalize from training for the single-hop, link prediction task, to answering queries with more complex structures. A qualitative analysis reveals that the learned embeddings successfully capture the notion of different entity types.
In this paper, we present an approach to mine large-scale knowledge graphs to discover inference paths for query expansion in NLIDB (Natural Language Interface to Databases). Addressing this problem is important in order for NLIDB applications to effectively handle relevant concepts in the domain of interest that do not correspond to any structured fields in the target database. We also present preliminary observations on the performance of our approach applied to Freebase, and conclude with discussions on next steps to further evaluate and extend our approach.
Traditional information retrieval systems represent documents and queries by keyword sets. However, the content of a document or a query is mainly defined by both keywords and named entities occurring in it. Named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. Besides, the meaning of a query may imply latent named entities that are related to the apparent ones in the query. We propose an ontology-based generalized vector space model to semantic text search. It exploits ontological features of named entities and their latently related ones to reveal the semantics of documents and queries. We also propose a framework to combine different ontologies to take their complementary advantages for semantic annotation and searching. Experiments on a benchmark dataset show better search quality of our model to other ones.
In this paper, we present an embedding-based framework (TrQuery) for recommending solutions of a SPARQL query, including approximate solutions when exact querying solutions are not available due to incompleteness or inconsistencies of real-world RDF data. Within this framework, embedding is applied to score solutions together with edit distance so that we could obtain more fine-grained recommendations than those recommendations via edit distance. For instance, graphs of two querying solutions with a similar structure can be distinguished in our proposed framework while the edit distance depending on structural difference becomes unable. To this end, we propose a novel score model built on vector space generated in embedding system to compute the similarity between an approximate subgraph matching and a whole graph matching. Finally, we evaluate our approach on large RDF datasets DBpedia and YAGO, and experimental results show that TrQuery exhibits an excellent behavior in terms of both effectiveness and efficiency.