Graph integration of structured, semistructured and unstructured data for data journalism

Anadiotis, Angelos-Christos, Balalau, Oana, Conceicao, Catarina, Galhardas, Helena, Haddad, Mhd Yamen, Manolescu, Ioana, Merabti, Tayeb, You, Jingmao

arXiv.org Artificial Intelligence 

Such a query can be answered currently at a high human effort cost, by inspecting e.g., a JSON list of Assemblée elected officials (available from NosDeputes.fr) and manually connecting the names with those found in a national registry of companies. This considerable effort may still miss connections that could be found if one added information about politicians' and business people's spouses, information sometimes available in public knowledge bases such as DBPedia, or journalists' notes. No single query language can be used on such heterogeneous data; instead, we study methods to query the corpus by specifying some keywords and asking for all the connections that exist, in one or across several data sources, between these keywords. This problem has been studied under the name of keyword search over structured data, in particular for relational databases [49, 27], XML documents [24, 33], RDF graphs [30, 16]. However, most of these works assumed one single source of data, in which connections among nodes are clearly identified. When authors considered several data sources [31], they still assumed that one query answer comes from a single data source. In contrast, the ConnectionLens system [10] answers keyword search queries over arbitrary combinations of datasets and heterogeneous data models, independently produced by actors unaware of each other's existence.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found