Sub-Merge: Diving Down to the Attribute-Value Level in Statistical Schema Matching

Lim, Zhe (The University of Melbourne) | Rubinstein, Benjamin (The University of Melbourne)

Mar-6-2015–AAAI Conferences

Matching and merging data from conflicting sources is the bread and butter of data integration, which drives search verticals, e-commerce comparison sites and cyber intelligence. Schema matching lifts data integration, traditionally focused on well-structured data, to highly heterogeneous sources. While schema matching has enjoyed significant success in matching data attributes, inconsistencies can exist at a deeper level, making full integration difficult or impossible. We propose a more fine-grained approach that focuses on correspondences between the values of attributes across data sources. Since the semantics of attribute values derive from their use and co-occurrence, we argue for the suitability of canonical correlation analysis (CCA) and its variants. We demonstrate the superior statistical and computational performance of multiple sparse CCA compared to a suite of baseline algorithms on two datasets, which we are releasing to stimulate further research. Our crowd-annotated data covers both cases for which humans can supply ground truth relatively easily and cases that are inherently difficult for human computation.
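To make the idea of matching attribute values through co-occurrence concrete, the following is a minimal sketch using plain regularized CCA in NumPy. It is not the multiple sparse CCA evaluated in the paper, and it is not the authors' implementation. It assumes the two sources describe the same records in the same row order, and the toy genre values, the helper names (`one_hot`, `inv_sqrt`, `cca_value_embeddings`) and the `reg` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def one_hot(values):
    """Return (indicator matrix, sorted vocabulary) for a list of categorical values."""
    vocab = sorted(set(values))
    index = {v: i for i, v in enumerate(vocab)}
    mat = np.zeros((len(values), len(vocab)))
    mat[np.arange(len(values)), [index[v] for v in values]] = 1.0
    return mat, vocab

def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(mat)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

def cca_value_embeddings(X, Y, k, reg=1e-3):
    """Plain ridge-regularized CCA (illustrative, not sparse CCA).

    Returns one k-dimensional embedding per attribute value in each source,
    plus the top-k canonical correlations.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # The ridge term keeps the covariances invertible; centered one-hot
    # columns are linearly dependent, so this is needed here.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    # Embedding of value i = canonical variate of a record whose indicator row is e_i.
    Ex = (np.eye(X.shape[1]) - mx) @ A
    Ey = (np.eye(Y.shape[1]) - my) @ B
    return Ex, Ey, s[:k]

# Made-up aligned records: (value in source A, value in source B).
pairs = (
    [("Comedy", "humour")] * 3
    + [("Drama", "dramatic")] * 3
    + [("Sci-Fi", "science fiction")] * 4
    + [("Sci-Fi", "humour")]  # one inconsistent record
)
X, vocab_a = one_hot([a for a, _ in pairs])
Y, vocab_b = one_hot([b for _, b in pairs])

# With 3 values per side, centering leaves at most 2 informative directions.
Ex, Ey, corr = cca_value_embeddings(X, Y, k=2)

# Match each source-A value to the nearest source-B value in canonical space.
dists = np.linalg.norm(Ex[:, None, :] - Ey[None, :, :], axis=2)
matches = {va: vocab_b[j] for va, j in zip(vocab_a, dists.argmin(axis=1))}
print(matches)
```

Values that tend to appear on the same records land close together in the shared canonical space, so a nearest-neighbour lookup suggests correspondences even though the strings themselves share no tokens. A realistic setting would have many attributes and values per source, which is where sparsity-inducing CCA variants such as those the abstract mentions become relevant.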

artificial intelligence, machine learning, natural language

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - California (0.04)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Industry:
- Leisure & Entertainment (1.00)
- Media > Film (0.94)
- Information Technology > Services
  - e-Commerce Services (0.34)

Technology:
- Information Technology
  - Information Management (1.00)
  - Data Science (1.00)
  - Communications > Social Media
    - Crowdsourcing (0.68)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning (1.00)
    - Natural Language > Text Processing (0.48)
