AITopics | joinability

Collaborating Authors

joinability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Snoopy: Effective and Efficient Semantic Join Discovery via Proxy Columns

Guo, Yuxiang, Mao, Yuren, Hu, Zhonghao, Chen, Lu, Gao, Yunjun

arXiv.org Artificial IntelligenceFeb-23-2025

Semantic join discovery, which aims to find columns in a table repository with high semantic joinabilities to a query column, is crucial for dataset discovery. Existing methods can be divided into two categories: cell-level methods and column-level methods. However, neither of them ensures both effectiveness and efficiency simultaneously. Cell-level methods, which compute the joinability by counting cell matches between columns, enjoy ideal effectiveness but suffer poor efficiency. In contrast, column-level methods, which determine joinability only by computing the similarity of column embeddings, enjoy proper efficiency but suffer poor effectiveness due to the issues occurring in their column embeddings: (i) semantics-joinability-gap, (ii) size limit, and (iii) permutation sensitivity. To address these issues, this paper proposes to compute column embeddings via proxy columns; furthermore, a novel column-level semantic join discovery framework, Snoopy, is presented, leveraging proxy-column-based embeddings to bridge effectiveness and efficiency. Specifically, the proposed column embeddings are derived from the implicit column-to-proxy-column relationships, which are captured by the lightweight approximate-graph-matching-based column projection.To acquire good proxy columns for guiding the column projection, we introduce a rank-aware contrastive learning paradigm. Extensive experiments on four real-world datasets demonstrate that Snoopy outperforms SOTA column-level methods by 16% in Recall@25 and 10% in NDCG@25, and achieves superior efficiency--being at least 5 orders of magnitude faster than cell-level solutions, and 3.5x faster than existing column-level methods.

joinability, proxy column, snoopy, (14 more...)

arXiv.org Artificial Intelligence

2502.16813

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

DeepJoin: Joinable Table Discovery with Pre-trained Language Models

Dong, Yuyang, Xiao, Chuan, Nozawa, Takuma, Enomoto, Masafumi, Oyamada, Masafumi

arXiv.org Artificial IntelligenceJun-23-2023

Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for creating a unified view, or semantic joins, which tolerate misspellings and different formats to deliver more join results. They are either exact solutions whose running time is linear in the sizes of query column and target table repository or approximate solutions lacking precision. In this paper, we propose Deepjoin, a deep learning model for accurate and efficient joinable table discovery. Our solution is an embedding-based retrieval, which employs a pre-trained language model (PLM) and is designed as one framework serving both equi- and semantic joins. We propose a set of contextualization options to transform column contents to a text sequence. The PLM reads the sequence and is fine-tuned to embed columns to vectors such that columns are expected to be joinable if they are close to each other in the vector space. Since the output of the PLM is fixed in length, the subsequent search procedure becomes independent of the column size. With a state-of-the-art approximate nearest neighbor search algorithm, the search time is logarithmic in the repository size. To train the model, we devise the techniques for preparing training data as well as data augmentation. The experiments on real datasets demonstrate that by training on a small subset of a corpus, Deepjoin generalizes to large datasets and its precision consistently outperforms other approximate solutions'. Deepjoin is even more accurate than an exact solution to semantic joins when evaluated with labels from experts. Moreover, when equipped with a GPU, Deepjoin is up to two orders of magnitude faster than existing solutions.

deepjoin, information retrieval, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2212.07588

Country:

North America > United States > California (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Oceania > Australia (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Syntactic Confluence Criteria for Positive/Negative-Conditional Term Rewriting Systems

Wirth, Claus-Peter

arXiv.org Artificial IntelligenceFeb-20-2009

We study the combination of the following already known ideas for showing confluence of unconditional or conditional term rewriting systems into practically more useful confluence criteria for conditional systems: Our syntactical separation into constructor and non-constructor symbols, Huet's introduction and Toyama's generalization of parallel closedness for non-noetherian unconditional systems, the use of shallow confluence for proving confluence of noetherian and non-noetherian conditional systems, the idea that certain kinds of limited confluence can be assumed for checking the fulfilledness or infeasibility of the conditions of conditional critical pairs, and the idea that (when termination is given) only prime superpositions have to be considered and certain normalization restrictions can be applied for the substitutions fulfilling the conditions of conditional critical pairs. Besides combining and improving already known methods, we present the following new ideas and results: We strengthen the criterion for overlay joinable noetherian systems, and, by using the expressiveness of our syntactical separation into constructor and non-constructor symbols, we are able to present criteria for level confluence that are not criteria for shallow confluence actually and also able to weaken the severe requirement of normality (stiffened with left-linearity) in the criteria for shallow confluence of noetherian and non-noetherian conditional systems to the easily satisfied requirement of quasi-normality. Finally, the whole paper may also give a practically useful overview of the syntactical means for showing confluence of conditional term rewriting systems.

artificial intelligence, confluent, logic & formal reasoning, (17 more...)

arXiv.org Artificial Intelligence

0902.3614

Country: Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.24)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.45)

Add feedback