AITopics | tabert

Collaborating Authors

tabert

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hy pergraph-enhanced Tabular Data Representation Learning Pei Chen

Neural Information Processing SystemsFeb-12-2026, 22:32:09 GMT

Tabular data that is organized in bi-dimensional matrices are widespread in webpages, documents, and databases.

data mining, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
(5 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Robust Detection of Synthetic Tabular Data under Schema Variability

Kindji, G. Charbel N., Fromont, Elisa, Rojas-Barahona, Lina Maria, Urvoy, Tanguy

arXiv.org Artificial IntelligenceDec-2-2025

The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked. Yet, detecting synthetic tabular data is especially challenging due to its heterogeneous structure and unseen formats at test time. We address the underexplored task of detecting synthetic tabular data ''in the wild'', i.e. when the detector is deployed on tables with variable and previously unseen schemas. We introduce a novel datum-wise transformer architecture that significantly outperforms the only previously published baseline, improving both AUC and accuracy by 7 points. By incorporating a table-adaptation component, our model gains an additional 7 accuracy points, demonstrating enhanced robustness. This work provides the first strong evidence that detecting synthetic tabular data in real-world conditions is feasible, and demonstrates substantial improvements over previous approaches. Following acceptance of the paper, we are finalizing the administrative and licensing procedures necessary for releasing the source code. This extended version will be updated as soon as the release is complete.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2509.00092

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

66178beae8f12fcd48699de95acc1152-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 23:19:52 GMT

data mining, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
(8 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Datum-wise Transformer for Synthetic Tabular Data Detection in the Wild

Kindji, G. Charbel N., Fromont, Elisa, Rojas-Barahona, Lina Maria, Urvoy, Tanguy

arXiv.org Artificial IntelligenceApr-15-2025

The growing power of generative models raises major concerns about the authenticity of published content. To address this problem, several synthetic content detection methods have been proposed for uniformly structured media such as image or text. However, little work has been done on the detection of synthetic tabular data, despite its importance in industry and government. This form of data is complex to handle due to the diversity of its structures: the number and types of the columns may vary wildly from one table to another. We tackle the tough problem of detecting synthetic tabular data ''in the wild'', i.e. when the model is deployed on table structures it has never seen before. We introduce a novel datum-wise transformer architecture and show that it outperforms existing models. Furthermore, we investigate the application of domain adaptation techniques to enhance the effectiveness of our model, thereby providing a more robust data-forgery detection solution.

large language model, machine learning, tabular data, (16 more...)

arXiv.org Artificial Intelligence

2504.08829

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

HYTREL: Hypergraph-enhanced Tabular Data Representation Learning

Chen, Pei, Sarkar, Soumajyoti, Lausen, Leonard, Srinivasan, Balasubramaniam, Zha, Sheng, Huang, Ruihong, Karypis, George

arXiv.org Artificial IntelligenceOct-26-2023

Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.

hypergraph, permutation, representation, (14 more...)

arXiv.org Artificial Intelligence

2307.08623

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
(8 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

LakeBench: Benchmarks for Data Discovery over Data Lakes

Srinivas, Kavitha, Dolby, Julian, Abdelaziz, Ibrahim, Hassanzadeh, Oktie, Kokel, Harsha, Khatiwada, Aamod, Pedapati, Tejaswini, Chaudhury, Subhajit, Samulowitz, Horst

arXiv.org Artificial IntelligenceJul-9-2023

Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes.

benchmark, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2307.04217

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Slovakia (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
(11 more...)

Genre: Research Report (0.84)

Industry:

Government > Regional Government > Europe Government (0.34)
Banking & Finance > Economy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.83)

Add feedback

Sattiy at SemEval-2021 Task 9: An Ensemble Solution for Statement Verification and Evidence Finding with Tables

Ruan, Xiaoyi, Jin, Meizhi, Ma, Jian, Yang, Haiqin, Jiang, Lianxin, Mo, Yang, Zhou, Mengyuan

arXiv.org Artificial IntelligenceApr-21-2021

Question answering from semi-structured tables can be seen as a semantic parsing task and is significant and practical for pushing the boundary of natural language understanding. Existing research mainly focuses on understanding contents from unstructured evidence, e.g., news, natural language sentences, and documents. The task of verification from structured evidence, such as tables, charts, and databases, is still less explored. This paper describes sattiy team's system in SemEval-2021 task 9: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACT). This competition aims to verify statements and to find evidence from tables for scientific articles and to promote the proper interpretation of the surrounding article. In this paper, we exploited ensemble models of pre-trained language models over tables, TaPas and TaBERT, for Task A and adjust the result based on some rules extracted for Task B. Finally, in the leaderboard, we attain the F1 scores of 0.8496 and 0.7732 in Task A for the 2-way and 3-way evaluation, respectively, and the F1 score of 0.4856 in Task B.

evaluation, statement verification, training data, (15 more...)

arXiv.org Artificial Intelligence

2104.10366

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.57)

Add feedback

TaBERT: A new model for understanding queries over tabular data

#artificialintelligenceJul-17-2020, 22:55:28 GMT

TaBERT is the first model that has been pretrained to learn representations for both natural language sentences and tabular data. These sorts of representations are useful for natural language understanding tasks that involve joint reasoning over natural language sentences and tables. A representative example is semantic parsing over databases, where a natural language question (e.g., "Which country has the highest GDP?") is mapped to a program executable over database (DB) tables. This is the first pretraining approach across structured and unstructured domains, and it opens new possibilities regarding semantic parsing, where one of the key challenges has been understanding the structure of a DB table and how it aligns with a query. TaBERT has been trained using a corpus of 26 million tables and their associated English sentences.

artificial intelligence, natural language, tabert, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback