AITopics | Fountalis, Ilias

Towards Agentic Schema Refinement

Rissaki, Agapi, Fountalis, Ilias, Vasiloglou, Nikolaos, Gatterbauer, Wolfgang

arXiv.org Artificial Intelligence, Nov-25-2024

Understanding the meaning of data is crucial for data analysis, yet users often need a tedious data exploration process to gain insight into the content and structure of their database [2, 16]. A common industry practice, taken on by specialists such as Knowledge Engineers, is to explicitly construct an intermediate layer between the database and the user (a semantic layer) that abstracts away certain details of the database schema in favor of clearer data semantics [3, 10]. In the era of Large Language Models (LLMs), industry practitioners and researchers attempt to circumvent this costly process using LLM-powered Natural Language Interfaces [4, 6, 12, 18, 19, 22]. The promise of such Text-to-SQL solutions is to allow users without technical expertise to interact seamlessly with databases. For example, a new company employee could issue queries in natural language without programming expertise or even explicit knowledge of the database structure, such as the names of entities or properties or the exact location of data sources.

artificial intelligence, large language model, natural language, (15 more...)

2412.07786

Country: North America > United States (0.47)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Government Relations & Public Policy (0.94)
Health & Medicine > Health Care Providers & Services > Reimbursement (0.69)
Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

AnnotatedTables: A Large Tabular Dataset with Language Model Annotations

Hu, Yaojie, Fountalis, Ilias, Tian, Jin, Vasiloglou, Nikolaos

arXiv.org Artificial Intelligence, Jun-24-2024

Tabular data is ubiquitous in real-world applications and abundant on the web, yet its annotation has traditionally required human labor, posing a significant scalability bottleneck for tabular machine learning. Our methodology can successfully annotate a large amount of tabular data and can be flexibly steered to generate various types of annotations based on specific research objectives, as we demonstrate with SQL annotation and input-target column annotation as examples. As a result, we release AnnotatedTables, a collection of 32,119 databases with LLM-generated annotations. The dataset includes 405,616 valid SQL programs, making it the largest SQL dataset with associated tabular data that supports query execution. To further demonstrate the value of our methodology and dataset, we perform two follow-up research studies. 1) We investigate whether LLMs can translate SQL programs to Rel programs, a database language previously unknown to LLMs, while obtaining the same execution results. Using our Incremental Prompt Engineering methods based on execution feedback, we show that LLMs can produce adequate translations with few-shot learning. 2) We evaluate the performance of TabPFN, a recent neural tabular classifier trained on Bayesian priors, on 2,720 tables with input-target columns identified and annotated by LLMs. On average, TabPFN performs on par with the baseline AutoML method, though the relative performance can vary significantly from one data table to another, making both models viable for practical applications depending on the situation. Our findings underscore the potential of LLMs in automating the annotation of large volumes of diverse tabular data.
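
The abstract counts SQL programs as "valid" and stresses that the dataset supports query execution. A minimal sketch of what an execution-based validity check could look like is given below. It is not the authors' pipeline: the SQLite backend, the `patients` table, the helper name `is_executable`, and the two candidate queries are all hypothetical and chosen only for illustration.

```python
import sqlite3


def is_executable(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if `sql` runs without raising an SQLite error.

    Hypothetical helper: treats "valid" as "executes successfully",
    which is an assumption made for this sketch.
    """
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


# Hypothetical in-memory table standing in for one annotated dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, 34, "Athens"), (2, 51, "Atlanta")],
)

# Hypothetical candidate annotations, as an LLM might propose them.
candidates = [
    "SELECT city, AVG(age) FROM patients GROUP BY city",
    "SELECT salary FROM patients",  # references a column that does not exist
]

# Keep only the queries that actually execute against the table.
valid = [q for q in candidates if is_executable(conn, q)]
```

In this sketch only the first candidate survives, because the second one refers to a missing column. A real pipeline would also need to guard against queries that modify or drop data, which this check does not do.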

large language model, machine learning, natural language, (18 more...)

2406.16349

Country:

Asia > India (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.92)
Education (0.92)
Information Technology > Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.66)

CHORUS: Foundation Models for Unified Data Discovery and Exploration

Kayali, Moe, Lykov, Anton, Fountalis, Ilias, Vasiloglou, Nikolaos, Olteanu, Dan, Suciu, Dan

arXiv.org Artificial Intelligence, Sep-26-2023

We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When carefully used, they have superior capability on three representative tasks: table-class detection, column-type annotation and join-column prediction. On all three tasks, we show that a foundation-model-based approach outperforms the task-specific models and so the state of the art. Further, our approach often surpasses human-expert task performance. We investigate the fundamental characteristics of this approach including generalizability to several foundation models, impact of non-determinism on the outputs and syntactic/semantic signals. All in all, this suggests a future direction in which disparate data management tasks can be unified under foundation models.

artificial intelligence, large language model, natural language, (3 more...)

2306.0961

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.53)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
