Skoutas, Dimitrios
MultiCast: Zero-Shot Multivariate Time Series Forecasting Using LLMs
Chatzigeorgakidis, Georgios, Lentzos, Konstantinos, Skoutas, Dimitrios
Predicting future values in multivariate time series is vital across various domains. This work explores the use of large language models (LLMs) for this task. However, LLMs typically handle one-dimensional data. We introduce MultiCast, a zero-shot LLM-based approach for multivariate time series forecasting. It allows LLMs to receive multivariate time series as input through three novel token multiplexing solutions that effectively reduce dimensionality while preserving key repetitive patterns. Additionally, a quantization scheme helps LLMs better learn these patterns while significantly reducing token use for practical applications. We evaluate our approach against state-of-the-art methods on three real-world datasets in terms of RMSE and execution time.
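The abstract does not spell out the multiplexing or quantization schemes. As a rough illustration only, the following Python sketch shows one hypothetical way to flatten a multivariate series into a single token string: each dimension is min-max quantized to a small integer range (`quantize`, `levels`), values are interleaved timestep by timestep (`multiplex`), and `demultiplex` reverses the layout. All names and the comma/semicolon separators are assumptions, not MultiCast's actual design.

```python
from typing import List, Tuple


def quantize(series: List[float], levels: int = 100) -> Tuple[List[int], float, float]:
    """Min-max scale a 1-D series onto the integers 0..levels-1 (hypothetical scheme)."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # guard against a constant series
    codes = [round((x - lo) / span * (levels - 1)) for x in series]
    return codes, lo, span


def dequantize(codes: List[int], lo: float, span: float, levels: int = 100) -> List[float]:
    """Map integer codes back to approximate original values."""
    return [lo + c / (levels - 1) * span for c in codes]


def multiplex(dims: List[List[int]]) -> str:
    """Interleave D quantized series timestep by timestep into one string.

    Values of one timestep are joined by ',' and timesteps by ';'.
    """
    return ";".join(",".join(str(v) for v in step) for step in zip(*dims))


def demultiplex(text: str) -> List[List[int]]:
    """Split an interleaved string back into one list of codes per dimension."""
    rows = [[int(v) for v in step.split(",")] for step in text.split(";") if step]
    return [list(col) for col in zip(*rows)]


if __name__ == "__main__":
    temp, t_lo, t_span = quantize([0.0, 5.0, 10.0], levels=11)
    hum, h_lo, h_span = quantize([6.0, 4.0, 2.0], levels=11)
    prompt = multiplex([temp, hum])
    print(prompt)               # one-dimensional token stream for the LLM
    print(demultiplex(prompt))  # recovers the per-dimension codes
```

Short integer codes take fewer characters, and usually fewer tokens, than raw floating-point values, which is the kind of saving the abstract attributes to quantization.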
Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark]
Zeakis, Alexandros, Papadakis, George, Skoutas, Dimitrios, Koubarakis, Manolis
Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving language models to improve effectiveness. This is applied to both main steps of ER, i.e., blocking and matching. Several pre-trained embeddings have been tested, with the most popular ones being fastText and variants of the BERT model. However, there is no detailed analysis of their pros and cons. To cover this gap, we perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets. First, we assess their vectorization overhead for converting all input entities into dense embedding vectors. Second, we investigate their blocking performance, performing a detailed scalability analysis and comparing them with the state-of-the-art deep learning-based blocking method. Third, we conclude with their relative performance for both supervised and unsupervised matching. Our experimental results provide novel insights into the strengths and weaknesses of the main language models, helping researchers and practitioners select the most suitable ones in practice.
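As a hedged illustration of embedding-based blocking in general (not the specific methods benchmarked in the paper), the Python sketch below keeps, for every entity on one side, its top-`k` most cosine-similar entities on the other side as candidate pairs for the matching step. The toy two-dimensional vectors stand in for embeddings that a model such as fastText or BERT would produce; `knn_blocking` and its parameters are hypothetical names.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_blocking(
    left: Dict[str, List[float]],
    right: Dict[str, List[float]],
    k: int = 1,
) -> Set[Tuple[str, str]]:
    """For each left entity, keep its k most similar right entities as candidates.

    Brute-force scan; real systems would use an approximate nearest-neighbour index.
    """
    candidates: Set[Tuple[str, str]] = set()
    for lid, lvec in left.items():
        ranked = sorted(right.items(), key=lambda item: cosine(lvec, item[1]), reverse=True)
        for rid, _ in ranked[:k]:
            candidates.add((lid, rid))
    return candidates


if __name__ == "__main__":
    # Toy vectors standing in for pre-trained entity embeddings.
    left = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
    right = {"b1": [0.9, 0.1], "b2": [0.1, 0.9]}
    print(sorted(knn_blocking(left, right, k=1)))
```

Only the candidate pairs are passed on to matching, so the choice of embedding model directly affects both the recall of blocking and the cost of the subsequent comparisons.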
INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]
Amer-Yahia, Sihem, Koutrika, Georgia, Bastian, Frederic, Belmpas, Theofilos, Braschler, Martin, Brunner, Ursin, Calvanese, Diego, Fabricius, Maximilian, Gkini, Orest, Kosten, Catherine, Lanti, Davide, Litke, Antonis, Lücke-Tieke, Hendrik, Massucci, Francesco Alessandro, de Farias, Tarcisio Mendes, Mosca, Alessandro, Multari, Francesco, Papadakis, Nikolaos, Papadopoulos, Dimitris, Patil, Yogendra, Personnaz, Aurélien, Rull, Guillem, Sima, Ana, Smith, Ellery, Skoutas, Dimitrios, Subramanian, Srividya, Xiao, Guohui, Stockinger, Kurt
A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE, an end-to-end data exploration system that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users, from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.