Collaborating Authors

Phan, Dzung


TabularFM: An Open Framework For Tabular Foundational Models

arXiv.org Artificial Intelligence

Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, like time series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs across tabular data tasks. In response to this gap, we introduce TabularFM, a framework that incorporates state-of-the-art methods for developing FMs specifically for tabular data, including variants of neural architectures such as GANs, VAEs, and Transformers. We have curated a million tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future.
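
As a concrete illustration of the pretrain-then-transfer idea the abstract describes, here is a minimal, self-contained PyTorch sketch. It is not the TabularFM API; the TinyTabularVAE model, the synthetic source and target tables, and all names below are illustrative assumptions standing in for pretraining a generative model on pooled tables and then fine-tuning it on a new one.

```python
# Minimal sketch of the pretrain-then-transfer idea behind tabular FMs.
# This is NOT the TabularFM API; model and data here are illustrative only.
import torch
import torch.nn as nn

class TinyTabularVAE(nn.Module):
    """A very small VAE over fixed-width numeric rows; forward() returns the loss."""
    def __init__(self, n_cols: int, latent: int = 8):
        super().__init__()
        self.enc = nn.Linear(n_cols, 2 * latent)  # outputs mean and log-variance
        self.dec = nn.Linear(latent, n_cols)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(z)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return ((recon - x) ** 2).sum(-1).mean() + kl

def train(model, data, steps=200, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = model(data)
        loss.backward()
        opt.step()
    return loss.item()

# "Pretrain" on pooled synthetic source tables, then fine-tune on a target table.
torch.manual_seed(0)
source = torch.randn(1024, 5)           # stand-in for pooled, cleaned source tables
target = torch.randn(64, 5) * 0.5 + 1   # small target table, shifted distribution

model = TinyTabularVAE(n_cols=5)
print("pretrain loss:", train(model, source))
print("fine-tune loss:", train(model, target, steps=50))
```

Transferability in this setting amounts to the fine-tuning step starting from pretrained weights rather than from scratch; a real framework would additionally handle heterogeneous schemas and mixed column types across tables.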


An End-to-End Time Series Model for Simultaneous Imputation and Forecast

arXiv.org Artificial Intelligence

Learning the complex structure of multivariate time series has been a major interest across many application domains, including economics, transportation, and manufacturing [Fortuin et al., 2020, Wu et al., 2021, Li et al., 2019, Zhou et al., 2021]. While there has been much progress in data-driven learning and processing of complex time series, it remains a challenging topic, particularly when the data is corrupted [Cao et al., 2018, Kreindler and Lumsden, 2006, Yoon et al., 2018, Du et al., 2022]. In this paper, we consider the forecasting task, which aims to predict future values using historical data that may contain missing values. In addition, for many industrial problems, the time series features fall into two categories: auxiliary features (X), which provide information about the state of a system, and target variables (Y), which depend on the auxiliary features and may convey valuable information. For example, in the operation of a chemical reactor, the auxiliary features include the temperature, pressure, and concentration of chemicals observed through a sensor network, while the target variables may include the quality of the material and the throughput. We are interested in the time series problem where the data set consists of both X and Y. In general, X is more readily available, as it is obtained from a sensor network, while Y may be temporally sparse since it can be expensive or difficult to collect. This so-called soft sensor problem has been of interest in many industrial applications [Shardt et al., 2015, Yuan et al., 2021].
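
The joint imputation-and-forecast setup with dense auxiliary features X and a temporally sparse target Y can be sketched with a shared encoder and masked losses. The following is a minimal, hypothetical PyTorch example, not the paper's architecture; ImputeForecastNet, the mask conventions, and the synthetic data are assumptions for illustration.

```python
# Minimal sketch of joint imputation + forecasting with a shared encoder.
# Not the paper's architecture; all names here are illustrative.
import torch
import torch.nn as nn

class ImputeForecastNet(nn.Module):
    def __init__(self, n_feats: int, hidden: int = 32):
        super().__init__()
        # Missing X entries are zero-filled and flagged via a mask channel.
        self.rnn = nn.GRU(2 * n_feats, hidden, batch_first=True)
        self.impute_head = nn.Linear(hidden, n_feats)  # reconstruct X at each step
        self.target_head = nn.Linear(hidden, 1)        # predict the sparse target Y

    def forward(self, x, x_mask):
        h, _ = self.rnn(torch.cat([x * x_mask, x_mask], dim=-1))
        return self.impute_head(h), self.target_head(h).squeeze(-1)

# Synthetic batch: 8 series, 20 time steps, 4 auxiliary features.
torch.manual_seed(0)
x = torch.randn(8, 20, 4)
x_mask = (torch.rand(8, 20, 4) > 0.3).float()  # ~30% of X entries missing
y = x.sum(-1)                                  # toy target derived from X
y_mask = (torch.rand(8, 20) > 0.8).float()     # Y observed only ~20% of the time

model = ImputeForecastNet(n_feats=4)
x_hat, y_hat = model(x, x_mask)

# Each loss is computed only on observed entries, so missing data never
# contributes a gradient; the two tasks regularize the shared encoder.
impute_loss = (((x_hat - x) ** 2) * x_mask).sum() / x_mask.sum()
target_loss = (((y_hat - y) ** 2) * y_mask).sum() / y_mask.sum()
(impute_loss + target_loss).backward()
print(float(impute_loss), float(target_loss))
```

Concatenating the observation mask as extra input channels and restricting each loss term to observed entries is one standard way to train end to end under missingness; it lets the sparse target Y borrow strength from the dense auxiliary signal X.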


A Scale Invariant Flatness Measure for Deep Network Minima

arXiv.org Machine Learning

It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescalings of the network parameters that correspond to the same function. This means that the measure of flatness/sharpness can be made arbitrarily small or arbitrarily large through rescaling, rendering such quantitative measures meaningless. In this paper we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure on the parameter space. Using this manifold structure and an appropriate metric, we propose a Hessian-based measure of flatness that is invariant to rescaling. We use this new measure to confirm the proposition that large-batch SGD minima are indeed sharper than small-batch SGD minima.
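
The rescaling problem the abstract describes can be checked numerically. For a ReLU unit, positive homogeneity gives relu(lam * a) * (b / lam) = relu(a) * b for any lam > 0, so the parameters (lam * w1, w2 / lam) realize the same function as (w1, w2) while the parameter-space Hessian changes. The toy example below (an illustrative assumption, not the paper's experiment) shows a common Hessian-eigenvalue sharpness proxy failing to be rescaling-invariant.

```python
# Numerical check that naive Hessian sharpness is not rescaling-invariant
# for positively homogeneous (ReLU) networks. Illustrative toy example only.
import torch
from torch.autograd.functional import hessian

x = torch.linspace(0.1, 1.0, 20)
y = 0.5 * x                      # toy regression target

def loss(w):
    w1, w2 = w                   # two-parameter "network": f(x) = w2 * relu(w1 * x)
    return ((w2 * torch.relu(w1 * x) - y) ** 2).mean()

def sharpness(w):
    """Largest eigenvalue of the parameter-space Hessian (a common sharpness proxy)."""
    return torch.linalg.eigvalsh(hessian(loss, w)).max().item()

w = torch.tensor([1.0, 0.5])
lam = 10.0
# Same function by positive homogeneity: relu(lam*a*x) * (b/lam) == relu(a*x) * b.
w_rescaled = torch.tensor([lam * 1.0, 0.5 / lam])

print("loss:     ", loss(w).item(), loss(w_rescaled).item())  # identical minima
print("sharpness:", sharpness(w), sharpness(w_rescaled))      # wildly different
```

Both parameter vectors sit at the same zero-loss minimum of the same function, yet the naive sharpness proxy differs by orders of magnitude, which is exactly the pathology the proposed quotient-manifold measure is designed to remove.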