Goto

Collaborating Authors

 Xenochristou, Maria


TabGLM: Tabular Graph Language Model for Learning Transferable Representations Through Multi-Modal Consistency Minimization

arXiv.org Artificial Intelligence

Handling heterogeneous data in tabular datasets poses a significant challenge for deep learning models. While attention-based architectures and self-supervised learning have achieved notable success, their application to tabular data remains less effective than linear and tree-based models. Although several breakthroughs have been achieved by models that transform tables into uni-modal representations such as images, language, or graphs, these models often underperform in the presence of feature heterogeneity. To address this gap, we introduce TabGLM (Tabular Graph Language Model), a novel multi-modal architecture designed to model both structural and semantic information from a table. TabGLM transforms each row of a table into a fully connected graph and serialized text, which are then encoded using a graph neural network (GNN) and a text encoder, respectively. TabGLM's flexible graph-text pipeline efficiently processes heterogeneous datasets with significantly fewer parameters than existing deep learning approaches. Evaluations across 25 benchmark datasets demonstrate substantial performance gains, with TabGLM achieving an average AUC-ROC improvement of up to 5.56% over state-of-the-art (SoTA) tabular learning methods.

1 Introduction

Real-world applications ranging from predicting sales in e-commerce to diagnosing diseases in healthcare rely on tabular data. These datasets are often a mix of numerical, categorical, and text values, presenting a unique challenge for machine learning models. Traditional approaches (Breiman 2001; Chen and Guestrin 2016; Prokhorenkova et al. 2018) as well as some early deep learning (DL) models (Yoon et al. 2020; Arik and Pfister 2021; Gorishniy et al. 2021; Hollmann et al. 2023) convert textual data into numerical encodings, modeling only structural features of an input table and thereby losing semantic information.
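The abstract describes a two-branch pipeline: each row is viewed both as a fully connected feature graph (encoded by a GNN) and as serialized text (encoded by a text encoder), with a consistency objective aligning the two views. Below is a minimal sketch of that idea, assuming a simple mean-aggregation message-passing step and a cosine-based consistency term; the class and function names (RowGraphEncoder, consistency_loss) are illustrative and not TabGLM's actual API or loss.

```python
# Hypothetical sketch of a dual-view (graph + text) row encoder with a
# consistency loss, in the spirit of the abstract. Not TabGLM's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RowGraphEncoder(nn.Module):
    """One round of mean-aggregation message passing over a fully connected
    graph whose nodes are the row's per-feature embeddings (assumption)."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, node_feats):                     # (num_features, dim)
        agg = self.msg(node_feats).mean(0, keepdim=True)   # all-to-all mean message
        agg = agg.expand_as(node_feats)
        h = F.relu(self.upd(torch.cat([node_feats, agg], dim=-1)))
        return h.mean(0)                               # graph-level embedding (dim,)

def consistency_loss(graph_emb, text_emb):
    """Cosine-based term pulling the graph and text views of a row together."""
    return 1.0 - F.cosine_similarity(graph_emb, text_emb, dim=-1).mean()

# Toy usage: 8 features per row, 64-dim embeddings. The text embedding would
# normally come from a pretrained text encoder applied to the serialized row.
dim = 64
graph_encoder = RowGraphEncoder(dim)
node_feats = torch.randn(8, dim)     # per-feature embeddings of one row
text_emb = torch.randn(1, dim)       # stand-in for the text-encoder output
graph_emb = graph_encoder(node_feats).unsqueeze(0)
loss = consistency_loss(graph_emb, text_emb)
```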


MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation

arXiv.org Artificial Intelligence

Medical AI has tremendous potential to advance healthcare by supporting the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving provider and patient experience. We argue that unlocking this potential requires a systematic way to measure the performance of medical AI models on large-scale heterogeneous data. To meet this need, we are building MedPerf, an open framework for benchmarking machine learning in the medical domain. MedPerf will enable federated evaluation in which models are securely distributed to different facilities for evaluation, thereby empowering healthcare organizations to assess and verify the performance of AI models in an efficient and human-supervised process, while prioritizing privacy. We describe the current challenges healthcare and AI communities face, the need for an open platform, the design philosophy of MedPerf, its current implementation status, and our roadmap. We call for researchers and organizations to join us in creating the MedPerf open benchmarking platform.
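The core mechanism here is federated evaluation: the model travels to each facility, evaluation runs on data that never leaves the site, and only aggregate metrics are returned. A minimal sketch of that orchestration pattern is below; all names (Facility, federated_evaluate) are illustrative placeholders and do not reflect the MedPerf API.

```python
# Hypothetical sketch of federated evaluation: per-site metrics only, no raw
# data leaves a facility. Not the MedPerf implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Facility:
    name: str
    local_data: list   # (input, label) pairs; stay on-site, never transmitted

    def evaluate(self, model: Callable[[object], int]) -> Dict[str, float]:
        correct = sum(1 for x, y in self.local_data if model(x) == y)
        return {"accuracy": correct / max(len(self.local_data), 1),
                "num_examples": float(len(self.local_data))}

def federated_evaluate(model, facilities: List[Facility]) -> dict:
    """Collect per-site metrics and report a sample-weighted aggregate."""
    reports = {f.name: f.evaluate(model) for f in facilities}
    total = sum(r["num_examples"] for r in reports.values())
    weighted_acc = sum(r["accuracy"] * r["num_examples"]
                       for r in reports.values()) / total
    return {"per_site": reports, "overall_accuracy": weighted_acc}

# Toy usage with a trivial threshold "model" and two synthetic sites.
model = lambda x: int(x > 0.5)
sites = [Facility("hospital_a", [(0.2, 0), (0.9, 1)]),
         Facility("hospital_b", [(0.7, 1), (0.4, 0), (0.8, 1)])]
print(federated_evaluate(model, sites)["overall_accuracy"])
```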