
Collaborating Authors

 Parras, Juan


DeCaFlow: A Deconfounding Causal Generative Model

arXiv.org Artificial Intelligence

Causal generative models (CGMs) have recently emerged as capable approaches to simulate the causal mechanisms generating our observations, enabling causal inference. Unfortunately, existing approaches are either overly restrictive, assuming the absence of hidden confounders, or lack generality, being tailored to a particular query and graph. In this work, we introduce DeCaFlow, a CGM that accounts for hidden confounders in a single amortized training process using only observational data and the causal graph. Importantly, DeCaFlow can provably identify all causal queries with a valid adjustment set or sufficiently informative proxy variables. Remarkably, for the first time to our knowledge, we show that a confounded counterfactual query is identifiable, and thus solvable by DeCaFlow, as long as its interventional counterpart is. Our empirical results in diverse settings (including the Ecoli70 dataset, with three independent hidden confounders, tens of observed variables, and hundreds of causal queries) show that DeCaFlow outperforms existing approaches while demonstrating its out-of-the-box flexibility.
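As a toy illustration of what a valid adjustment set provides (a minimal sketch, not DeCaFlow itself), the following Python snippet estimates the interventional effect of T on Y by backdoor adjustment in a simulated linear-Gaussian model with an observed confounder Z; all variable names and coefficients are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Simulated structural model: Z confounds both the treatment T and the outcome Y.
Z = rng.normal(size=n)
T = 0.8 * Z + rng.normal(size=n)
Y = 1.5 * T + 2.0 * Z + rng.normal(size=n)

# Naive regression of Y on T alone is biased by the confounding through Z.
naive_slope = np.polyfit(T, Y, 1)[0]

# Backdoor adjustment: regressing Y on (T, Z) recovers the causal effect,
# because {Z} is a valid adjustment set for the query E[Y | do(T)].
design = np.column_stack([T, Z, np.ones(n)])
adjusted_slope = np.linalg.lstsq(design, Y, rcond=None)[0][0]

print(f"naive: {naive_slope:.2f}, adjusted: {adjusted_slope:.2f}, true effect: 1.50")

In the confounded settings targeted by the abstract, Z would be hidden, and DeCaFlow would instead rely on proxy variables to identify the same kind of query.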


Synthetic Tabular Data Validation: A Divergence-Based Approach

arXiv.org Artificial Intelligence

The ever-increasing use of generative models in fields that rely on tabular data highlights the need for robust and standardized validation metrics to assess the similarity between real and synthetic data. Current methods lack a unified framework and rely on diverse and often inconclusive statistical measures. Divergences, which quantify discrepancies between data distributions, offer a promising avenue for validation. However, traditional approaches calculate divergences independently for each feature due to the complexity of modeling the joint distribution. This paper addresses this challenge by proposing a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons. Our core contribution lies in applying a divergence estimator to build a validation metric that considers the joint distribution of real and synthetic data. We leverage a probabilistic classifier to approximate the density ratio between datasets, allowing complex relationships to be captured. We specifically calculate two divergences: the well-known Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence. KL divergence is well established in the field, while JS divergence is symmetric and bounded, providing a reliable metric. The efficacy of this approach is demonstrated through a series of experiments with varying distribution complexities. The initial phase compares estimated divergences with analytical solutions for simple distributions, setting a benchmark for accuracy. Finally, we validate the method on a real-world dataset and its synthetic counterpart, showcasing its effectiveness in practical applications. This approach is applicable beyond tabular data and has the potential to improve synthetic data validation in various fields.
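A minimal sketch of the classifier-based density-ratio idea described above, assuming roughly balanced real and synthetic sample sizes; the helper name and the choice of scikit-learn's logistic regression are illustrative, not the paper's exact implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def classifier_divergences(X_real, X_syn):
    # Label real samples 1 and synthetic samples 0, then train a probabilistic classifier.
    X = np.vstack([X_real, X_syn])
    y = np.concatenate([np.ones(len(X_real)), np.zeros(len(X_syn))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # With balanced classes, p(y=1|x) / p(y=0|x) approximates the density ratio
    # p_real(x) / p_syn(x).
    def log_ratio(A):
        p = np.clip(clf.predict_proba(A)[:, 1], 1e-6, 1 - 1e-6)
        return np.log(p) - np.log(1 - p)

    lr_real, lr_syn = log_ratio(X_real), log_ratio(X_syn)

    # KL(real || synthetic) is the expected log-ratio under the real data.
    kl = lr_real.mean()

    # JS = 0.5*KL(P||M) + 0.5*KL(Q||M), with M the equal mixture, rewritten in
    # terms of the ratio r = p_real / p_syn evaluated on each sample set.
    r_real, r_syn = np.exp(lr_real), np.exp(lr_syn)
    js = 0.5 * np.mean(np.log(2 * r_real / (1 + r_real))) + \
         0.5 * np.mean(np.log(2 / (1 + r_syn)))
    return kl, js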


An improved tabular data generator with VAE-GMM integration

arXiv.org Artificial Intelligence

The rising use of machine learning in various fields requires robust methods to create synthetic tabular data that preserve key characteristics of the original data while addressing data scarcity. Current approaches based on Generative Adversarial Networks, such as the state-of-the-art CTGAN model, struggle with the complex structures inherent in tabular data, which often contains both continuous and discrete features with non-Gaussian distributions. We therefore propose a novel Variational Autoencoder (VAE)-based model that addresses these limitations. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. This avoids the limitations imposed by assuming a strictly Gaussian latent space, allowing for a more accurate representation of the underlying data distribution during generation. Furthermore, our model offers enhanced flexibility by allowing the use of various differentiable distributions for individual features, making it possible to handle both continuous and discrete data types. We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones, evaluating both resemblance and utility. This evaluation shows that our model significantly outperforms CTGAN and TVAE, establishing its potential as a valuable tool for generating synthetic tabular data in various domains, particularly healthcare.
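One possible way to realize the BGM integration sketched above (an illustrative assumption, not the authors' exact architecture): fit scikit-learn's BayesianGaussianMixture to the latent codes of an already-trained VAE and sample from the mixture instead of a standard normal prior. The encode() and decode() names below are hypothetical stand-ins for the trained encoder and decoder.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_latent_bgm(Z, n_components=10, seed=0):
    # Z: array of shape (n_samples, latent_dim) holding the VAE latent codes.
    bgm = BayesianGaussianMixture(
        n_components=n_components,
        covariance_type="full",
        weight_concentration_prior_type="dirichlet_process",
        random_state=seed,
    )
    bgm.fit(Z)
    return bgm

def sample_synthetic(bgm, decode, n_samples=1000):
    # Draw latent codes from the fitted mixture and decode them into synthetic rows.
    z_new, _ = bgm.sample(n_samples)
    return decode(z_new)

# Hypothetical usage with a trained VAE:
#   Z = encode(X_train)
#   bgm = fit_latent_bgm(Z)
#   X_synth = sample_synthetic(bgm, decode)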


SAVAE: Leveraging the variational Bayes autoencoder for survival analysis

arXiv.org Machine Learning

In recent years, there has been a significant transformation in medical research methodologies towards the adoption of Deep Learning (DL) techniques for predicting critical events, such as disease development and patient mortality. Despite their potential to handle complex data, practical applications in this domain remain limited, with most studies still relying on traditional statistical methods. Survival Analysis (SA), or time-to-event analysis, is an essential tool for studying specific events in various disciplines, not only in medicine but also in fields such as recommendation systems [1], employee retention [2], market modeling [3], and financial risk assessment [4]. According to the existing literature, the Cox proportional hazards model (Cox-PH) [5] is the dominant SA method, offering a semiparametric regression alternative to the non-parametric Kaplan-Meier estimator [6]. Unlike the Kaplan-Meier method, which uses a single covariate, Cox-PH incorporates multiple covariates to predict event times and assess their impact on the hazard rate at specific time points. However, it is crucial to acknowledge that the Cox-PH model is built on certain strong assumptions. One of these is the proportional hazards assumption, which posits that the ratio of the hazard functions of any two individuals remains constant over time. The model also assumes a linear relationship between the natural logarithm of the relative hazard (the ratio of the hazard at time t to the baseline hazard) and the covariates, as well as the absence of interactions among these covariates.
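For concreteness, a brief illustrative example (not SAVAE) of the Cox-PH model just described, in which the log relative hazard log(h(t|x)/h0(t)) is assumed linear in the covariates. The sketch fits the model with the lifelines library on simulated data; all column names and coefficients are hypothetical.

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.binomial(1, 0.5, size=n)

# Simulate exponential event times whose hazard scales with exp(beta' x),
# plus independent right-censoring.
hazard = 0.1 * np.exp(0.7 * x1 - 0.5 * x2)
event_time = rng.exponential(1.0 / hazard)
censor_time = rng.exponential(10.0, size=n)

df = pd.DataFrame({
    "duration": np.minimum(event_time, censor_time),
    "event": (event_time <= censor_time).astype(int),
    "x1": x1,
    "x2": x2,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()  # estimated coefficients should be close to 0.7 and -0.5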