Gretton, Arthur
(De)-regularized Maximum Mean Discrepancy Gradient Flow
Chen, Zonghao, Mustafi, Aratrika, Glaser, Pierre, Korba, Anna, Gretton, Arthur, Sriperumbudur, Bharath K.
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from a source distribution to a target distribution using only target samples either lack a tractable numerical implementation ($f$-divergence flows) or require strong assumptions, and modifications such as noise injection, to ensure convergence (Maximum Mean Discrepancy flows). In contrast, the DrMMD flow can simultaneously (i) guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and (ii) be implemented in closed form using only samples. The former is achieved by leveraging the connection between the DrMMD and the $\chi^2$-divergence, while the latter follows from treating the DrMMD as an MMD with a de-regularized kernel. Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime. The potential of the DrMMD flow is demonstrated across several numerical experiments, including a large-scale setting of training student/teacher networks.
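As a rough illustration of the particle dynamics behind MMD-type flows, the following minimal numpy sketch moves particles along the negative gradient of the MMD witness function. It uses a plain Gaussian kernel standing in for the de-regularized kernel of DrMMD, and all parameter choices (bandwidth, step size) are illustrative, not the paper's scheme:

```python
import numpy as np

def witness_grad(X, particles, targets, sigma=1.0):
    """Gradient of the MMD witness function, evaluated at each row of X.

    The witness is f(x) = mean_i k(x, p_i) - mean_j k(x, t_j) with a
    Gaussian kernel k; the flow moves each particle as dx/dt = -grad f(x).
    """
    def term(A):
        D = X[:, None, :] - A[None, :, :]                    # (n, m, d) differences
        K = np.exp(-np.sum(D**2, axis=-1) / (2 * sigma**2))  # kernel values
        return np.mean(-D / sigma**2 * K[..., None], axis=1) # mean kernel gradient
    return term(particles) - term(targets)

rng = np.random.default_rng(0)
particles = rng.normal(2.0, 1.0, size=(50, 1))   # source: N(2, 1)
targets = rng.normal(0.0, 1.0, size=(50, 1))     # target: N(0, 1)
for _ in range(200):                             # explicit Euler discretization
    particles = particles - 0.5 * witness_grad(particles, particles, targets)
```

After the loop the particle cloud has drifted toward the target samples; the de-regularization schedule of the paper would additionally adapt the kernel along the flow.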
Spectral Representation for Causal Estimation with Hidden Confounders
Ren, Tongzheng, Sun, Haotian, Moulin, Antoine, Gretton, Arthur, Dai, Bo
We address the problem of causal effect estimation where hidden confounders are present, with a focus on two settings: instrumental variable regression with additional observed confounders, and proxy causal learning. Our approach uses a singular value decomposition of a conditional expectation operator, followed by a saddle-point optimization problem, which, in the context of IV regression, can be thought of as a neural net generalization of the seminal approach due to Darolles et al. [2011]. Saddle-point formulations have attracted considerable attention recently, as they can avoid double sampling bias and are amenable to modern function approximation methods. We provide experimental validation in various settings, and show that our approach outperforms existing methods on common benchmarks.
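The core object, an estimated conditional expectation operator and its SVD, can be sketched in a toy finite-dimensional setting. Here identity feature maps replace the paper's neural representations, and the ridge regularizer `lam` is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dx, dy = 2000, 4, 3
W = rng.normal(size=(dy, dx))                  # ground-truth linear map
X = rng.normal(size=(n, dx))
Y = X @ W.T + 0.1 * rng.normal(size=(n, dy))   # Y = W X + noise

# Ridge estimate of the conditional expectation operator B, i.e. the
# minimizer of sum_i ||psi(y_i) - B phi(x_i)||^2 + lam ||B||^2,
# with phi and psi taken to be identity feature maps.
lam = 1e-3
B = Y.T @ X @ np.linalg.inv(X.T @ X + lam * np.eye(dx))

# Spectral representation: singular value decomposition of the operator.
U, s, Vt = np.linalg.svd(B)
```

With flexible (e.g. neural) feature maps in place of the identity, an SVD of the resulting operator plays the role described in the abstract.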
Mind the Graph When Balancing Data for Fairness or Robustness
Schrouff, Jessica, Bellot, Alexis, Rannen-Triki, Amal, Malek, Alan, Albuquerque, Isabela, Gretton, Arthur, D'Amour, Alexander, Chiappa, Silvia
Failures of fairness or robustness in machine learning predictive settings can be due to undesired dependencies between covariates, outcomes and auxiliary factors of variation. A common strategy to mitigate these failures is data balancing, which attempts to remove those undesired dependencies. In this work, we define conditions on the training distribution for data balancing to lead to fair or robust models. Our results show that, in many cases, the balanced distribution does not correspond to selectively removing the undesired dependencies in a causal graph of the task, leading to multiple failure modes and even interference with other mitigation techniques such as regularization. Overall, our results highlight the importance of taking the causal graph into account before performing data balancing.
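To make "data balancing" concrete: a common discrete form reweights each sample by the ratio of marginals to joint, so the reweighted label and auxiliary factor become independent. This sketch is one standard balancing recipe, not the paper's analysis of when such balancing succeeds or fails:

```python
import numpy as np

def balancing_weights(y, a):
    """Per-sample weights w(y, a) = p(y) p(a) / p(y, a).

    Under these weights, the empirical joint of (y, a) factorizes,
    removing the dependence between label and auxiliary factor.
    """
    y = np.asarray(y)
    a = np.asarray(a)
    n = len(y)
    _, yi = np.unique(y, return_inverse=True)
    _, ai = np.unique(a, return_inverse=True)
    joint = np.zeros((yi.max() + 1, ai.max() + 1))
    np.add.at(joint, (yi, ai), 1.0 / n)          # empirical joint p(y, a)
    py = joint.sum(axis=1)                       # marginal p(y)
    pa = joint.sum(axis=0)                       # marginal p(a)
    return py[yi] * pa[ai] / joint[yi, ai]

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
y = (rng.random(1000) < 0.3 + 0.4 * a).astype(int)   # y depends on a
w = balancing_weights(y, a)

# Check: the weighted joint equals the product of its marginals.
jw = np.zeros((2, 2))
np.add.at(jw, (y, a), w / w.sum())
assert np.allclose(jw, np.outer(jw.sum(axis=1), jw.sum(axis=0)), atol=1e-9)
```

The paper's point is that removing this observed dependence need not remove the corresponding dependence in the causal graph of the task.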
Conditional Bayesian Quadrature
Chen, Zonghao, Naslidnyk, Masha, Gretton, Arthur, Briol, François-Xavier
We propose a novel approach for estimating conditional or parametric expectations in settings where obtaining samples or evaluating integrands is costly. Through the framework of probabilistic numerical methods (such as Bayesian quadrature), our approach incorporates prior information about the integrands, in particular prior knowledge of the smoothness of the integrand and of the conditional expectation. As a result, our approach provides a way of quantifying uncertainty and achieves a fast convergence rate, which is confirmed both theoretically and empirically on challenging tasks in Bayesian sensitivity analysis, computational finance and decision making under uncertainty.
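For context, standard (unconditional) Bayesian quadrature computes the posterior mean of an integral as a weighted sum of integrand evaluations, with weights derived from the kernel mean. A minimal 1-D sketch under a Gaussian-kernel GP prior and a standard normal integration measure (node placement, bandwidth and jitter are illustrative choices):

```python
import numpy as np

def bq_weights(nodes, sigma=1.0, jitter=1e-8):
    """Bayesian-quadrature weights for a Gaussian-kernel GP prior and
    integration measure pi = N(0, 1) in one dimension.

    The posterior mean of I(f) = int f dpi is w @ f(nodes), where
    w = K^{-1} z and z_i = int k(x, x_i) dpi(x) has the closed form below.
    """
    K = np.exp(-(nodes[:, None] - nodes[None, :]) ** 2 / (2 * sigma**2))
    z = sigma / np.sqrt(sigma**2 + 1) * np.exp(-nodes**2 / (2 * (sigma**2 + 1)))
    return np.linalg.solve(K + jitter * np.eye(len(nodes)), z)

nodes = np.linspace(-4.0, 4.0, 12)
w = bq_weights(nodes)
estimate = w @ nodes**2     # estimate of E[X^2] under N(0, 1); true value is 1
```

The conditional setting of the paper extends this by modelling how such expectations vary with a conditioning parameter.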
Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms
Meunier, Dimitri, Shen, Zikai, Mollenhauer, Mattes, Gretton, Arthur, Li, Zhu
We study theoretical properties of a broad class of regularized algorithms with vector-valued output. These spectral algorithms include kernel ridge regression, kernel principal component regression, various implementations of gradient descent and many more. Our contributions are twofold. First, we rigorously confirm the so-called saturation effect for ridge regression with vector-valued output by deriving a novel lower bound on learning rates; this bound shows ridge regression to be suboptimal when the smoothness of the regression function exceeds a certain level. Second, we present an upper bound on the finite-sample risk of general vector-valued spectral algorithms, applicable to both well-specified and misspecified scenarios (where the true regression function lies outside the hypothesis space), which is minimax optimal in various regimes. All of our results explicitly allow for infinite-dimensional output variables, proving consistency of recent practical applications.
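The simplest member of this class is kernel ridge regression with vector-valued output. With the separable matrix-valued kernel K(x, x') = k(x, x')·I, the solution reduces to solving one scalar ridge system applied columnwise, as in this sketch (all data and parameters are synthetic illustrations):

```python
import numpy as np

def fit_krr(X, Y, lam=1e-4, sigma=1.0):
    """Kernel ridge regression with vector-valued output Y (n, d_out).

    For the separable kernel K(x, x') = k(x, x') * I, the coefficients
    are alpha = (K + n * lam * I)^{-1} Y, applied columnwise.
    """
    n = len(X)
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma**2))
    alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)
    def predict(Xnew):
        Kx = np.exp(-(Xnew[:, None] - X[None, :]) ** 2 / (2 * sigma**2))
        return Kx @ alpha
    return predict

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=200)
Y = np.stack([np.sin(X), np.cos(X)], axis=1) + 0.05 * rng.normal(size=(200, 2))
predict = fit_krr(X, Y)
Xt = np.linspace(-2.5, 2.5, 50)
err = np.max(np.abs(predict(Xt) - np.stack([np.sin(Xt), np.cos(Xt)], axis=1)))
```

Other spectral algorithms in the paper's class replace the function (K + nλI)^{-1} of the kernel matrix by other spectral filters (e.g. truncated inverses or gradient-descent iterates).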
Deep MMD Gradient Flow without adversarial training
Galashov, Alexandre, de Bortoli, Valentin, Gretton, Arthur
We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST and CELEB-A (64 x 64).
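The forward-diffusion corruption used to build the increasingly noisy training distributions can be sketched in its variance-preserving form (the parameterization below is the one common in denoising diffusion models, taken here as an assumption):

```python
import numpy as np

def corrupt(x0, alpha_bar, rng):
    """Forward-diffusion corruption at noise level alpha_bar:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, I).
    alpha_bar = 1 leaves the data clean; alpha_bar = 0 is pure noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(10_000,))        # stand-in for unit-variance data
levels = [1.0, 0.7, 0.3, 0.0]          # decreasing alpha_bar = more noise
noisy = {ab: corrupt(x0, ab, rng) for ab in levels}
```

A noise-conditional MMD discriminator trained across these levels then supplies the gradient field that transports particles from noise back to data.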
Proxy Methods for Domain Adaptation
Tsai, Katherine, Pfohl, Stephen R., Salaudeen, Olawale, Chiou, Nicole, Kusner, Matt J., D'Amour, Alexander, Koyejo, Sanmi, Gretton, Arthur
We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels. In this setting, neither the covariate shift nor the label shift assumptions apply. Our approach to adaptation employs proximal causal learning, a technique for estimating causal effects in settings where proxies of unobserved confounders are available. We demonstrate that proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables. We consider two settings, (i) Concept Bottleneck: an additional ''concept'' variable is observed that mediates the relationship between the covariates and labels; (ii) Multi-domain: training data from multiple source domains is available, where each source domain exhibits a different distribution over the latent confounder. We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings. In our experiments, we show that our approach outperforms other methods, notably those which explicitly recover the latent confounder.
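The paper's two-stage kernel estimator is beyond a short sketch, but the two-stage logic it builds on can be illustrated in the linear instrumental-variable case, where stage 1 regresses the confounded variable on the proxy/instrument and stage 2 regresses the outcome on the stage-1 fit (all variables and coefficients below are synthetic illustrations, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
Z = rng.normal(size=n)                  # instrument / proxy-style variable
U = rng.normal(size=n)                  # hidden confounder
X = Z + U + 0.1 * rng.normal(size=n)    # covariate, confounded by U
Y = 2.0 * X + 2.0 * U + 0.1 * rng.normal(size=n)   # true effect of X is 2

# Stage 1: regress X on Z.  Stage 2: regress Y on the stage-1 fit.
beta1 = (Z @ X) / (Z @ Z)
X_hat = beta1 * Z
beta2 = (X_hat @ Y) / (X_hat @ X_hat)   # approximately recovers 2

beta_naive = (X @ Y) / (X @ X)          # direct regression, biased by U
```

Replacing the two linear regressions with kernel ridge regressions gives a nonparametric two-stage procedure of the kind the paper develops for its adaptation settings.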
Practical Kernel Tests of Conditional Independence
Pogodin, Roman, Schrab, Antonin, Li, Yazhe, Sutherland, Danica J., Gretton, Arthur
We describe a data-efficient, kernel-based approach to statistical testing of conditional independence. A major challenge of conditional independence testing, absent in tests of unconditional independence, is to obtain the correct test level (the specified upper bound on the rate of false positives), while still attaining competitive test power. Excess false positives arise due to bias in the test statistic, which is obtained using nonparametric kernel ridge regression. We propose three methods for bias control to correct the test level, based on data splitting, auxiliary data, and (where possible) simpler function classes. We show these combined strategies are effective both for synthetic and real-world data.
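One of the bias-control ideas, data splitting, can be illustrated with a residual-covariance style conditional-independence statistic: the kernel ridge regressions are fit on one half of the data and the statistic is evaluated on the other half, so regression bias does not contaminate the null distribution. This is a generic sketch in that spirit, not the paper's test (functions, bandwidths and regularizers are illustrative):

```python
import numpy as np

def krr_fit(Z, T, lam=1e-3, sigma=1.0):
    """Gaussian-kernel ridge regression from Z to T; returns a predictor."""
    K = np.exp(-(Z[:, None] - Z[None, :]) ** 2 / (2 * sigma**2))
    alpha = np.linalg.solve(K + len(Z) * lam * np.eye(len(Z)), T)
    return lambda Zn: np.exp(-(Zn[:, None] - Z[None, :]) ** 2 / (2 * sigma**2)) @ alpha

def split_ci_statistic(X, Y, Z):
    """Normalized residual-covariance statistic with data splitting:
    regressions fit on the first half, statistic computed on the second."""
    n = len(Z) // 2
    fx = krr_fit(Z[:n], X[:n])
    fy = krr_fit(Z[:n], Y[:n])
    rx = X[n:] - fx(Z[n:])                  # held-out residuals of X given Z
    ry = Y[n:] - fy(Z[n:])                  # held-out residuals of Y given Z
    prod = rx * ry
    return np.sqrt(len(prod)) * prod.mean() / prod.std()

rng = np.random.default_rng(0)
Z = rng.uniform(-2, 2, size=2000)
X = np.sin(Z) + 0.3 * rng.normal(size=2000)
Y = np.cos(Z) + 0.3 * rng.normal(size=2000)     # X and Y independent given Z
T_null = split_ci_statistic(X, Y, Z)            # approximately N(0, 1) under H0

e = 0.3 * rng.normal(size=2000)                 # shared noise breaks X ⟂ Y | Z
T_dep = split_ci_statistic(np.sin(Z) + e, np.cos(Z) + e, Z)
```

Without the split, residuals computed on the same data used for fitting are biased toward zero dependence on the training error, which distorts the test level.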
A Distributional Analogue to the Successor Representation
Wiltzer, Harley, Farebrother, Jesse, Gretton, Arthur, Tang, Yunhao, Barreto, André, Dabney, Will, Bellemare, Marc G., Rowland, Mark
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
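The two-level maximum mean discrepancy used for learning can be sketched directly: an inner MMD compares two sample sets, and an outer MMD compares two collections of sample sets through a kernel on distributions built from the inner MMD. Kernel choices and bandwidths below are illustrative:

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) squared MMD between two 1-D samples,
    with a Gaussian kernel."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def two_level_mmd2(A, B, h=1.0):
    """Squared MMD between two collections of sample sets, using the
    distribution-level kernel K(P, Q) = exp(-MMD^2(P, Q) / (2 h^2))."""
    def K(S, T):
        return np.mean([[np.exp(-mmd2(p, q) / (2 * h**2)) for q in T] for p in S])
    return K(A, A) + K(B, B) - 2 * K(A, B)

rng = np.random.default_rng(0)
A = [rng.normal(0, 1, 100) for _ in range(10)]   # ten sample sets from N(0, 1)
B = [rng.normal(0, 1, 100) for _ in range(10)]   # same law as A
C = [rng.normal(2, 1, 100) for _ in range(10)]   # shifted law
```

Minimizing such a two-level discrepancy against data is how the distributional SM, itself a distribution over distributions, can be fit from samples.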
Controlling Moments with Kernel Stein Discrepancies
Kanagawa, Heishiro, Barp, Alessandro, Gretton, Arthur, Mackey, Lester
Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show that standard KSDs used for weak convergence control fail to control moment convergence. To address this limitation, we next provide sufficient conditions under which alternative diffusion KSDs control both moment and weak convergence. As an immediate consequence we develop, for each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein convergence.
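A standard KSD computation illustrates why no normalizing constant is needed: the Stein kernel depends on the target only through its score function. A 1-D sketch with the Langevin Stein operator and a Gaussian base kernel (sample sizes and bandwidth are illustrative):

```python
import numpy as np

def ksd2(x, score, sigma=1.0):
    """V-statistic estimate of the squared Langevin KSD in one dimension,
    with Gaussian base kernel k and target score function `score`.

    Stein kernel: u_p(x, y) = s(x) s(y) k + s(x) dk/dy + s(y) dk/dx
                              + d^2 k / dx dy.
    """
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * sigma**2))
    dkx = -d / sigma**2 * k                      # d/dx k(x, y)
    dky = d / sigma**2 * k                       # d/dy k(x, y)
    dkxy = (1 / sigma**2 - d**2 / sigma**4) * k  # d^2/dxdy k(x, y)
    s = score(x)
    h = s[:, None] * s[None, :] * k + s[:, None] * dky + s[None, :] * dkx + dkxy
    return h.mean()

score = lambda x: -x                  # score of the target N(0, 1): d/dx log p
rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, 500)      # samples from the target
bad = rng.normal(1.0, 1.0, 500)       # samples from a shifted distribution
```

The score -x here comes from an unnormalized log-density -x²/2; the paper's diffusion KSDs modify the Stein operator so that such discrepancies also control moment convergence.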