Two Approaches to Direct Estimation of Riesz Representers

Bruns-Smith, David

arXiv.org Machine Learning

The Riesz representer is a central object in semiparametric statistics and debiased/doubly-robust estimation. Two literatures in econometrics have highlighted the role of directly estimating Riesz representers: the automatic debiased machine learning literature (as in Chernozhukov et al., 2022b), and an independent literature on sieve methods for conditional moment models (as in Chen et al., 2014). These two literatures solve distinct optimization problems that in the population both have the Riesz representer as their solution. We show that with unregularized or ridge-regularized linear, sieve, or RKHS models, the two resulting estimators are numerically equivalent. However, for other regularization schemes such as the Lasso, or more general machine learning function classes including neural networks, the estimators are not necessarily equivalent. In the latter case, the Chen et al. (2014) formulation yields a novel constrained optimization problem for directly estimating Riesz representers with machine learning. Drawing on results from Birrell et al. (2022), we conjecture that this approach may offer statistical advantages at the cost of greater computational complexity.
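To make the "direct estimation" idea concrete, here is a minimal sketch of the automatic (Chernozhukov-style) linear Riesz regression for an ATE-type functional m(W, f) = f(1, Z) − f(0, Z). The toy data, feature map, and variable names are invented for illustration; the paper's point is that, for such unregularized linear models, the sieve formulation of Chen et al. (2014) produces numerically the same fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=n)
T = rng.binomial(1, 0.5, size=n).astype(float)

# Toy feature map phi(T, Z) = [1, T, Z]
Phi = np.column_stack([np.ones(n), T, Z])
# The functional acts on features as m(W, phi) = phi(1, Z) - phi(0, Z)
Phi1 = np.column_stack([np.ones(n), np.ones(n), Z])
Phi0 = np.column_stack([np.ones(n), np.zeros(n), Z])

M = (Phi1 - Phi0).mean(axis=0)  # E_n[m(W, phi)]
G = Phi.T @ Phi / n             # E_n[phi phi^T]

# Unregularized direct Riesz fit: solve G rho = M
rho = np.linalg.solve(G, M)
alpha_hat = Phi @ rho           # fitted representer values alpha_hat(X_i)

# The fit satisfies the empirical Riesz equation E_n[alpha_hat * phi] = E_n[m(W, phi)]
lhs = Phi.T @ alpha_hat / n
assert np.allclose(lhs, M)
```

A ridge-regularized version simply replaces `G` with `G + lam * np.eye(3)`; the equivalence result in the abstract covers that case as well.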



A Implementation of PS-CD Algorithm

Neural Information Processing Systems

In this section, we provide two different ways to prove Theorem 2. The first is more straightforward and directly differentiates through the relevant term; to avoid the resulting difficulty, we introduce a variational representation (Lemma 1) and apply Jensen's inequality. The divergence corresponding to Equation (9) in Section 2.3 follows as a direct consequence of Lemma 2, and can also be verified by checking the PS-CD objective directly (Lemma 3). We then make an assumption similar to the one used in [4, 47] (Assumption 1), which is typically easy to enforce in practice, and analyze the convergence property of the PS-CD algorithm presented in Algorithm 1. Theorem 5 characterizes the convergence property of Algorithm 2; Monte Carlo estimation incurs additional approximation error.



Optimal Anytime-Valid Tests for Composite Nulls

Shekhar, Shubhanshu

arXiv.org Machine Learning

We consider the problem of designing optimal level-$α$ power-one tests for composite nulls. Given a parameter $α\in (0,1)$ and a stream of $\mathcal{X}$-valued observations $\{X_n: n \geq 1\} \overset{i.i.d.}{\sim} P$, the goal is to design a level-$α$ power-one test $τ_α$ for the null $H_0: P \in \mathcal{P}_0 \subset \mathcal{P}(\mathcal{X})$. Prior works have shown that any such $τ_α$ must satisfy $\mathbb{E}_P[τ_α] \geq \tfrac{\log(1/α)}{γ^*(P, \mathcal{P}_0)}$, where $γ^*(P, \mathcal{P}_0)$ is the so-called $\mathrm{KL}_{\inf}$ or minimum divergence of $P$ to the null class. In this paper, our objective is to develop and analyze constructive schemes that match this lower bound as $α\downarrow 0$. We first consider the finite-alphabet case~($|\mathcal{X}| = m < \infty$), and show that a test based on \emph{universal} $e$-process~(formed by the ratio of a universal predictor and the running null MLE) is optimal in the above sense. The proof relies on a Donsker-Varadhan~(DV) based saddle-point representation of $\mathrm{KL}_{\inf}$, and an application of Sion's minimax theorem. This characterization motivates a general method for arbitrary $\mathcal{X}$: construct an $e$-process based on the empirical solutions to the saddle-point representation over a sufficiently rich class of test functions. We give sufficient conditions for the optimality of this test for compact convex nulls, and verify them for Hölder smooth density models. We end the paper with a discussion on the computational aspects of implementing our proposed tests in some practical settings.
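The finite-alphabet construction described above (an e-process formed as the ratio of a universal predictor to the running null MLE) can be sketched for a binary alphabet. Everything concrete here is an illustrative choice rather than the paper's exact construction: the one-sided Bernoulli null, the add-one (Laplace) predictor, and all function names are assumptions for the sketch.

```python
import math

def laplace_pred(k_ones, n_seen):
    """Add-one (Laplace) sequential predictor for binary data."""
    return (k_ones + 1) / (n_seen + 2)

def e_process_stop(xs, null_max_p=0.5, alpha=0.05):
    """Level-alpha test via an e-process: universal mixture likelihood
    over the running null MLE. Illustrative composite one-sided null:
    Bernoulli(theta) with theta <= null_max_p. Returns the first time
    the e-process crosses 1/alpha, or None if it never does."""
    log_universal = 0.0
    k = 0
    stop = None
    for n, x in enumerate(xs):
        p1 = laplace_pred(k, n)
        log_universal += math.log(p1 if x == 1 else 1.0 - p1)
        k += x
        # running null MLE: clip the empirical frequency into the null class
        theta = min(k / (n + 1), null_max_p)
        # guard against log(0) at the boundary of the simplex
        log_null = (k * math.log(max(theta, 1e-12))
                    + (n + 1 - k) * math.log(max(1.0 - theta, 1e-12)))
        if stop is None and log_universal - log_null >= math.log(1.0 / alpha):
            stop = n + 1
    return stop

# Data far from the null (frequency of ones ~ 0.8) triggers rejection...
xs = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1] * 10
assert e_process_stop(xs) is not None
# ...while data inside the null never does (the e-process stays bounded)
assert e_process_stop([0] * 100) is None
```

The lower bound quoted in the abstract says the expected stopping time of any such test must scale like log(1/alpha) divided by the divergence from P to the null, which matches the log(1/alpha) rejection threshold used here.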



A Unified Framework for Diffusion Model Unlearning with f-Divergence

Novello, Nicola, Fontana, Federico, Cinque, Luigi, Gunduz, Deniz, Tonello, Andrea M.

arXiv.org Artificial Intelligence

Machine unlearning aims to remove specific knowledge from a trained model. While diffusion models (DMs) have shown remarkable generative capabilities, existing unlearning methods for text-to-image (T2I) models often rely on minimizing the mean squared error (MSE) between the output distribution of a target and an anchor concept. We show that this MSE-based approach is a special case of a unified $f$-divergence-based framework, in which any $f$-divergence can be utilized. We analyze the benefits of using different $f$-divergences, which mainly impact the convergence properties of the algorithm and the quality of unlearning. The proposed unified framework offers a flexible paradigm that allows selecting the optimal divergence for a specific application, balancing different trade-offs between aggressive unlearning and concept preservation.
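For readers unfamiliar with the family being unified here: an f-divergence is determined by a convex generator f with f(1) = 0, and different choices of f recover KL, chi-squared, total variation, and so on. The snippet below is a generic discrete-distribution illustration of that fact, not the paper's unlearning objective.

```python
import math

def f_divergence(P, Q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions."""
    return sum(q * f(p / q) for p, q in zip(P, Q) if q > 0)

# Standard generators: each is convex with f(1) = 0
f_kl = lambda t: t * math.log(t) if t > 0 else 0.0   # KL divergence
f_chi2 = lambda t: (t - 1.0) ** 2                    # chi-squared
f_tv = lambda t: 0.5 * abs(t - 1.0)                  # total variation

P = (0.7, 0.2, 0.1)
Q = (0.4, 0.4, 0.2)

# Every f-divergence vanishes when the two distributions coincide...
assert all(abs(f_divergence(P, P, f)) < 1e-12 for f in (f_kl, f_chi2, f_tv))
# ...and is positive when they differ
assert f_divergence(P, Q, f_kl) > 0
```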



Proximal optimal transport divergences

Baptista, Ricardo, Birmpa, Panagiota, Katsoulakis, Markos A., Rey-Bellet, Luc, Zhang, Benjamin J.

arXiv.org Machine Learning

We introduce the proximal optimal transport divergence, a novel discrepancy measure that interpolates between information divergences and optimal transport distances via an infimal convolution formulation. This divergence provides a principled foundation for optimal transport proximals and proximal optimization methods frequently used in generative modeling. We explore its mathematical properties, including smoothness, boundedness, and computational tractability, and establish connections to primal-dual formulations and adversarial learning. Building on the Benamou-Brenier dynamic formulation of optimal transport cost, we also establish a dynamic formulation for proximal OT divergences. The resulting dynamic formulation is a first-order mean-field game whose optimality conditions are governed by a pair of nonlinear partial differential equations: a backward Hamilton-Jacobi equation and a forward continuity equation.
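The interpolation claim can be illustrated numerically under one plausible reading of the infimal convolution, assumed here to be D_lam(P, Q) = inf_R [ KL(R || Q) + (1/lam) W1(P, R) ]; the paper's exact formulation may differ, and the two-atom setup is purely for illustration. As lam goes to 0 the transport term pins R to P and the value approaches KL(P || Q); as lam grows the value goes to 0 with R collapsing to Q.

```python
import math

def kl(a, b):
    """KL divergence between two discrete distributions."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def w1(a, b):
    """Wasserstein-1 between two-atom distributions supported on {0, 1}."""
    return abs(a[0] - b[0])

def prox_ot(P, Q, lam, grid=10001):
    """Brute-force the assumed infimal convolution over intermediate
    two-atom distributions R = (r, 1 - r) on a grid."""
    best = float("inf")
    for i in range(grid):
        r = i / (grid - 1)
        R = (r, 1.0 - r)
        best = min(best, kl(R, Q) + w1(P, R) / lam)
    return best

P = (0.9, 0.1)
Q = (0.5, 0.5)

# Small lam: the divergence approaches KL(P || Q)
assert abs(prox_ot(P, Q, 1e-4) - kl(P, Q)) < 1e-2
# Large lam: the divergence approaches 0 (R collapses onto Q)
assert prox_ot(P, Q, 1e4) < 1e-3
```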