Goto

Collaborating Authors

 Yan, Yuling


Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models

arXiv.org Machine Learning

Score-based generative models, which transform noise into data by learning to reverse a diffusion process, have become a cornerstone of modern generative AI. This paper contributes to establishing theoretical guarantees for the probability flow ODE, a widely used diffusion-based sampler known for its practical efficiency. While a number of prior works address its general convergence theory, it remains unclear whether the probability flow ODE sampler can adapt to the low-dimensional structures commonly present in natural image data. We demonstrate that, with accurate score function estimation, the probability flow ODE sampler achieves a convergence rate of $O(k/T)$ in total variation distance (ignoring logarithmic factors), where $k$ is the intrinsic dimension of the target distribution and $T$ is the number of iterations. This dimension-free convergence rate improves upon existing results that scale with the typically much larger ambient dimension, highlighting the ability of the probability flow ODE sampler to exploit intrinsic low-dimensional structures in the target distribution for faster sampling.


$O(d/T)$ Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

arXiv.org Machine Learning

Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for a popular SDE-based sampler under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. To our knowledge, this improves upon existing convergence theory for both the SDE-based sampler and another ODE-based sampler, while imposing minimal assumptions on the target data distribution and score estimates. This is achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.


When can weak latent factors be statistically inferred?

arXiv.org Machine Learning

The factor model, a pivotal tool for analyzing large panel data, has become a significant topic in finance and economics research (e.g., Chamberlain and Rothschild, 1983; Fama and French, 1993; Stock and Watson, 2002; Bai and Ng, 2002; Giglio and Xiu, 2021; Fan et al., 2021b). The estimation and inference for factor models are crucial in economic studies, particularly in areas like asset pricing and return forecasting. In the era of big data, the factor model has gained increased prominence in capturing the latent common structure for large panel data, where both the cross-sectional and temporal dimensions are ultra-high (see, e.g., recent surveys Bai and Wang, 2016; Fan et al., 2021a). Principal component analysis (PCA), known for its simplicity and effectiveness, is closely connected with the factor model and has long been a key research topic of interest in the econometric community (e.g., Stock and Watson, 2002; Bai and Ng, 2002; Bai, 2003; Onatski, 2012; Fan et al., 2013; Bai and Ng, 2013, 2023). As pointed out by Giglio et al. (2023), most theoretical guarantees for the PCA approach to factor analysis rely on the pervasiveness assumption (e.g., Bai and Ng, 2002; Bai, 2003). This assumption requires the signalto-noise ratio (SNR), which measures the factor strength relative to the noise level, to grow with the rate of N - the square root of the cross-sectional dimension. However, many real datasets in economics do not exhibit sufficiently strong factors to meet this pervasiveness assumption. When the SNR grows slower than N, the resulting model is often called the weak factor model (e.g., Onatski, 2009, 2010). Extensive research has been dedicated to the weak factor model (e.g., Onatski, 2012; Bailey et al., 2021; Freyaldenhoven, 2022; Uematsu and Yamagata, 2022a,b; Bai and Ng, 2023; Jiang et al., 2023; Choi and Yuan, 2024), among which


Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models

arXiv.org Machine Learning

This paper investigates score-based diffusion models when the underlying target distribution is concentrated on or near low-dimensional manifolds within the higher-dimensional space in which they formally reside, a common characteristic of natural image distributions. Despite previous efforts to understand the data generation process of diffusion models, existing theoretical support remains highly suboptimal in the presence of low-dimensional structure, which we strengthen in this paper. For the popular Denoising Diffusion Probabilistic Model (DDPM), we find that the dependency of the error incurred within each denoising step on the ambient dimension $d$ is in general unavoidable. We further identify a unique design of coefficients that yields a converges rate at the order of $O(k^{2}/\sqrt{T})$ (up to log factors), where $k$ is the intrinsic dimension of the target distribution and $T$ is the number of steps. This represents the first theoretical demonstration that the DDPM sampler can adapt to unknown low-dimensional structures in the target distribution, highlighting the critical importance of coefficient design. All of this is achieved by a novel set of analysis tools that characterize the algorithmic dynamics in a more deterministic manner.


Enhancing AI Diagnostics: Autonomous Lesion Masking via Semi-Supervised Deep Learning

arXiv.org Artificial Intelligence

This study presents an unsupervised domain adaptation method aimed at autonomously generating image masks outlining regions of interest (ROIs) for differentiating breast lesions in breast ultrasound (US) imaging. Our semi-supervised learning approach utilizes a primitive model trained on a small public breast US dataset with true annotations. This model is then iteratively refined for the domain adaptation task, generating pseudo-masks for our private, unannotated breast US dataset. The dataset, twice the size of the public one, exhibits considerable variability in image acquisition perspectives and demographic representation, posing a domain-shift challenge. Unlike typical domain adversarial training, we employ downstream classification outcomes as a benchmark to guide the updating of pseudo-masks in subsequent iterations. We found the classification precision to be highly correlated with the completeness of the generated ROIs, which promotes the explainability of the deep learning classification model. Preliminary findings demonstrate the efficacy and reliability of this approach in streamlining the ROI annotation process, thereby enhancing the classification and localization of breast lesions for more precise and interpretable diagnoses.


Entrywise Inference for Causal Panel Data: A Simple and Instance-Optimal Approach

arXiv.org Machine Learning

In causal inference with panel data under staggered adoption, the goal is to estimate and derive confidence intervals for potential outcomes and treatment effects. We propose a computationally efficient procedure, involving only simple matrix algebra and singular value decomposition. We derive non-asymptotic bounds on the entrywise error, establishing its proximity to a suitably scaled Gaussian variable. Despite its simplicity, our procedure turns out to be instance-optimal, in that our theoretical scaling matches a local instance-wise lower bound derived via a Bayesian Cram\'{e}r-Rao argument. Using our insights, we develop a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage guarantees. Our analysis is based on a general inferential toolbox for the SVD algorithm applied to the matrix denoising model, which might be of independent interest.


The Isotonic Mechanism for Exponential Family Estimation

arXiv.org Artificial Intelligence

In 2023, the International Conference on Machine Learning (ICML) required authors with multiple submissions to rank their submissions based on perceived quality. In this paper, we aim to employ these author-specified rankings to enhance peer review in machine learning and artificial intelligence conferences by extending the Isotonic Mechanism to exponential family distributions. This mechanism generates adjusted scores that closely align with the original scores while adhering to author-specified rankings. Despite its applicability to a broad spectrum of exponential family distributions, implementing this mechanism does not require knowledge of the specific distribution form. We demonstrate that an author is incentivized to provide accurate rankings when her utility takes the form of a convex additive function of the adjusted review scores. For a certain subclass of exponential family distributions, we prove that the author reports truthfully only if the question involves only pairwise comparisons between her submissions, thus indicating the optimality of ranking in truthful information elicitation. Moreover, we show that the adjusted scores improve dramatically the estimation accuracy compared to the original scores and achieve nearly minimax optimality when the ground-truth scores have bounded total variation. We conclude the paper by presenting experiments conducted on the ICML 2023 ranking data, which show significant estimation gain using the Isotonic Mechanism.


Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

arXiv.org Artificial Intelligence

This paper studies reward-agnostic exploration in reinforcement learning (RL) -- a scenario where the learner is unware of the reward functions during the exploration stage -- and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon non-stationary Markov decision process with $S$ states, $A$ actions, and horizon length $H$, and suppose that there are no more than a polynomial number of given reward functions of interest. By collecting an order of \begin{align*} \frac{SAH^3}{\varepsilon^2} \text{ sample episodes (up to log factor)} \end{align*} without guidance of the reward information, our algorithm is able to find $\varepsilon$-optimal policies for all these reward functions, provided that $\varepsilon$ is sufficiently small. This forms the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds $\frac{S^2AH^3}{\varepsilon^2}$ episodes (up to log factor), our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions (even when they are adversarially designed), a task commonly dubbed as ``reward-free exploration.'' The novelty of our algorithm design draws on insights from offline RL: the exploration scheme attempts to maximize a critical reward-agnostic quantity that dictates the performance of offline RL, while the policy learning paradigm leverages ideas from sample-optimal offline RL paradigms.


Learning Gaussian Mixtures Using the Wasserstein-Fisher-Rao Gradient Flow

arXiv.org Artificial Intelligence

Gaussian mixture models form a flexible and expressive parametric family of distributions that has found applications in a wide variety of applications. Unfortunately, fitting these models to data is a notoriously hard problem from a computational perspective. Currently, only moment-based methods enjoy theoretical guarantees while likelihood-based methods are dominated by heuristics such as Expectation-Maximization that are known to fail in simple examples. In this work, we propose a new algorithm to compute the nonparametric maximum likelihood estimator (NPMLE) in a Gaussian mixture model. Our method is based on gradient descent over the space of probability measures equipped with the Wasserstein-Fisher-Rao geometry for which we establish convergence guarantees. In practice, it can be approximated using an interacting particle system where the weight and location of particles are updated alternately. We conduct extensive numerical experiments to confirm the effectiveness of the proposed algorithm compared not only to classical benchmarks but also to similar gradient descent algorithms with respect to simpler geometries. In particular, these simulations illustrate the benefit of updating both weight and location of the interacting particles.


Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games

arXiv.org Machine Learning

This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a $\gamma$-discounted infinite-horizon Markov game with $S$ states, where the max-player has $A$ actions and the min-player has $B$ actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game -- that provably finds an $\varepsilon$-approximate Nash equilibrium with a sample complexity no larger than $\frac{C_{\mathsf{clipped}}^{\star}S(A+B)}{(1-\gamma)^{3}\varepsilon^{2}}$ (up to some log factor). Here, $C_{\mathsf{clipped}}^{\star}$ is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-\`a-vis the target data), and the target accuracy $\varepsilon$ can be any value within $\big(0,\frac{1}{1-\gamma}\big]$. Our sample complexity bound strengthens prior art by a factor of $\min\{A,B\}$, achieving minimax optimality for the entire $\varepsilon$-range. An appealing feature of our result lies in algorithmic simplicity, which reveals the unnecessity of variance reduction and sample splitting in achieving sample optimality.