AITopics | Cai, Changxiao

Plotting

Cai, Changxiao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dimension-Free Convergence of Diffusion Models for Approximate Gaussian Mixtures

Li, Gen, Cai, Changxiao, Wei, Yuting

arXiv.org Machine LearningApr-7-2025

Diffusion models are distinguished by their exceptional generative performance, particularly in producing high-quality samples through iterative denoising. While current theory suggests that the number of denoising steps required for accurate sample generation should scale linearly with data dimension, this does not reflect the practical efficiency of widely used algorithms like Denoising Diffusion Probabilistic Models (DDPMs). This paper investigates the effectiveness of diffusion models in sampling from complex high-dimensional distributions that can be well-approximated by Gaussian Mixture Models (GMMs). For these distributions, our main result shows that DDPM takes at most $\widetilde{O}(1/\varepsilon)$ iterations to attain an $\varepsilon$-accurate distribution in total variation (TV) distance, independent of both the ambient dimension $d$ and the number of components $K$, up to logarithmic factors. Furthermore, this result remains robust to score estimation errors. These findings highlight the remarkable effectiveness of diffusion models in high-dimensional settings given the universal approximation capability of GMMs, and provide theoretical insights into their practical success.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv.org Machine Learning

2504.053

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)

Add feedback

Minimax Optimality of the Probability Flow ODE for Diffusion Models

Cai, Changxiao, Li, Gen

arXiv.org Machine LearningMar-12-2025

Score-based diffusion models have become a foundational paradigm for modern generative modeling, demonstrating exceptional capability in generating samples from complex high-dimensional distributions. Despite the dominant adoption of probability flow ODE-based samplers in practice due to their superior sampling efficiency and precision, rigorous statistical guarantees for these methods have remained elusive in the literature. This work develops the first end-to-end theoretical framework for deterministic ODE-based samplers that establishes near-minimax optimal guarantees under mild assumptions on target data distributions. Specifically, focusing on subgaussian distributions with $\beta$-H\"older smooth densities for $\beta\leq 2$, we propose a smooth regularized score estimator that simultaneously controls both the $L^2$ score error and the associated mean Jacobian error. Leveraging this estimator within a refined convergence analysis of the ODE-based sampling process, we demonstrate that the resulting sampler achieves the minimax rate in total variation distance, modulo logarithmic factors. Notably, our theory comprehensively accounts for all sources of error in the sampling process and does not require strong structural conditions such as density lower bounds or Lipschitz/smooth scores on target distributions, thereby covering a broad range of practical data distributions.

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

arXiv.org Machine Learning

2503.09583

Country: North America > United States > Michigan (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Provable Acceleration for Diffusion Models under Minimal Assumptions

Li, Gen, Cai, Changxiao

arXiv.org Machine LearningNov-3-2024

While score-based diffusion models have achieved exceptional sampling quality, their sampling speeds are often limited by the high computational burden of score function evaluations. Despite the recent remarkable empirical advances in speeding up the score-based samplers, theoretical understanding of acceleration techniques remains largely limited. To bridge this gap, we propose a novel training-free acceleration scheme for stochastic samplers. Under minimal assumptions -- namely, $L^2$-accurate score estimates and a finite second-moment condition on the target distribution -- our accelerated sampler provably achieves $\varepsilon$-accuracy in total variation within $\widetilde{O}(d^{5/4}/\sqrt{\varepsilon})$ iterations, thereby significantly improving upon the $\widetilde{O}(d/\varepsilon)$ iteration complexity of standard score-based samplers. Notably, our convergence theory does not rely on restrictive assumptions on the target distribution or higher-order score estimation guarantees.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv.org Machine Learning

2410.23285

Country: North America > United States > Michigan (0.14)

Genre:

Research Report (0.49)
Workflow (0.46)
Overview (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Minimax-optimal trust-aware multi-armed bandits

Cai, Changxiao, Zhang, Jiacheng

arXiv.org Machine LearningOct-4-2024

Multi-armed bandit (MAB) algorithms have achieved significant success in sequential decision-making applications, under the premise that humans perfectly implement the recommended policy. However, existing methods often overlook the crucial factor of human trust in learning algorithms. When trust is lacking, humans may deviate from the recommended policy, leading to undesired learning performance. Motivated by this gap, we study the trust-aware MAB problem by integrating a dynamic trust model into the standard MAB framework. Specifically, it assumes that the recommended and actually implemented policy differs depending on human trust, which in turn evolves with the quality of the recommended policy. We establish the minimax regret in the presence of the trust issue and demonstrate the suboptimality of vanilla MAB algorithms such as the upper confidence bound (UCB) algorithm. To overcome this limitation, we introduce a novel two-stage trust-aware procedure that provably attains near-optimal statistical guarantees. A simulation study is conducted to illustrate the benefits of our proposed algorithm when dealing with the trust issue.

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Machine Learning

2410.03651

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.37)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Transfer Learning for Contextual Multi-armed Bandits

Cai, Changxiao, Cai, T. Tony, Li, Hongzhe

arXiv.org Artificial IntelligenceNov-22-2022

Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected on source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits. In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study is carried out to illustrate the benefits of utilizing the data from the auxiliary source domains for learning in the target domain.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2211.12612

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps

Li, Gen, Cai, Changxiao, Gu, Yuantao, Poor, H. Vincent, Chen, Yuxin

arXiv.org Machine LearningApr-7-2021

Eigenvector perturbation analysis plays a vital role in various statistical data science applications. A large body of prior works, however, focused on establishing $\ell_{2}$ eigenvector perturbation bounds, which are often highly inadequate in addressing tasks that rely on fine-grained behavior of an eigenvector. This paper makes progress on this by studying the perturbation of linear functions of an unknown eigenvector. Focusing on two fundamental problems -- matrix denoising and principal component analysis -- in the presence of Gaussian noise, we develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector. In order to mitigate a non-negligible bias issue inherent to the natural "plug-in" estimator, we develop de-biased estimators that (1) achieve minimax lower bounds for a family of scenarios (modulo some logarithmic factor), and (2) can be computed in a data-driven manner without sample splitting. Noteworthily, the proposed estimators are nearly minimax optimal even when the associated eigen-gap is substantially smaller than what is required in prior theory.

artificial intelligence, machine learning, probability, (16 more...)

arXiv.org Machine Learning

2104.03298

Country: North America > United States > Rhode Island (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback

Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning

Li, Gen, Cai, Changxiao, Chen, Yuxin, Gu, Yuantao, Wei, Yuting, Chi, Yuejie

arXiv.org Machine LearningFeb-12-2021

Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all state-action pairs are drawn from a generative model in each iteration), substantial progress has been made recently towards understanding the sample efficiency of Q-learning. To yield an entrywise $\varepsilon$-accurate estimate of the optimal Q-function, state-of-the-art theory requires at least an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^{2}}$ samples for a $\gamma$-discounted infinite-horizon MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$. In this work, we sharpen the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ (up to some logarithmic factor) for any $0<\varepsilon <1$, leading to an order-wise improvement in terms of the effective horizon $\frac{1}{1-\gamma}$. Analogous results are derived for finite-horizon MDPs as well. Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage. A key ingredient of our analysis lies in the establishment of novel error decompositions and recursions, which might shed light on how to analyze finite-sample performance of other Q-learning variants.

artificial intelligence, q-learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2102.06548

Country:

North America > United States (0.67)
Asia (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Add feedback

Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality

Cai, Changxiao, Poor, H. Vincent, Chen, Yuxin

arXiv.org Machine LearningJun-15-2020

In many practical scenarios of interest, however, we do not have full access to a large-dimensional tensor of interest, as only a sampling of its entries are revealed to us; yet we would still wish to reliably infer all missing data. This task, commonly referred to as tensor completion, finds applications in numerous domains including medical imaging [SHKM14], visual data analysis [LMWY13], seismic data reconstruction [KSS13], to name just a few. In order to make meaningful inference about the unseen entries, additional information about the unknown tensor plays a pivotal role (otherwise one is in the position with fewer equations than unknowns). A common type of such prior information is low-rank structure, which hypothesizes that the unknown tensor is decomposable into the superposition of a few rank-one tensors. Substantial attempts have been made in the past few years to understand and tackle such low-rank tensor completion problems. To set the stage for a formal discussion, we formulate the problem as follows.

health & medicine, probability, survey article, (20 more...)

arXiv.org Machine Learning

2006.0858

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.92)

Industry: Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.86)

Add feedback

Subspace Estimation from Unbalanced and Incomplete Data Matrices: $\ell_{2,\infty}$ Statistical Guarantees

Cai, Changxiao, Li, Gen, Chi, Yuejie, Poor, H. Vincent, Chen, Yuxin

arXiv.org Machine LearningOct-9-2019

This paper is concerned with estimating the column space of an unknown low-rank matrix $\boldsymbol{A}^{\star}\in\mathbb{R}^{d_{1}\times d_{2}}$, given noisy and partial observations of its entries. There is no shortage of scenarios where the observations --- while being too noisy to support faithful recovery of the entire matrix --- still convey sufficient information to enable reliable estimation of the column space of interest. This is particularly evident and crucial for the highly unbalanced case where the column dimension $d_{2}$ far exceeds the row dimension $d_{1}$, which is the focal point of the current paper. We investigate an efficient spectral method, which operates upon the sample Gram matrix with diagonal deletion. We establish statistical guarantees for this method in terms of both $\ell_{2}$ and $\ell_{2,\infty}$ estimation accuracy, which improve upon prior results if $d_{2}$ is substantially larger than $d_{1}$. To illustrate the effectiveness of our findings, we develop consequences of our general theory for three applications of practical importance: (1) tensor completion from noisy data, (2) covariance estimation with missing data, and (3) community recovery in bipartite graphs. Our theory leads to improved performance guarantees for all three cases.

artificial intelligence, data mining, null 2, (19 more...)

arXiv.org Machine Learning

1910.04267

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback