 Li, Gen


Dimension-Free Convergence of Diffusion Models for Approximate Gaussian Mixtures

arXiv.org Machine Learning

Diffusion models are distinguished by their exceptional generative performance, particularly in producing high-quality samples through iterative denoising. While current theory suggests that the number of denoising steps required for accurate sample generation should scale linearly with data dimension, this does not reflect the practical efficiency of widely used algorithms like Denoising Diffusion Probabilistic Models (DDPMs). This paper investigates the effectiveness of diffusion models in sampling from complex high-dimensional distributions that can be well-approximated by Gaussian Mixture Models (GMMs). For these distributions, our main result shows that DDPM takes at most $\widetilde{O}(1/\varepsilon)$ iterations to attain an $\varepsilon$-accurate distribution in total variation (TV) distance, independent of both the ambient dimension $d$ and the number of components $K$, up to logarithmic factors. Furthermore, this result remains robust to score estimation errors. These findings highlight the remarkable effectiveness of diffusion models in high-dimensional settings given the universal approximation capability of GMMs, and provide theoretical insights into their practical success.
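
As a point of reference, the iterative denoising that this result concerns can be sketched in a few lines. Below is a minimal, generic DDPM reverse loop assuming an externally supplied score estimate score_fn and a standard linear variance schedule; it illustrates the sampler family, not the paper's exact construction or schedule.

import numpy as np

def ddpm_sample(score_fn, d, T=200, beta_min=1e-4, beta_max=0.02, rng=None):
    """Iterative DDPM denoising (sketch): start from pure noise and refine
    step by step using an (estimated) score of the noised distribution."""
    rng = np.random.default_rng(rng)
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    x = rng.standard_normal(d)                  # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        s = score_fn(x, t)                      # estimate of grad log p_t(x)
        x = (x + betas[t] * s) / np.sqrt(alphas[t])          # posterior-mean step
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(d)  # inject noise
    return x

In these terms, the $\widetilde{O}(1/\varepsilon)$ claim says that for approximate Gaussian mixtures, the loop length T needed for $\varepsilon$-accuracy in TV need not grow with $d$ or $K$, up to logarithmic factors.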


Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation

arXiv.org Artificial Intelligence

Diffusion models have demonstrated remarkable performance in modeling complex data priors, catalyzing their widespread adoption in solving various inverse problems. However, the inherently iterative nature of diffusion-based inverse algorithms often requires hundreds to thousands of steps, and performance degrades when fewer steps are used, which limits their practical applicability. While high-order diffusion ODE solvers have been extensively explored for efficient diffusion sampling without observations, their application to inverse problems remains underexplored due to the diverse forms of inverse algorithms and their need for repeated trajectory correction based on observations. To address this gap, we first introduce a canonical form that decomposes existing diffusion-based inverse algorithms into three modules to unify their analysis. Inspired by the linear subspace search strategy in the design of high-order diffusion ODE solvers, we propose the Learnable Linear Extrapolation (LLE) method, a lightweight approach that universally enhances the performance of any diffusion-based inverse algorithm fitting the proposed canonical form. Extensive experiments demonstrate consistent improvements of the proposed LLE method across multiple algorithms and tasks, indicating its potential to yield more efficient and better-performing diffusion-based inverse algorithms under limited step budgets. Code for reproducing our experiments is available at https://github.com/weigerzan/LLE_inverse_problem.
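
The extrapolation step itself is simple to sketch. The function below forms the next iterate as a learned linear combination of recent trajectory iterates; the names x_hist and weights are illustrative assumptions, and the paper's actual method learns per-step coefficients within its three-module canonical form.

def lle_extrapolate(x_hist, weights):
    """Learnable Linear Extrapolation (sketch): combine recent iterates of a
    diffusion-based inverse solver with learned coefficients. Assumes the
    weights sum to 1 so the combination stays on the iterate scale."""
    assert len(x_hist) == len(weights)
    out = weights[0] * x_hist[0]
    for w, x in zip(weights[1:], x_hist[1:]):
        out = out + w * x
    return out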


Minimax Optimality of the Probability Flow ODE for Diffusion Models

arXiv.org Machine Learning

Score-based diffusion models have become a foundational paradigm for modern generative modeling, demonstrating exceptional capability in generating samples from complex high-dimensional distributions. Despite the dominant adoption of probability flow ODE-based samplers in practice due to their superior sampling efficiency and precision, rigorous statistical guarantees for these methods have remained elusive in the literature. This work develops the first end-to-end theoretical framework for deterministic ODE-based samplers that establishes near-minimax optimal guarantees under mild assumptions on target data distributions. Specifically, focusing on subgaussian distributions with $\beta$-H\"older smooth densities for $\beta\leq 2$, we propose a smooth regularized score estimator that simultaneously controls both the $L^2$ score error and the associated mean Jacobian error. Leveraging this estimator within a refined convergence analysis of the ODE-based sampling process, we demonstrate that the resulting sampler achieves the minimax rate in total variation distance, modulo logarithmic factors. Notably, our theory comprehensively accounts for all sources of error in the sampling process and does not require strong structural conditions such as density lower bounds or Lipschitz/smooth scores on target distributions, thereby covering a broad range of practical data distributions.
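
For concreteness, a deterministic ODE-based sampler of the kind analyzed here can be sketched as an Euler discretization of the variance-preserving probability flow ODE. In this sketch, score_fn and the schedule beta are placeholder assumptions; the paper's smooth regularized score estimator and refined discretization analysis are not reproduced.

import numpy as np

def pf_ode_sample(score_fn, d, T=1.0, steps=100,
                  beta=lambda t: 0.1 + 19.9 * t, rng=None):
    """Euler discretization of the (variance-preserving) probability flow ODE
       dx/dt = -0.5 * beta(t) * (x + score(x, t)),
    integrated from t = T (noise) down to t ~ 0 (data). The default beta is
    the common linear schedule; all choices here are illustrative."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(d)          # approximate N(0, I) initialization
    dt = T / steps
    for i in range(steps):
        t = T - i * dt
        drift = -0.5 * beta(t) * (x + score_fn(x, t))
        x = x - drift * dt              # step backward in time
    return x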


Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing

arXiv.org Artificial Intelligence

The design of artificial neural networks (ANNs) is inspired by the structure of the human brain, and in turn, ANNs offer a potential means to interpret and understand brain signals. Existing methods primarily align brain signals with real-world signals using Mean Squared Error (MSE), which focuses solely on local point-wise alignment and ignores global matching, leading to coarse interpretations and inaccuracies in brain signal decoding. In this paper, we address these issues through optimal transport (OT) and theoretically demonstrate why OT provides a more effective alignment strategy than MSE. Specifically, we construct a transport plan between brain voxel embeddings and image embeddings, enabling more precise matching. By controlling the amount of transport, we mitigate the influence of redundant information. We apply our alignment model directly to the Brain Captioning task by feeding brain signals into a large language model (LLM) instead of images. Our approach achieves state-of-the-art performance across ten evaluation metrics, surpassing the previous best method by an average of 6.11\% in single-subject training and 3.81\% in cross-subject training. Additionally, we have uncovered several insightful conclusions that align with existing brain research. We unveil the redundancy and synergy of brain information processing through region masking and data dimensionality reduction visualization experiments. We believe our approach paves the way for a more precise understanding of brain signals in the future. The code will be released soon.
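
The transport-plan construction at the heart of the alignment can be illustrated with a standard entropy-regularized (Sinkhorn) solver between the two embedding sets. This is a generic sketch under assumed uniform marginals, not the paper's exact objective or its mechanism for controlling the amount of transport.

import numpy as np

def sinkhorn_plan(brain_emb, image_emb, reg=0.05, n_iters=200):
    """Entropy-regularized OT between brain voxel embeddings (n x d) and
    image embeddings (m x d): returns an n x m transport plan whose rows
    and columns approximately match the uniform marginals."""
    C = np.linalg.norm(brain_emb[:, None, :] - image_emb[None, :, :], axis=-1)
    K = np.exp(-C / reg)                            # Gibbs kernel of the cost
    a = np.ones(len(brain_emb)) / len(brain_emb)    # uniform source marginal
    b = np.ones(len(image_emb)) / len(image_emb)    # uniform target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):
        u = a / (K @ (b / (K.T @ u)))               # alternating scaling updates
    v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]              # transport plan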


Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data

arXiv.org Machine Learning

Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for variable selection, and (iii) generalization to more than two data views. There is a pressing need for CCA methods that integrate all three aspects to effectively analyze multi-view high-dimensional data. Results: We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in multi-view variable selection. Availability and implementation: Code is available at https://github.com/Rows21/NSGCCA.
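
The dependence measure underlying HSIC-SGCCA is the Hilbert-Schmidt Independence Criterion, whose plain empirical estimator is short; it is sketched below with Gaussian kernels as an assumption. The paper's contributions, namely sparsity, the unit-variance constraint, and the multi-view optimization, sit on top of this building block.

import numpy as np

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC with Gaussian kernels: (n-1)^{-2} tr(K H L H), where
    K, L are Gram matrices of the two views and H is the centering matrix."""
    def gram(Z):
        sq = np.sum(Z**2, axis=1)
        D = sq[:, None] + sq[None, :] - 2 * Z @ Z.T   # pairwise squared distances
        return np.exp(-D / (2 * sigma**2))
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    return np.trace(gram(X) @ H @ gram(Y) @ H) / (n - 1)**2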


Minimax-Optimal Multi-Agent Robust Reinforcement Learning

arXiv.org Artificial Intelligence

The rapidly evolving field of multi-agent reinforcement learning (MARL), also referred to as Markov games (MGs) (Littman, 1994; Shapley, 1953), explores how a group of agents interacts in a shared, dynamic environment to maximize their individual expected cumulative rewards (Zhang et al., 2020a; Lanctot et al., 2019; Silver et al., 2017; Vinyals et al., 2019). This area has found wide applications in fields such as ecosystem management (Fang et al., 2015), strategic decision-making in board games (Silver et al., 2017), management science (Saloner, 1991), and autonomous driving (Zhou et al., 2020). However, in real-world applications, environmental uncertainties--stemming from factors such as system noise, model misalignment, and the sim-to-real gap--can significantly alter both the qualitative outcomes of the game and the cumulative rewards that agents receive (Slumbers et al., 2023). It has been demonstrated that when solutions learned in a simulated environment are applied, even a small deviation in the deployed environment from the expected model can result in catastrophic performance drops for one or more agents (Shi et al., 2024c; Balaji et al., 2019; Yeh et al., 2021; Zeng et al., 2022; Zhang et al., 2020b). These challenges motivate the study of robust Markov games (RMGs), which assume that each agent aims to maximize its worst-case cumulative reward in an environment where the transition model is constrained by an uncertainty set centered around an unknown nominal model. Given the competitive nature of the game, the objective of RMGs is to reach an equilibrium where no agent has an incentive to unilaterally change its policy to increase its own payoff. A classical type of equilibrium is the robust Nash equilibrium (NE) (Nash Jr, 1950), where each agent's policy is independent, and no agent can improve its worst-case performance by deviating from its current strategy. Due to the high computational cost of solving robust NEs, especially in games with more than two agents, this concept is often relaxed to the robust coarse correlated equilibrium (CCE), where agents' policies may be correlated (Moulin & Vial, 1978). In the context of RMGs, achieving equilibrium with minimal samples is of particular interest, as data is often limited in practical applications.
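
The worst-case objective behind RMGs can be illustrated in the simplest single-agent case: value iteration in which each Bellman backup minimizes over an uncertainty set of transition models. The finite candidate set below is an illustrative assumption; the RMG literature typically uses divergence balls around a nominal model.

import numpy as np

def robust_value_iteration(rewards, P_candidates, gamma=0.9, n_iters=500):
    """Robust value iteration (sketch). rewards: (S, A) array; each P in
    P_candidates: (S, A, S) transition model in the uncertainty set."""
    S, A = rewards.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        # worst-case expected next value over the uncertainty set, per (s, a)
        worst = np.min([P @ V for P in P_candidates], axis=0)   # (S, A)
        V = np.max(rewards + gamma * worst, axis=1)             # greedy backup
    return V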


Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) models have garnered significant attention for their ability to scale up neural networks while utilizing the same or even fewer active parameters. However, MoE does not relieve the massive memory requirements of networks, which limits their practicality in real-world applications, especially in the era of large language models (LLMs). While recent work explores the possibility of removing entire layers of MoE to reduce memory, the performance degradation is still notable. In this paper, we propose Condense-MoE (CD-MoE) that, instead of dropping the entire MoE layer, condenses the big, sparse MoE layer into a small but dense layer with only a few experts that are activated for all tokens. Our approach is specifically designed for fine-grained MoE with shared experts, where Feed-Forward Networks are split into many small experts, with certain experts isolated to serve as shared experts that are always activated. We demonstrate the effectiveness of our method across multiple MoE models such as DeepSeekMoE and QwenMoE on various benchmarks. Specifically, for the DeepSeekMoE-16B model, our approach maintains nearly 90% of the average accuracy while reducing memory usage by 30% and enhancing inference speed by 30%. Moreover, we show that with lightweight expert fine-tuning, the pruned model can achieve further improvements on specific tasks. Our code is available at https://github.com/duterscmy/CD-MoE/tree/main.
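
A rough sketch of the condensation idea: after a few experts are chosen to keep (the selection and calibration procedures follow the paper), the sparse routed layer is replaced by a dense forward pass through the retained and shared experts for every token. The names and the uniform averaging below are illustrative assumptions, not the paper's exact re-weighting.

def condense_moe(expert_ffns, shared_ids, kept_ids):
    """CD-MoE-style condensation (sketch): drop the router and activate a
    small fixed set of experts (shared + retained) for all tokens."""
    active = [expert_ffns[i] for i in shared_ids + kept_ids]

    def condensed_forward(x):
        # every token passes through all retained experts; dense, no routing
        return sum(ffn(x) for ffn in active) / len(active)

    return condensed_forward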


Provable Acceleration for Diffusion Models under Minimal Assumptions

arXiv.org Machine Learning

While score-based diffusion models have achieved exceptional sampling quality, their sampling speeds are often limited by the high computational burden of score function evaluations. Despite the recent remarkable empirical advances in speeding up the score-based samplers, theoretical understanding of acceleration techniques remains largely limited. To bridge this gap, we propose a novel training-free acceleration scheme for stochastic samplers. Under minimal assumptions -- namely, $L^2$-accurate score estimates and a finite second-moment condition on the target distribution -- our accelerated sampler provably achieves $\varepsilon$-accuracy in total variation within $\widetilde{O}(d^{5/4}/\sqrt{\varepsilon})$ iterations, thereby significantly improving upon the $\widetilde{O}(d/\varepsilon)$ iteration complexity of standard score-based samplers. Notably, our convergence theory does not rely on restrictive assumptions on the target distribution or higher-order score estimation guarantees.


BAMITA: Bayesian Multiple Imputation for Tensor Arrays

arXiv.org Machine Learning

Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and is used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation.
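
The CP-factorization core of the model can be sketched as masked low-rank tensor completion. The code below is a frequentist point-estimate caricature (gradient descent on the observed entries only), whereas BAMITA places conjugate priors on the CP factors with a separable residual covariance and draws multiple imputations to propagate uncertainty.

import numpy as np

def cp_impute(X, mask, rank=3, n_iters=2000, lr=0.01, rng=None):
    """CP-based imputation (sketch) for a 3-way tensor X with boolean `mask`
    marking observed entries: fit factors on observed entries, then fill
    missing entries from the rank-r reconstruction."""
    rng = np.random.default_rng(rng)
    I, J, K = X.shape
    A = 0.1 * rng.standard_normal((I, rank))
    B = 0.1 * rng.standard_normal((J, rank))
    C = 0.1 * rng.standard_normal((K, rank))
    for _ in range(n_iters):
        recon = np.einsum('ir,jr,kr->ijk', A, B, C)
        R = np.where(mask, recon - X, 0.0)              # residual on observed entries
        A -= lr * np.einsum('ijk,jr,kr->ir', R, B, C)   # gradient step per factor
        B -= lr * np.einsum('ijk,ir,kr->jr', R, A, C)
        C -= lr * np.einsum('ijk,ir,jr->kr', R, A, B)
    recon = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.where(mask, X, recon)                     # impute missing entries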


Statistical Inference for Temporal Difference Learning with Linear Function Approximation

arXiv.org Machine Learning

Statistical inference tasks, such as constructing confidence regions or simultaneous confidence intervals, are often addressed by deriving distributional theory such as central limit theorems (CLTs) for the estimator in use. Due to the high dimensionality of modern science and engineering applications, there has been a surge of interest in deriving convergence results that are valid in a finite-sample manner. In Reinforcement Learning (RL), a discipline that underpins many recent machine learning breakthroughs (Agarwal et al., 2019; Sutton and Barto, 2018), one central question is to evaluate with confidence the quality of a given policy, measured by its value function. As RL is often modeled as decision making in Markov decision processes (MDPs), the task of statistical inference needs to accommodate the online nature of the sampling mechanism. Temporal Difference (TD) learning (Sutton, 1988) is arguably the most widely used algorithm designed for this purpose. TD learning, an instance of the stochastic approximation (SA) method (Robbins and Monro, 1951), approximates the value function of a given policy in an iterative manner. Extensive recent efforts have been made toward understanding the non-asymptotic convergence rate of TD to the target value function.
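
The estimator in question is the classical TD(0) update with linear function approximation, sketched below. Here `transition` and `phi` are hypothetical placeholders for the MDP sampler under the evaluated policy and the feature map; the paper's contribution concerns finite-sample distributional theory for the resulting iterates, not the update rule itself.

import numpy as np

def td_linear(transition, phi, s0, dim, alpha=0.01, gamma=0.99,
              n_steps=10_000, rng=None):
    """TD(0) with linear function approximation: V(s) ~ phi(s)^T theta,
    updated online along a single Markovian trajectory."""
    rng = np.random.default_rng(rng)
    theta = np.zeros(dim)
    s = s0
    for _ in range(n_steps):
        s_next, r = transition(s, rng)          # one sampled MDP transition
        td_err = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta += alpha * td_err * phi(s)        # stochastic-approximation step
        s = s_next
    return theta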