Ren, Yinuo
A Unified Approach to Analysis and Design of Denoising Markov Models
Ren, Yinuo, Rotskoff, Grant M., Ying, Lexing
Probabilistic generative models based on measure transport, such as diffusion and flow-based models, are often formulated in the language of Markovian stochastic dynamics, where the choice of the underlying process impacts both algorithmic design choices and theoretical analysis. In this paper, we aim to establish a rigorous mathematical foundation for denoising Markov models, a broad class of generative models that postulate a forward process transitioning from the target distribution to a simple, easy-to-sample distribution, alongside a backward process specifically constructed to enable efficient sampling in the reverse direction. Leveraging deep connections with nonequilibrium statistical mechanics and generalized Doob's $h$-transform, we propose a minimal set of assumptions that ensure: (1) explicit construction of the backward generator, (2) a unified variational objective directly minimizing the measure transport discrepancy, and (3) adaptations of the classical score-matching approach across diverse dynamics. Our framework unifies existing formulations of continuous and discrete diffusion models, identifies the most general form of denoising Markov models under certain regularity assumptions on forward generators, and provides a systematic recipe for designing denoising Markov models driven by arbitrary L\'evy-type processes. We illustrate the versatility and practical effectiveness of our approach through novel denoising Markov models employing geometric Brownian motion and jump processes as forward dynamics, highlighting the framework's flexibility and capability in modeling complex distributions.
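A minimal concrete instance of this construction, written in standard score-based diffusion notation that does not appear in the abstract itself, is the Ornstein-Uhlenbeck forward process and its score-based time reversal, $$\mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t, \qquad \mathrm{d}Y_s = \bigl[Y_s + 2\nabla\log p_{T-s}(Y_s)\bigr]\,\mathrm{d}s + \sqrt{2}\,\mathrm{d}\bar{W}_s, \quad Y_0 \sim p_T,$$ where $p_t$ denotes the marginal law of $X_t$ and the backward drift is exactly the quantity that score matching estimates; the generator-level formulation described above recovers this case and extends it to general L\'evy-type forward dynamics.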
Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms
Ren, Yinuo, Chen, Haoxuan, Zhu, Yuchen, Guo, Wei, Chen, Yongxin, Rotskoff, Grant M., Tao, Molei, Ying, Lexing
Discrete diffusion models have emerged as a powerful generative modeling framework for discrete data, with successful applications spanning from text generation to image synthesis. However, their deployment faces challenges due to the high dimensionality of the state space, necessitating the development of efficient inference algorithms. Current inference approaches mainly fall into two categories: exact simulation and approximate methods such as $\tau$-leaping. While exact methods suffer from unpredictable inference time and redundant function evaluations, $\tau$-leaping is limited by its first-order accuracy. In this work, we advance the latter category by developing the first extension of high-order numerical inference schemes to discrete diffusion models, enabling larger step sizes while reducing error. We rigorously analyze the proposed schemes and establish the second-order accuracy of the $\theta$-trapezoidal method in KL divergence. Empirical evaluations on GPT-2-level text and ImageNet-level image generation tasks demonstrate that our method achieves superior sample quality compared to existing approaches under equivalent computational constraints.
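To make the first-order baseline concrete, the following is a minimal sketch of $\tau$-leaping for a toy birth-death continuous-time Markov chain; the rates are invented for illustration, and this is not the paper's $\theta$-trapezoidal scheme, whose point is precisely to improve on this first-order update.

```python
# Minimal sketch: first-order tau-leaping for a toy birth-death CTMC.
# Not the paper's theta-trapezoidal method; rates are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def birth_rate(x):          # hypothetical state-dependent birth rate
    return 2.0

def death_rate(x):          # hypothetical state-dependent death rate
    return 0.1 * x

def tau_leaping(x0, T, tau):
    """Advance the chain from x0 to time T using fixed leap size tau."""
    x, t = x0, 0.0
    while t < T:
        dt = min(tau, T - t)
        # Each channel fires a Poisson number of times over the leap,
        # with rates frozen at the current state (first-order accuracy).
        births = rng.poisson(birth_rate(x) * dt)
        deaths = rng.poisson(death_rate(x) * dt)
        x = max(x + births - deaths, 0)   # clamp to keep the state admissible
        t += dt
    return x

samples = [tau_leaping(x0=0, T=10.0, tau=0.1) for _ in range(1000)]
print("mean state at T = 10:", np.mean(samples))   # approaches the stationary mean 20 for large T
```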
HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework
Ren, Yinuo, Xiao, Tesi, Shavlovsky, Michael, Ying, Lexing, Rahmanian, Holakou
In LLM alignment and many other ML applications, one often faces the Multi-Objective Fine-Tuning (MOFT) problem, i.e., fine-tuning an existing model with datasets labeled w.r.t. different objectives simultaneously. To address this challenge, we propose the HyperDPO framework, a conditioned one-shot fine-tuning approach that extends the Direct Preference Optimization (DPO) technique, originally developed for efficient LLM alignment with preference data, to accommodate MOFT settings. By substituting the Bradley-Terry-Luce model in DPO with the Plackett-Luce model, our framework is capable of handling a wide range of MOFT tasks that involve listwise ranking datasets. Compared with previous approaches, HyperDPO enjoys an efficient one-shot training process for profiling the Pareto front of auxiliary objectives and offers post-training control over trade-offs. Additionally, we propose a novel Hyper Prompt Tuning design that conveys continuous importance weights across objectives to transformer-based models without altering their architecture, and we investigate the potential of temperature-conditioned networks for enhancing the flexibility of post-training control. We demonstrate the effectiveness and efficiency of the HyperDPO framework through its applications to various tasks, including Learning-to-Rank (LTR) and LLM alignment, highlighting its viability for large-scale ML deployments.
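As a concrete reference point, the Plackett-Luce log-likelihood over a ranked list, which reduces to the Bradley-Terry model for lists of length two, can be sketched as follows; the scores here are placeholders, whereas in HyperDPO-style training they would be implicit rewards computed from policy and reference log-probabilities.

```python
# Sketch of the Plackett-Luce log-likelihood for a list ranked best to worst.
# For a list of length two this reduces to the Bradley-Terry(-Luce) model.
import numpy as np

def plackett_luce_loglik(scores_in_rank_order):
    """log P(observed ranking | scores), items ordered from best to worst."""
    s = np.asarray(scores_in_rank_order, dtype=float)
    loglik = 0.0
    for i in range(len(s)):
        tail = s[i:]
        m = np.max(tail)                               # stabilized log-sum-exp
        loglik += s[i] - (m + np.log(np.sum(np.exp(tail - m))))
    return loglik

# Hypothetical scores for four responses already sorted from best to worst.
print(plackett_luce_loglik([2.0, 1.0, 0.5, -1.0]))
```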
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
Ren, Yinuo, Chen, Haoxuan, Rotskoff, Grant M., Ying, Lexing
Discrete diffusion models have gained increasing attention for their ability to model complex distributions with tractable sampling and inference. However, the error analysis for discrete diffusion models remains less well understood. In this work, we propose a comprehensive framework for the error analysis of discrete diffusion models based on L\'evy-type stochastic integrals. By generalizing the Poisson random measure to one with a time-inhomogeneous and state-dependent intensity, we rigorously establish a stochastic integral formulation of discrete diffusion models and provide the corresponding change of measure theorems that are intriguingly analogous to It\^o integrals and Girsanov's theorem for their continuous counterparts. Our framework unifies and strengthens the current theoretical results on discrete diffusion models and obtains the first error bound for the $\tau$-leaping scheme in KL divergence. With error sources clearly identified, our analysis gives new insight into the mathematical properties of discrete diffusion models and offers guidance for the design of efficient and accurate algorithms for real-world discrete diffusion model applications.
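As a concrete anchor for the change-of-measure statement, the standard Girsanov-type identity for the path measures $P$ and $Q$ of two Markov jump processes with rate functions $\lambda_t(x,y)$ and $\mu_t(x,y)$ (generic notation that may differ from the paper's) reads $$\mathrm{KL}(P\,\|\,Q) = \mathrm{KL}(P_0\,\|\,Q_0) + \mathbb{E}_{P}\!\left[\int_0^T \sum_{y\neq X_t}\Bigl(\lambda_t(X_t,y)\log\frac{\lambda_t(X_t,y)}{\mu_t(X_t,y)} - \lambda_t(X_t,y) + \mu_t(X_t,y)\Bigr)\,\mathrm{d}t\right],$$ and identities of this form, expressed through the stochastic integral formulation, are what drive KL error bounds such as the $\tau$-leaping result above.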
Optimal starting point for time series forecasting
Zhong, Yiming, Ren, Yinuo, Cao, Guangyao, Li, Feng, Qi, Haobo
Recent advances in time series forecasting mainly focus on improving the forecasting models themselves. However, managing the length of the input data can also significantly enhance prediction performance. In this paper, we introduce a novel approach called Optimal Starting Point Time Series Forecast (OSP-TSP) to capture the intrinsic characteristics of time series data. By adjusting the sequence length via the XGBoost and LightGBM models, the proposed approach determines the optimal starting point (OSP) of the time series and thus enhances prediction performance. The OSP-TSP approach is then evaluated across various frequencies on the M4 dataset and other real-world datasets. Empirical results indicate that predictions based on the OSP-TSP approach consistently outperform those using the complete dataset. Moreover, recognizing the necessity of sufficient data to effectively train models for OSP identification, we further propose targeted solutions to address the issue of data insufficiency.
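The underlying idea of truncating the input history can be sketched as follows; a naive mean forecaster and a brute-force search stand in for the paper's XGBoost/LightGBM-based OSP identification, which this sketch does not reproduce.

```python
# Sketch of starting-point selection by validation error. The mean forecaster
# and grid search are placeholders for the paper's XGBoost/LightGBM pipeline.
import numpy as np

def mean_forecast(history, horizon):
    """Placeholder model: forecast the mean of the (possibly truncated) history."""
    return np.full(horizon, np.mean(history))

def best_starting_point(series, horizon, candidates):
    train, valid = series[:-horizon], series[-horizon:]
    errors = {s: np.mean((mean_forecast(train[s:], horizon) - valid) ** 2)
              for s in candidates}
    return min(errors, key=errors.get)

rng = np.random.default_rng(1)
# Synthetic series with a regime shift: the early history hurts a naive model.
y = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 100)])
osp = best_starting_point(y, horizon=10, candidates=range(0, 281, 10))
print("chosen starting point:", osp)   # expected to land after the regime shift
```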
Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity
Chen, Haoxuan, Ren, Yinuo, Ying, Lexing, Rotskoff, Grant M.
Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique [1], we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\operatorname{poly}\log d)$ overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
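The core parallel-in-time idea can be sketched as follows for a generic ODE $x'(t) = v(x, t)$: within a block, each Picard iteration updates the whole trajectory at once, and all evaluations of $v$ in an iteration are independent and hence batchable; this illustrates Picard iteration only, not the paper's diffusion sampler or its error control.

```python
# Schematic parallel-in-time Picard iteration for x'(t) = v(x, t) on one block.
# All v-evaluations inside an iteration are independent and can be batched;
# this illustrates the idea only, not the paper's diffusion sampler.
import numpy as np

def picard_block(v, x0, t_grid, n_iters):
    """Solve x' = v(x, t) on t_grid from x0 via fixed-point (Picard) iteration."""
    dt = np.diff(t_grid)
    xs = np.tile(x0, (len(t_grid), 1)).astype(float)   # initial guess: constant path
    for _ in range(n_iters):
        # All of these evaluations are mutually independent -> parallelizable.
        vel = np.array([v(xs[i], t_grid[i]) for i in range(len(t_grid))])
        incr = np.cumsum(vel[:-1] * dt[:, None], axis=0)    # left-endpoint quadrature
        xs = x0 + np.vstack([np.zeros_like(x0), incr])
    return xs

# Example: v(x, t) = -x, whose exact solution is x0 * exp(-t).
xs = picard_block(lambda x, t: -x, x0=np.array([1.0]),
                  t_grid=np.linspace(0.0, 1.0, 65), n_iters=20)
print(xs[-1], np.exp(-1.0))   # agree up to the O(dt) quadrature error
```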
Understanding the Generalization Benefits of Late Learning Rate Decay
Ren, Yinuo, Ma, Chao, Ying, Lexing
Why does training neural networks with large learning rates for longer often lead to better generalization? In this paper, we delve into this question by examining the relation between training and testing loss in neural networks. Through visualization of these losses, we note that the training trajectory with a large learning rate navigates through the minima manifold of the training loss, finally nearing the neighborhood of the testing loss minimum. Motivated by these findings, we introduce a nonlinear model whose loss landscapes mirror those observed for real neural networks. Upon investigating the training process using SGD on our model, we demonstrate that an extended phase with a large learning rate steers our model towards the minimum norm solution of the training loss, which may achieve near-optimal generalization, thereby affirming the empirically observed benefits of late learning rate decay.
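In practice, "late learning rate decay" amounts to a schedule of the following shape; the model, data, and milestone below are placeholders rather than the paper's experimental setup.

```python
# Sketch of a "late decay" schedule: a large learning rate for most of training,
# dropped sharply only near the end. Model, data, and milestone are placeholders.
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.2)                 # large initial LR
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[900], gamma=0.01)

x, y = torch.randn(64, 10), torch.randn(64, 1)
for epoch in range(1000):                                         # decay only at epoch 900
    opt.zero_grad()
    torch.nn.functional.mse_loss(model(x), y).backward()
    opt.step()
    sched.step()
print("final learning rate:", sched.get_last_lr())
```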
Statistical Spatially Inhomogeneous Diffusion Inference
Ren, Yinuo, Lu, Yiping, Ying, Lexing, Rotskoff, Grant M.
Inferring a diffusion equation from discretely-observed measurements is a statistical challenge of significant importance in a variety of fields, from single-molecule tracking in biophysical systems to modeling financial instruments. Assuming that the underlying dynamical process obeys a $d$-dimensional stochastic differential equation of the form $$\mathrm{d}\boldsymbol{x}_t=\boldsymbol{b}(\boldsymbol{x}_t)\mathrm{d} t+\Sigma(\boldsymbol{x}_t)\mathrm{d}\boldsymbol{w}_t,$$ we propose neural network-based estimators of both the drift $\boldsymbol{b}$ and the spatially-inhomogeneous diffusion tensor $D = \Sigma\Sigma^{T}$ and provide statistical convergence guarantees when $\boldsymbol{b}$ and $D$ are $s$-H\"older continuous. Notably, our bound aligns with the minimax optimal rate $N^{-\frac{2s}{2s+d}}$ for nonparametric function estimation even in the presence of correlation within observational data, which necessitates careful handling when establishing fast-rate generalization bounds. Our theoretical results are bolstered by numerical experiments demonstrating accurate inference of spatially-inhomogeneous diffusion tensors.
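A generic moment-matching version of this estimation problem, fitting $\boldsymbol{b}$ and $D$ to the conditional first and second moments of the increments, can be sketched as follows; this is not the paper's estimator and carries none of its statistical guarantees.

```python
# Generic moment-matching sketch: fit b and D = Sigma Sigma^T to the first and
# second conditional moments of increments from discretely observed paths.
# Not the paper's estimator; no statistical guarantees are implied.
import torch

torch.manual_seed(0)
d, dt = 2, 0.01

# Synthetic data: Euler-Maruyama paths of dX = -X dt + sqrt(2) dW, so D = 2 I.
x, xs, dxs = torch.zeros(500, d), [], []
for _ in range(100):
    dx = -x * dt + (2 * dt) ** 0.5 * torch.randn_like(x)
    xs.append(x); dxs.append(dx); x = x + dx
xs, dxs = torch.cat(xs), torch.cat(dxs)

def mlp(out_dim):
    return torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.Tanh(),
                               torch.nn.Linear(64, out_dim))

b_net, D_net = mlp(d), mlp(d * d)       # D_net returns a flattened d x d matrix
opt = torch.optim.Adam(list(b_net.parameters()) + list(D_net.parameters()), lr=1e-2)

for step in range(1000):
    opt.zero_grad()
    drift_loss = ((dxs - b_net(xs) * dt) ** 2).mean()
    outer = dxs.unsqueeze(2) * dxs.unsqueeze(1)                  # dx dx^T per sample
    diff_loss = ((outer - D_net(xs).view(-1, d, d) * dt) ** 2).mean()
    (drift_loss + diff_loss).backward()
    opt.step()

print(b_net(torch.ones(1, d)))               # roughly the true drift b(1, 1) = (-1, -1)
print(D_net(torch.zeros(1, d)).view(d, d))   # roughly 2 * I
```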
Infinite forecast combinations based on Dirichlet process
Ren, Yinuo, Li, Feng, Kang, Yanfei, Wang, Jue
Forecast combination integrates information from various sources by consolidating multiple forecast results for the target time series. Rather than selecting a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. Initially, the learning rate is sampled with three base distributions as hyperparameters to convert the infinite mixture into a finite one. All checkpoints are collected to establish a deep learning sub-model pool, and weight adjustment and diversity strategies are developed during the combination process. The main advantage of this method is its ability to generate the required base learners through a single training process, utilizing the decaying strategy to tackle the challenge posed by the stochastic nature of gradient descent in determining the optimal learning rate. To ensure the method's generalizability and competitiveness, this paper conducts an empirical analysis using the weekly dataset from the M4 competition and explores sensitivity to the number of models to be combined. The results demonstrate that the proposed ensemble model offers substantial improvements in prediction accuracy and stability compared to a single benchmark model.
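For reference, the Dirichlet-process stick-breaking construction that motivates the "infinite combination" view can be sketched as follows; the sub-model forecasts here are random placeholders, whereas the paper's pool comes from checkpoints of a single training run.

```python
# Sketch of truncated Dirichlet-process stick-breaking weights for combining a
# finite pool of forecasts; the sub-model forecasts are random placeholders.
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, k):
    """w_j = beta_j * prod_{i<j} (1 - beta_i), with beta_j ~ Beta(1, alpha)."""
    betas = rng.beta(1.0, alpha, size=k)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    w = betas * remaining
    return w / w.sum()                 # renormalize the truncated weights

horizon, n_models = 13, 8
forecasts = rng.normal(size=(n_models, horizon))    # placeholder sub-model forecasts
weights = stick_breaking_weights(alpha=1.0, k=n_models)
combined = weights @ forecasts                       # weighted forecast combination
print(np.round(weights, 3), combined.shape)
```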
Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow
Ren, Yinuo, Xiao, Tesi, Gangwani, Tanmay, Rangi, Anshuka, Rahmanian, Holakou, Ying, Lexing, Sanyal, Subhajit
Multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting, objectives and has widespread applications. We introduce a novel interacting particle method for MOO inspired by molecular dynamics simulations. Our approach combines overdamped Langevin and birth-death dynamics, incorporating a "dominance potential" to steer particles toward global Pareto optimality. In contrast to previous methods, our method is able to relocate dominated particles, making it particularly adept at managing Pareto fronts of complicated geometries. Our method is also theoretically grounded as a Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive experiments confirm that our approach outperforms state-of-the-art methods on challenging synthetic and real-world datasets.
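A schematic of the two ingredients on a toy bi-objective problem, using a heuristic per-particle scalarization and a simple respawn rule in place of the paper's dominance potential and Wasserstein-Fisher-Rao formulation, might look as follows.

```python
# Toy bi-objective schematic: Langevin steps on per-particle scalarizations plus
# a birth-death move that respawns dominated particles near survivors. The
# scalarization and respawn rule are stand-ins, not the paper's dominance
# potential or Wasserstein-Fisher-Rao formulation.
import numpy as np

rng = np.random.default_rng(0)

def objectives(x):                      # f1 = ||x||^2, f2 = ||x - 1||^2
    return np.stack([np.sum(x**2, axis=1), np.sum((x - 1.0)**2, axis=1)], axis=1)

def dominated_mask(f):
    n = len(f)
    return np.array([any(np.all(f[j] <= f[i]) and np.any(f[j] < f[i]) for j in range(n))
                     for i in range(n)])

n, d, eps, temp = 64, 2, 0.05, 1e-3
x = 3.0 * rng.normal(size=(n, d))
lam = rng.uniform(size=(n, 1))          # per-particle scalarization weights (heuristic)

for _ in range(300):
    grad = lam * 2 * x + (1 - lam) * 2 * (x - 1.0)      # grad of lam*f1 + (1-lam)*f2
    x = x - eps * grad + np.sqrt(2 * eps * temp) * rng.normal(size=x.shape)
    dom = dominated_mask(objectives(x))
    if dom.any() and (~dom).any():      # respawn dominated particles near survivors
        parents = rng.choice(np.flatnonzero(~dom), size=int(dom.sum()))
        x[dom] = x[parents] + 0.05 * rng.normal(size=(int(dom.sum()), d))

print("non-dominated particles:", int((~dominated_mask(objectives(x))).sum()), "of", n)
```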