AITopics | Jiao, Yuling

Collaborating Authors

Jiao, Yuling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Approximation Bounds for Transformer Networks with Application to Regression

Jiao, Yuling, Lai, Yanming, Sun, Defeng, Wang, Yang, Yan, Bokai

arXiv.org Machine LearningApr-16-2025

We explore the approximation capabilities of Transformer networks for H\"older and Sobolev functions, and apply these results to address nonparametric regression estimation with dependent observations. First, we establish novel upper bounds for standard Transformer networks approximating sequence-to-sequence mappings whose component functions are H\"older continuous with smoothness index $\gamma \in (0,1]$. To achieve an approximation error $\varepsilon$ under the $L^p$-norm for $p \in [1, \infty]$, it suffices to use a fixed-depth Transformer network whose total number of parameters scales as $\varepsilon^{-d_x n / \gamma}$. This result not only extends existing findings to include the case $p = \infty$, but also matches the best known upper bounds on number of parameters previously obtained for fixed-depth FNNs and RNNs. Similar bounds are also derived for Sobolev functions. Second, we derive explicit convergence rates for the nonparametric regression problem under various $\beta$-mixing data assumptions, which allow the dependence between observations to weaken over time. Our bounds on the sample complexity impose no constraints on weight magnitudes. Lastly, we propose a novel proof strategy to establish approximation bounds, inspired by the Kolmogorov-Arnold representation theorem. We show that if the self-attention layer in a Transformer can perform column averaging, the network can approximate sequence-to-sequence H\"older functions, offering new insights into the interpretability of self-attention mechanisms.

large language model, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2504.12175

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

Distribution Matching for Self-Supervised Transfer Learning

Jiao, Yuling, Ma, Wensen, Sun, Defeng, Wang, Hansheng, Wang, Yang

arXiv.org Machine LearningFeb-20-2025

In this paper, we propose a novel self-supervised transfer learning method called Distribution Matching (DM), which drives the representation distribution toward a predefined reference distribution while preserving augmentation invariance. The design of DM results in a learned representation space that is intuitively structured and offers easily interpretable hyperparameters. Experimental results across multiple real-world datasets and evaluation metrics demonstrate that DM performs competitively on target classification tasks compared to existing self-supervised transfer learning methods. Additionally, we provide robust theoretical guarantees for DM, including a population theorem and an end-to-end sample theorem. The population theorem bridges the gap between the self-supervised learning task and target classification accuracy, while the sample theorem shows that, even with a limited number of samples from the target domain, DM can deliver exceptional classification performance, provided the unlabeled sample size is sufficiently large.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2502.14424

Country:

Europe (0.92)
Asia > China (0.46)
North America > United States (0.27)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Deep Transfer Learning: Model Framework and Error Analysis

Jiao, Yuling, Lin, Huazhen, Luo, Yuchen, Yang, Jerry Zhijian

arXiv.org Machine LearningJan-4-2025

This paper presents a framework for deep transfer learning, which aims to leverage information from multi-domain upstream data with a large number of samples $n$ to a single-domain downstream task with a considerably smaller number of samples $m$, where $m \ll n$, in order to enhance performance on downstream task. Our framework offers several intriguing features. First, it allows the existence of both shared and domain-specific features across multi-domain data and provides a framework for automatic identification, achieving precise transfer and utilization of information. Second, the framework explicitly identifies upstream features that contribute to downstream tasks, establishing clear relationships between upstream domains and downstream tasks, thereby enhancing interpretability. Error analysis shows that our framework can significantly improve the convergence rate for learning Lipschitz functions in downstream supervised tasks, reducing it from $\tilde{O}(m^{-\frac{1}{2(d+2)}}+n^{-\frac{1}{2(d+2)}})$ ("no transfer") to $\tilde{O}(m^{-\frac{1}{2(d^*+3)}} + n^{-\frac{1}{2(d+2)}})$ ("partial transfer"), and even to $\tilde{O}(m^{-1/2}+n^{-\frac{1}{2(d+2)}})$ ("complete transfer"), where $d^* \ll d$ and $d$ is the dimension of the observed data. Our theoretical findings are supported by empirical experiments on image classification and regression datasets.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

2410.09383

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)

Add feedback

Nonlinear Assimilation with Score-based Sequential Langevin Sampling

Ding, Zhao, Duan, Chenguang, Jiao, Yuling, Yang, Jerry Zhijian, Yuan, Cheng, Zhang, Pingwen

arXiv.org Machine LearningNov-20-2024

This paper presents a novel approach for nonlinear assimilation called score-based sequential Langevin sampling (SSLS) within a recursive Bayesian framework. SSLS decomposes the assimilation process into a sequence of prediction and update steps, utilizing dynamic models for prediction and observation data for updating via score-based Langevin Monte Carlo. An annealing strategy is incorporated to enhance convergence and facilitate multi-modal sampling. The convergence of SSLS in TV-distance is analyzed under certain conditions, providing insights into error behavior related to hyper-parameters. Numerical examples demonstrate its outstanding performance in high-dimensional and nonlinear scenarios, as well as in situations with sparse or partial measurements. Furthermore, SSLS effectively quantifies the uncertainty associated with the estimated states, highlighting its potential for error calibration.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2411.13443

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

Characteristic Learning for Provable One Step Generation

Ding, Zhao, Duan, Chenguang, Jiao, Yuling, Li, Ruoxuan, Yang, Jerry Zhijian, Zhang, Pingwen

arXiv.org Artificial IntelligenceJul-16-2024

We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, We estimate the velocity field through nonparametric regression and utilize Euler method to solve the probability flow ODE, generating a series of discrete approximations to the characteristics. We then use a deep neural network to fit these characteristics, ensuring a one-step mapping that effectively pushes the prior distribution towards the target distribution. In the theoretical aspect, we analyze the errors in velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate for the characteristic generator in 2-Wasserstein distance. To the best of our knowledge, this is the first thorough analysis for simulation-free one step generative models. Additionally, our analysis refines the error analysis of flow-based generative models in prior works. We apply our method on both synthetic and real datasets, and the results demonstrate that the characteristic generator achieves high generation quality with just a single evaluation of neural network.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.05512

Country:

Europe (0.92)
North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DRM Revisited: A Complete Error Analysis

Jiao, Yuling, Li, Ruoxuan, Wu, Peiying, Yang, Jerry Zhijian, Zhang, Pingwen

arXiv.org Artificial IntelligenceJul-12-2024

The success of deep learning methods in high-dimensional data analysis has led to the development of promising approaches for solving high-dimensional PDEs using deep neural networks, which have attracted much attention [AAAR19, SS18, LMMK21, RPK19, WY17, ZBYZ20, BDG20, HJW18]. Due to the excellent approximation power of deep neural networks, several numerical schemes have been proposed for solving PDEs, including physicsinformed neural networks (PINNs) [RPK19], weak adversarial networks (WAN) [ZBYZ20] and the deep Ritz method (DRM) [WY17]. PINNs is based on residual minimization, while WAN is inspired by Galerkin method.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2407.09032

Country:

Asia > China (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Quantum Compiling with Reinforcement Learning on a Superconducting Processor

Wang, Z. T., Chen, Qiuhao, Du, Yuxuan, Yang, Z. H., Cai, Xiaoxia, Huang, Kaixuan, Zhang, Jingning, Xu, Kai, Du, Jun, Li, Yinan, Jiao, Yuling, Wu, Xingyao, Liu, Wu, Lu, Xiliang, Xu, Huikai, Jin, Yirong, Wang, Ruixia, Yu, Haifeng, Zhao, S. P.

arXiv.org Artificial IntelligenceJun-17-2024

To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcement learning (RL)-based quantum compiler for a superconducting processor and demonstrate its capability of discovering novel and hardware-amenable circuits with short lengths. We show that for the three-qubit quantum Fourier transformation, a compiled circuit using only seven CZ gates with unity circuit fidelity can be achieved. The compiler is also able to find optimal circuits under device topological constraints, with lengths considerably shorter than those by the conventional method. Our study exemplifies the codesign of the software with hardware for efficient quantum compilation, offering valuable insights for the advancement of RL-based compilers.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2406.12195

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)

Add feedback

Model Free Prediction with Uncertainty Assessment

Jiao, Yuling, Kang, Lican, Liu, Jin, Peng, Heng, Zuo, Heng

arXiv.org Machine LearningJun-16-2024

Deep nonparametric regression, characterized by the utilization of deep neural networks to learn target functions, has emerged as a focus of research attention in recent years. Despite considerable progress in understanding convergence rates, the absence of asymptotic properties hinders rigorous statistical inference. To address this gap, we propose a novel framework that transforms the deep estimation paradigm into a platform conducive to conditional mean estimation, leveraging the conditional diffusion model. Theoretically, we develop an end-to-end convergence rate for the conditional diffusion model and establish the asymptotic normality of the generated samples. Consequently, we are equipped to construct confidence regions, facilitating robust statistical inference. Furthermore, through numerical experiments, we empirically validate the efficacy of our proposed methodology.

artificial intelligence, diffusion model, machine learning, (17 more...)

arXiv.org Machine Learning

2405.12684

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry: Materials > Chemicals (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method

Jiao, Yuling, Lai, Yanming, Wang, Yang

arXiv.org Machine LearningMay-19-2024

Machine learning is a rapidly advancing field with diverse applications across various domains. One prominent area of research is the utilization of deep learning techniques for solving partial differential equations(PDEs). In this work, we specifically focus on employing a three-layer tanh neural network within the framework of the deep Ritz method(DRM) to solve second-order elliptic equations with three different types of boundary conditions. We perform projected gradient descent(PDG) to train the three-layer network and we establish its global convergence. To the best of our knowledge, we are the first to provide a comprehensive error analysis of using overparameterized networks to solve PDE problems, as our analysis simultaneously includes estimates for approximation error, generalization error, and optimization error. We present error bound in terms of the sample size $n$ and our work provides guidance on how to set the network depth, width, step size, and number of iterations for the projected gradient descent algorithm. Importantly, our assumptions in this work are classical and we do not require any additional assumptions on the solution of the equation. This ensures the broad applicability and generality of our results.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Machine Learning

2405.11451

Country: Asia > China > Hong Kong (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Latent Schr{\"o}dinger Bridge Diffusion Model for Generative Learning

Jiao, Yuling, Kang, Lican, Lin, Huazhen, Liu, Jin, Zuo, Heng

arXiv.org Machine LearningApr-20-2024

This paper aims to conduct a comprehensive theoretical analysis of current diffusion models. We introduce a novel generative learning methodology utilizing the Schr{\"o}dinger bridge diffusion model in latent space as the framework for theoretical exploration in this domain. Our approach commences with the pre-training of an encoder-decoder architecture using data originating from a distribution that may diverge from the target distribution, thus facilitating the accommodation of a large sample size through the utilization of pre-existing large-scale models. Subsequently, we develop a diffusion model within the latent space utilizing the Schr{\"o}dinger bridge framework. Our theoretical analysis encompasses the establishment of end-to-end error analysis for learning distributions via the latent Schr{\"o}dinger bridge diffusion model. Specifically, we control the second-order Wasserstein distance between the generated distribution and the target distribution. Furthermore, our obtained convergence rates effectively mitigate the curse of dimensionality, offering robust theoretical support for prevailing diffusion models.

artificial intelligence, diffusion model, machine learning, (15 more...)

arXiv.org Machine Learning

2404.13309

Country: Asia > China (0.68)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Add feedback