
Collaborating Authors: He, Yihan


Transformers versus the EM Algorithm in Multi-class Clustering

arXiv.org Machine Learning

LLMs demonstrate significant inference capacities on complicated machine learning tasks, with the Transformer model as their backbone. Motivated by the limited understanding of such models on unsupervised learning problems, we study the learning guarantees of Transformers for multi-class clustering of Gaussian mixture models. We develop a theory that draws strong connections between Softmax Attention layers and the workflow of the EM algorithm for clustering mixtures of Gaussians. Our theory provides approximation bounds for the Expectation and Maximization steps by proving universal approximation abilities of multivariate mappings by Softmax functions. Beyond the approximation guarantees, we also show that, with a sufficient number of pre-training samples and a suitable initialization, Transformers can achieve the minimax optimal rate for the problem considered. Our extensive simulations empirically verify the theory, revealing strong learning capacities of Transformers even beyond our theoretical assumptions and shedding light on the powerful inference capacities of LLMs.
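
For concreteness, the minimal numpy sketch below (all function names are ours, not the paper's) runs classical EM on an isotropic Gaussian mixture. Since the squared distance to a cluster mean differs from an inner-product score only by a term constant across clusters, the E-step responsibilities are a row-wise softmax over affine scores in x·mu_k, which is the structural link to Softmax Attention referred to above; the sketch illustrates that link, not the paper's construction.

```python
# Minimal sketch (not the paper's construction): for an isotropic Gaussian
# mixture, the EM E-step responsibilities are a row-wise softmax over scores
# that are affine in the inner products x . mu_k, the same functional form as
# a softmax attention layer with the cluster means acting as keys.
import numpy as np

def e_step_responsibilities(X, means, weights, sigma2):
    """E-step: soft assignments r[i, k] = P(cluster k | x_i)."""
    # log pi_k - ||x_i - mu_k||^2 / (2 sigma^2), up to a constant in k
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    logits = np.log(weights)[None, :] - sq_dist / (2.0 * sigma2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    r = np.exp(logits)
    return r / r.sum(axis=1, keepdims=True)       # row-wise softmax

def m_step(X, r):
    """M-step: re-estimate mixing weights and means from soft assignments."""
    nk = r.sum(axis=0)
    return nk / len(X), (r.T @ X) / nk[:, None]

# Toy run: two well-separated isotropic Gaussians in 2D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
weights, means, sigma2 = np.ones(2) / 2, rng.normal(size=(2, 2)), 1.0
for _ in range(20):
    r = e_step_responsibilities(X, means, weights, sigma2)
    weights, means = m_step(X, r)
```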


Transformers and Their Roles as Time Series Foundation Models

arXiv.org Artificial Intelligence

We give a comprehensive analysis of transformers as time series foundation models, focusing on their approximation and generalization capabilities. First, we demonstrate that there exist transformers that fit an autoregressive model on input univariate time series via gradient descent. We then analyze MOIRAI, a multivariate time series foundation model capable of handling an arbitrary number of covariates. We prove that it is capable of automatically fitting autoregressive models with an arbitrary number of covariates, offering insights into its design and empirical success. For generalization, we establish bounds for pretraining when the data satisfies Dobrushin's condition. Experiments support our theoretical findings, highlighting the efficacy of transformers as time series foundation models.
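
As a hedged illustration of the statistical task in question, the sketch below fits an AR(p) model to a univariate series by gradient descent on the one-step-ahead squared error. This is the kind of in-context autoregressive fitting the paper shows transformers can emulate; it is not MOIRAI's architecture, and every name in the code is our own.

```python
# Illustrative sketch (our naming, not the transformer construction itself):
# fit an AR(p) model to a univariate series by gradient descent on the
# one-step-ahead squared prediction error.
import numpy as np

def fit_ar_gd(series, p, lr=0.1, steps=500):
    """Fit y_t ~ w[0]*y_{t-1} + ... + w[p-1]*y_{t-p} by gradient descent."""
    X = np.column_stack([series[p - 1 - j:len(series) - 1 - j] for j in range(p)])
    y = series[p:]
    w = np.zeros(p)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)     # gradient of 0.5 * mean squared error
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):                       # simulate a toy AR(2) process
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.standard_normal()
print(fit_ar_gd(y, p=2))                      # roughly [0.6, -0.2]
```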


Learning Spectral Methods by Transformers

arXiv.org Machine Learning

Most modern LLMs use Transformers [30] as their backbones, which demonstrate significant advantages over many existing neural network models. Transformers achieve state-of-the-art performance in learning tasks including natural language processing [33] and computer vision [18]. However, the underlying mechanism behind the success of Transformers remains largely a mystery to theoretical researchers. A line of recent works [2, 4, 15, 38] has argued that, instead of learning simple prediction rules (such as a linear model), Transformers are capable of learning to perform learning algorithms that can automatically generate new prediction rules. For instance, when a new dataset is organized as the input of a Transformer, the model can automatically perform linear regression on this new dataset to produce a newly fitted linear model and make predictions accordingly. This idea of treating Transformers as "algorithm approximators" has provided insights into the power of large language models. However, these existing works only provide guarantees for the in-context supervised learning capacities of Transformers. It remains unclear whether Transformers are capable of handling unsupervised tasks as well.
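
To make the "algorithm approximator" picture concrete, the sketch below spells out the idealized in-context task described above: the prompt supplies labeled pairs and a query, and the reference algorithm is ordinary least squares fit on the prompt followed by prediction at the query. This is an illustrative baseline under our own naming, not a claim about any specific construction in the cited works.

```python
# The idealized in-context linear regression task: a prompt of labeled pairs
# (x_i, y_i) plus a query x_q; the reference "algorithm" is ordinary least
# squares on the prompt, then prediction at the query. (Illustrative only.)
import numpy as np

def least_squares_in_context(prompt_x, prompt_y, query_x):
    """Fit OLS on the in-context examples, then predict at the query."""
    w, *_ = np.linalg.lstsq(prompt_x, prompt_y, rcond=None)
    return query_x @ w

rng = np.random.default_rng(1)
w_star = rng.normal(size=5)                    # a fresh task = a fresh weight vector
X = rng.normal(size=(20, 5))                   # 20 in-context examples
y = X @ w_star + 0.01 * rng.standard_normal(20)
x_query = rng.normal(size=5)
print(least_squares_in_context(X, y, x_query), x_query @ w_star)
```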


Transformers Simulate MLE for Sequence Generation in Bayesian Networks

arXiv.org Machine Learning

Transformers (Vaswani et al. 2017) have achieved tremendous success across various fields. These models are known to be particularly strong at sequence generation, and have revolutionized the way we approach problems related to text generation, translation, and scientific discoveries such as protein generation. Despite these achievements, there remains limited understanding of the theoretical capabilities of transformers as sequence generators. To theoretically understand how transformers efficiently generate sequences, several recent works have studied the power of transformers in learning specific probability models for sequential data (Ildiz et al. 2024, Rajaraman et al. 2024, Makkuva et al. 2024, Nichani et al. 2024). Specifically, Ildiz et al. (2024) studied the problem of learning Markov chains with a one-layer self-attention model, and developed identifiability and convergence guarantees under certain conditions. Rajaraman et al. (2024) studied the behavior of transformers on data drawn from k-order Markov processes, where the conditional distribution of the next variable in a sequence depends on the previous k variables, and showed that such processes can be learned well by transformers of constant-order depth. Makkuva et al. (2024) further studied the loss landscape of one-layer transformers in learning Markov chains. Nichani et al. (2024) studied a setting where the tokens consist of multiple sequences of samples generated from a causal network, and demonstrated that transformers can be trained to learn the causal network structure so that, when given a new context-query pair, they can generate predictions according to the learned causal structure and the context. However, similar to the studies of Markov chains, Nichani et al. (2024) mostly focused on the setting where each variable has at most one parent.
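
For reference, the sketch below computes the count-based maximum likelihood estimate of a first-order Markov chain's transition matrix from a single trajectory; this is the kind of MLE target that the works above (and the present paper's title) say transformers can simulate. The code and its names are ours and purely illustrative.

```python
# Count-based MLE of a first-order Markov chain's transition matrix from one
# observed trajectory: count transitions, then normalize each row.
import numpy as np

def markov_mle(sequence, num_states):
    """MLE of P[i, j] = P(next = j | current = i) from one observed sequence."""
    counts = np.zeros((num_states, num_states))
    for s, t in zip(sequence[:-1], sequence[1:]):
        counts[s, t] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0              # leave unvisited states as all-zero rows
    return counts / row_sums

rng = np.random.default_rng(0)
P_true = np.array([[0.9, 0.1], [0.3, 0.7]])
seq = [0]
for _ in range(5000):                          # sample a trajectory from P_true
    seq.append(rng.choice(2, p=P_true[seq[-1]]))
print(markov_mle(seq, num_states=2))           # close to P_true
```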


One-Layer Transformer Provably Learns One-Nearest Neighbor In Context

arXiv.org Artificial Intelligence

Transformers have achieved great success in recent years. Interestingly, they have shown particularly strong in-context learning capability: even without fine-tuning, they are still able to solve unseen tasks well purely based on task-specific prompts. In this paper, we study the capability of one-layer transformers in learning one of the most classical nonparametric estimators, the one-nearest-neighbor prediction rule. Under a theoretical framework where the prompt contains a sequence of labeled training data and unlabeled test data, we show that, although the loss function is nonconvex, a single softmax attention layer trained with gradient descent can successfully learn to behave like a one-nearest-neighbor classifier. Our result gives a concrete example of how transformers can be trained to implement nonparametric machine learning algorithms, and sheds light on the role of softmax attention in transformer models.
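
A minimal numerical illustration of the mechanism (not the paper's trained model): a single softmax attention layer whose score is an inner product scaled by an inverse temperature acts as a soft nearest-neighbor rule, and its prediction approaches the 1-NN label as that temperature parameter grows. All names and the toy data below are ours.

```python
# A single softmax attention layer with logits beta * <x_query, x_i> gives a
# soft nearest-neighbor prediction; for unit-norm inputs, larger beta pushes it
# toward the exact one-nearest-neighbor label.
import numpy as np

def softmax_attention_predict(train_x, train_y, query_x, beta):
    scores = beta * train_x @ query_x                 # attention logits
    scores -= scores.max()                            # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()      # softmax weights
    return attn @ train_y                             # weighted vote over labels

rng = np.random.default_rng(0)
train_x = rng.normal(size=(50, 4))
train_x /= np.linalg.norm(train_x, axis=1, keepdims=True)   # put inputs on the sphere
train_y = rng.choice([-1.0, 1.0], size=50)
query_x = train_x[7] + 0.05 * rng.normal(size=4)
query_x /= np.linalg.norm(query_x)

one_nn = train_y[np.argmax(train_x @ query_x)]        # exact 1-NN label
for beta in (1.0, 10.0, 100.0):
    print(beta, softmax_attention_predict(train_x, train_y, query_x, beta), one_nn)
```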


Global Convergence in Training Large-Scale Transformers

arXiv.org Machine Learning

Despite the widespread success of Transformers across various domains, their optimization guarantees in large-scale model settings are not well-understood. This paper rigorously analyzes the convergence properties of gradient flow in training Transformers with weight decay regularization. First, we construct the mean-field limit of large-scale Transformers, showing that as the model width and depth go to infinity, gradient flow converges to the Wasserstein gradient flow, which is represented by a partial differential equation. Then, we demonstrate that the gradient flow reaches a global minimum consistent with the PDE solution when the weight decay regularization parameter is sufficiently small. Our analysis is based on a series of novel mean-field techniques that adapt to Transformers. Compared with existing tools for deep networks (Lu et al., 2020) that demand homogeneity and global Lipschitz smoothness, we utilize a refined analysis assuming only $\textit{partial homogeneity}$ and $\textit{local Lipschitz smoothness}$. These new techniques may be of independent interest.
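
As a toy companion to the dynamics being analyzed, the sketch below runs an explicit Euler discretization of gradient flow with weight decay, d(theta)/dt = -grad L(theta) - lambda * theta, on a small least-squares model. It only illustrates the regularized flow itself and makes no attempt at the paper's mean-field or PDE analysis; the step size, regularization strength, and toy model are our choices.

```python
# Explicit Euler discretization of gradient flow with weight decay on a toy
# least-squares model (illustrative only; no mean-field limit is taken here).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.standard_normal(200)

theta = np.zeros(10)
lam, dt = 1e-3, 1e-2                           # weight decay strength, step size
for _ in range(5000):
    grad = X.T @ (X @ theta - y) / len(y)      # gradient of the squared loss
    theta -= dt * (grad + lam * theta)         # Euler step of the regularized flow
print(0.5 * np.mean((X @ theta - y) ** 2))     # final training loss
```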


SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles

arXiv.org Artificial Intelligence

As shown by recent studies, machine intelligence-enabled systems are vulnerable to test cases resulting from either adversarial manipulation or natural distribution shifts. This has raised great concerns about deploying machine learning algorithms in real-world applications, especially in safety-critical domains such as autonomous driving (AD). On the other hand, traditional AD testing on naturalistic scenarios requires hundreds of millions of driving miles due to the high dimensionality and rareness of safety-critical scenarios in the real world. As a result, several approaches for autonomous driving evaluation have been explored, which, however, are usually based on different simulation platforms, types of safety-critical scenarios, scenario generation algorithms, and driving route variations. Thus, despite a large amount of effort in autonomous driving testing, it is still challenging to compare and understand the effectiveness and efficiency of different testing scenario generation algorithms and testing mechanisms under similar conditions. In this paper, we aim to provide the first unified platform, SafeBench, to integrate different types of safety-critical testing scenarios, scenario generation algorithms, and other variations such as driving routes and environments. Meanwhile, we implement 4 deep reinforcement learning-based AD algorithms with 4 types of input (e.g., bird's-eye view, camera) to perform fair comparisons on SafeBench. We find that our generated testing scenarios are indeed more challenging, and we observe a trade-off between the performance of AD agents under benign and safety-critical testing scenarios. We believe our unified platform SafeBench for large-scale and effective autonomous driving testing will motivate the development of new testing scenario generation and safe AD algorithms. SafeBench is available at https://safebench.github.io.


PAC-Learning Uniform Ergodic Communicative Networks

arXiv.org Machine Learning

This work addresses the problem of learning a network with communication between vertices, where the communication is represented as a perturbation of the underlying measure. We study the scenario where samples are drawn from a uniform ergodic Random Graph Process (RGP for short), which provides a natural mathematical context for the problem of interest. For the binary classification problem, our result gives uniform learnability as a worst-case theoretical limit. We introduce the structural Rademacher complexity, which fuses naturally with VC theory to upper-bound the first moment. With the martingale method and Marton's coupling, we establish a tail bound for uniform convergence and give a consistency guarantee for the empirical risk minimizer. The techniques used in this work to obtain high-probability bounds are of independent interest for other mixing processes with and without network structure.
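
For readers less familiar with the terminology, the display below recalls the classical empirical Rademacher complexity in our own notation; the paper's structural Rademacher complexity adapts this quantity to dependent, graph-structured samples, and its exact definition there may differ.

```latex
% Classical empirical Rademacher complexity (standard definition; the paper's
% "structural" variant adapts this quantity to dependent, graph-structured samples):
\[
  \widehat{\mathfrak{R}}_S(\mathcal{F})
  \;=\;
  \mathbb{E}_{\sigma}\!\left[\,
    \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, f(x_i)
  \right],
  \qquad \sigma_1,\dots,\sigma_n \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}\{\pm 1\}.
\]
```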


Achievability and Impossibility of Exact Pairwise Ranking

arXiv.org Machine Learning

Like Shah and Wainwright [2017], we also consider the random design observation model, where each pair in every round is observed with probability p. In particular, we focus on the problem of exact recovery, where the estimator converges almost surely to the ground truth as the scale of the problem goes to infinity. Under this requirement, we formally discuss the fundamental limits of this problem. We assume the SST class as the family of generative models. Our analysis gives sharp information-theoretic upper and lower bounds for the exact-recovery requirement, which match exactly in the parametric limit, and our tight analysis of the algorithm induced by the moment method yields further improvements.
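
In our own notation (which may differ from the paper's), the observation model and the exact-recovery criterion described above can be written as follows.

```latex
% Random design observation model and exact-recovery criterion, in our notation:
% each pair is observed independently with probability p in every round, and
% exact recovery requires almost sure convergence of the estimated ranking.
\[
  \Pr\bigl[\text{pair } (i,j) \text{ is observed in a given round}\bigr] = p,
  \qquad
  \hat{\sigma}_n \xrightarrow{\ \text{a.s.}\ } \sigma^{*} \quad \text{as } n \to \infty .
\]
```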