Lin, Ye
Orthogonal greedy algorithm for linear operator learning with shallow neural network
Lin, Ye, Jia, Jiwei, Lee, Young Ju, Zhang, Ran
Greedy algorithms, particularly the orthogonal greedy algorithm (OGA), have proven effective in training shallow neural networks for fitting functions and solving partial differential equations (PDEs). In this paper, we extend the application of OGA to linear operator learning, which is equivalent to learning the kernel function through integral transforms. First, a novel greedy algorithm is developed for kernel estimation in a new semi-inner product, which can be used to approximate the Green's function of linear PDEs from data. Second, we introduce the OGA for point-wise kernel estimation to further improve the approximation rate, achieving orders-of-magnitude accuracy improvements across various tasks and baseline models. In addition, we provide a theoretical analysis of the kernel estimation problem and the optimal approximation rates for both algorithms, establishing their efficacy and potential for future applications in PDEs and operator learning tasks.

Introduction
In recent years, deep neural networks have emerged as a powerful tool for solving partial differential equations (PDEs) in a wide range of scientific and engineering domains [1]. Approaches in this area can be broadly classified into two main categories: (1) single PDE solvers and (2) operator learning. Single PDE solvers, such as physics-informed neural networks (PINNs) [2], the deep Galerkin method [3], and the deep Ritz method [4], optimize a deep neural network by minimizing a given loss function related to the PDE. These methods are designed to solve a given instance of the PDE. In contrast, operator learning uses deep neural networks to learn operators between function spaces, allowing solution operators of PDEs to be learned from data pairs. Recently, several operator learning methods have been proposed, including deep Green networks (DGN) [5], deep operator networks (DON) [6], and neural operators (NOs) [7].
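As a toy illustration of the select-then-project loop that OGA is built on, the following sketch fits a 1D function with a shallow ReLU network: each step picks the dictionary atom most correlated with the residual, then re-fits all coefficients by least squares. The dictionary, grid, and target here are illustrative assumptions, not the paper's operator-learning setup.

```python
import numpy as np

# Minimal OGA sketch on a toy 1D fitting problem (illustrative assumptions,
# not the paper's kernel-estimation setting).
x = np.linspace(-1.0, 1.0, 200)          # quadrature / sample grid
f = np.sin(np.pi * x)                    # target function to approximate

def atom(w, b):
    """One dictionary element: a single ReLU neuron evaluated on the grid."""
    return np.maximum(w * x + b, 0.0)

# Candidate dictionary: a discrete set of (w, b) pairs.
candidates = [(w, b) for w in (-1.0, 1.0) for b in np.linspace(-1, 1, 101)]

basis = []                               # selected atoms (columns)
residual = f.copy()
for k in range(20):
    # Greedy step: pick the atom most correlated with the residual.
    scores = []
    for w, b in candidates:
        g = atom(w, b)
        nrm = np.linalg.norm(g)
        scores.append(abs(g @ residual) / nrm if nrm > 0 else 0.0)
    w, b = candidates[int(np.argmax(scores))]
    basis.append(atom(w, b))
    # Orthogonal step: re-fit all coefficients by least-squares projection.
    A = np.stack(basis, axis=1)
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    residual = f - A @ coef

print("relative L2 error:", np.linalg.norm(residual) / np.linalg.norm(f))
```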
Green Multigrid Network
Lin, Ye, Lee, Young Ju, Jia, Jiwei
GreenLearning networks (GL) learn Green's functions directly in physical space, making them an interpretable model for capturing unknown solution operators of partial differential equations (PDEs). For many PDEs, the corresponding Green's function exhibits asymptotic smoothness. In this paper, we propose Green Multigrid networks (GreenMGNet), an operator learning algorithm designed for a class of asymptotically smooth Green's functions. Compared with the pioneering GL, the new framework achieves significantly better accuracy and efficiency. GreenMGNet comprises two technical novelties. First, the Green's function is modeled as a piecewise function to account for its singular behavior in some parts of the hyperplane. This piecewise function is then approximated by a neural network with augmented output (AugNN) so that the singularity can be captured accurately. Second, the asymptotic smoothness of the Green's function is exploited via the Multi-Level Multi-Integration (MLMI) algorithm in both the training and inference stages. Several operator learning test cases are presented to demonstrate the accuracy and effectiveness of the proposed method. On average, GreenMGNet achieves a $3.8\%$ to $39.15\%$ accuracy improvement. To match the accuracy level of GL, GreenMGNet requires only about $10\%$ of the full grid data, resulting in a $55.9\%$ reduction in training time and a $92.5\%$ reduction in GPU memory cost for one-dimensional test problems, and $37.7\%$ and $62.5\%$ reductions, respectively, for two-dimensional test problems.
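The augmented-output idea can be pictured as follows: one network emits two candidate values, and a piecewise selection across the diagonal $y = x$ (where 1D Green's functions typically lose smoothness) picks the active branch. This is a hypothetical sketch; the layer sizes and the selection rule are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AugNN(nn.Module):
    """Hypothetical sketch: a network with an augmented (two-branch) output
    modeling a Green's function as a piecewise function across y = x."""
    def __init__(self, width: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 2),          # augmented output: two branches
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = self.body(torch.stack([x, y], dim=-1))   # (..., 2)
        lower, upper = out[..., 0], out[..., 1]
        # Piecewise definition: one branch on each side of the hyperplane y = x,
        # so the kink along the diagonal need not be smoothed over.
        return torch.where(y <= x, lower, upper)

model = AugNN()
x, y = torch.rand(1000), torch.rand(1000)
G = model(x, y)                           # pointwise Green's function values
print(G.shape)                            # torch.Size([1000])
```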
Understanding Parameter Sharing in Transformers
Lin, Ye, Wang, Mingxuan, Zhang, Zhexi, Wang, Xiaohui, Xiao, Tong, Zhu, Jingbo
Parameter sharing has proven to be a parameter-efficient approach. Previous work on Transformers has focused on sharing parameters across layers, which can improve the performance of models with limited parameters by increasing model depth. In this paper, we study why this approach works from two perspectives. First, increasing model depth makes the model more complex, so we hypothesize that the benefit is related to model complexity (measured in FLOPs). Second, since each shared parameter participates in the network computation several times during forward propagation, its gradient has a different range of values than in the original model, which affects model convergence. Based on this, we hypothesize that training convergence may also be one of the reasons. Through further analysis, we show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity. Inspired by this, we tune the training hyperparameters related to model convergence in a targeted manner. Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
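The gradient observation rests on the fact that a shared layer receives one gradient contribution per application in the forward pass. A minimal PyTorch demonstration of this effect (a toy setting, not the paper's experiment):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 16)

# One layer reused 4 times: its weights accumulate 4 gradient contributions.
shared = nn.Linear(16, 16)
h = x
for _ in range(4):
    h = torch.tanh(shared(h))
h.sum().backward()
print("shared grad norm  :", shared.weight.grad.norm().item())

# Four independent layers: each weight gets a single gradient contribution.
stack = nn.ModuleList(nn.Linear(16, 16) for _ in range(4))
h = x
for layer in stack:
    h = torch.tanh(layer(h))
h.sum().backward()
print("unshared grad norm:", stack[0].weight.grad.norm().item())
```

The shared layer's gradient norm typically differs markedly from the unshared case, which is why convergence-related hyperparameters may need retuning.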
MobileNMT: Enabling Translation in 15MB and 30ms
Lin, Ye, Wang, Xiaohui, Zhang, Zhexi, Wang, Mingxuan, Xiao, Tong, Zhu, Jingbo
Deploying NMT models on mobile devices is essential for privacy, low latency, and offline scenarios. To achieve high capacity, NMT models are rather large, and running them on devices is challenging given limited storage, memory, computation, and power. Existing work either focuses only on a single metric such as FLOPs, or relies on general-purpose engines that are not well suited to auto-regressive decoding. In this paper, we present MobileNMT, a system that can translate in 15MB and 30ms on devices. We propose a series of principles for model compression combined with quantization. Further, we implement an engine friendly to INT8 computation and auto-regressive decoding. With the co-design of model and engine, compared with the existing system, we achieve a 47.0x speedup and save 99.5% of memory, with only an 11.6% loss in BLEU. The code is publicly available at https://github.com/zjersey/Lightseq-ARM.
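As background for what an INT8-friendly engine relies on, here is a minimal sketch of symmetric per-tensor INT8 weight quantization; MobileNMT's actual scheme (per-channel scales, activation and bias handling) may differ.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"int8 storage: {q.nbytes / w.nbytes:.0%} of fp32, mean abs error {err:.4f}")
```

Storing weights in INT8 cuts memory to a quarter of fp32 and enables integer matrix kernels, at the cost of a small, controllable rounding error.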
An Efficient Transformer Decoder with Compressed Sub-layers
Li, Yanyang, Lin, Ye, Xiao, Tong, Zhu, Jingbo
The large attention-based encoder-decoder network (Transformer) has recently become prevalent due to its effectiveness. But the high computational complexity of its decoder raises an inefficiency issue. By examining the mathematical formulation of the decoder, we show that under some mild conditions, the architecture can be simplified by compressing its sub-layers, the basic building blocks of the Transformer, achieving higher parallelism. We thereby propose the Compressed Attention Network, whose decoder layer consists of only one sub-layer instead of three. Extensive experiments on 14 WMT machine translation tasks show that our model is 1.42x faster, with performance on par with a strong baseline. This strong baseline is already 2x faster than the widely used standard baseline, without loss in performance.
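One way to picture a single-sub-layer decoder block is to fuse self- and cross-attention into one attention call over the concatenated target and source states, as in the hedged sketch below. This is an illustrative fusion (causal masking omitted for brevity), not the paper's exact derivation.

```python
import torch
import torch.nn as nn

class FusedDecoderLayer(nn.Module):
    """Illustrative single-sub-layer decoder block: one attention pass over
    [target; source] stands in for separate self- and cross-attention."""
    def __init__(self, d_model: int = 512, nhead: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, tgt: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        kv = torch.cat([tgt, memory], dim=1)   # concatenate target and source
        out, _ = self.attn(tgt, kv, kv)        # one attention call, not two
        return self.norm(tgt + out)

layer = FusedDecoderLayer()
tgt = torch.randn(2, 10, 512)      # (batch, target length, d_model)
memory = torch.randn(2, 20, 512)   # encoder output
print(layer(tgt, memory).shape)    # torch.Size([2, 10, 512])
```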
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Lin, Ye, Zhou, Shuhan, Li, Yanyang, Ma, Anxiang, Xiao, Tong, Zhu, Jingbo
For years, model performance in machine learning has obeyed a power-law relationship with model size. For parameter efficiency, recent studies focus on increasing model depth rather than width to achieve better performance. In this paper, we study how model width affects the Transformer model through a parameter-efficient multi-path structure. To better fuse features extracted from different paths, we add three additional operations to each sublayer: a normalization at the end of each path, a cheap operation to produce more features, and a learnable weighting mechanism to fuse all features flexibly. Extensive experiments on 12 WMT machine translation tasks show that, with the same number of parameters, the shallower multi-path model can achieve similar or even better performance than the deeper model. This reveals that more attention should be paid to the multi-path structure, and that model depth and width should be balanced to train a better large-scale Transformer.
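The three added operations can be sketched as follows; `MultiPathSublayer`, the choice of paths, and the elementwise gate standing in for the "cheap operation" are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiPathSublayer(nn.Module):
    """Illustrative multi-path sub-layer: per-path LayerNorm, a cheap
    elementwise gate, and learnable softmax weights fusing all paths."""
    def __init__(self, d_model: int = 256, n_paths: int = 4):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_paths)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_paths))
        self.gates = nn.Parameter(torch.ones(n_paths, d_model))  # cheap op
        self.fuse = nn.Parameter(torch.zeros(n_paths))           # fusion logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [norm(path(x)) * g                 # per-path norm + cheap gate
                 for path, norm, g in zip(self.paths, self.norms, self.gates)]
        w = torch.softmax(self.fuse, dim=0)        # learnable fusion weights
        return x + sum(wi * f for wi, f in zip(w, feats))

block = MultiPathSublayer()
print(block(torch.randn(8, 32, 256)).shape)  # torch.Size([8, 32, 256])
```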
General-Purpose User Embeddings based on Mobile App Usage
Zhang, Junqi, Bai, Bing, Lin, Ye, Liang, Jian, Bai, Kun, Wang, Fei
In this paper, we report our recent practice at Tencent for user modeling based on mobile app usage. User behaviors on mobile app usage, including retention, installation, and uninstallation, can be a good indicator of both the long-term and short-term interests of users. For example, if a user installs Snapseed recently, she might have a growing interest in photography. Such information is valuable for numerous downstream applications, including advertising, recommendations, etc. Traditionally, user modeling from mobile app usage heavily relies on handcrafted feature engineering, which requires onerous human work for different downstream applications and can be sub-optimal without domain experts. However, automatic user modeling based on mobile app usage faces unique challenges: (1) retention, installation, and uninstallation are heterogeneous but need to be modeled collectively, (2) user behaviors are distributed unevenly over time, and (3) many long-tailed apps suffer from serious sparsity. In this paper, we present a tailored AutoEncoder-coupled Transformer Network (AETN), by which we overcome these challenges and achieve the goals of reducing manual effort and boosting performance. We have deployed the model at Tencent, and both online and offline experiments across multiple domains of downstream applications have demonstrated the effectiveness of the output user embeddings.
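The overall recipe (a Transformer over heterogeneous app-usage events trained with an autoencoding objective that yields a reusable user embedding) might look roughly like this hypothetical sketch; the vocabulary size, behavior types, pooling, and reconstruction head are assumptions, not AETN's actual design.

```python
import torch
import torch.nn as nn

class AppUsageEncoder(nn.Module):
    """Hypothetical sketch: Transformer over app-usage events with an
    autoencoding (reconstruction) head and a pooled user embedding."""
    def __init__(self, n_apps: int = 10000, d_model: int = 128):
        super().__init__()
        self.app_emb = nn.Embedding(n_apps, d_model)
        self.type_emb = nn.Embedding(3, d_model)   # retention/install/uninstall
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(d_model, n_apps)   # reconstruct app IDs

    def forward(self, apps: torch.Tensor, types: torch.Tensor):
        h = self.encoder(self.app_emb(apps) + self.type_emb(types))
        user_embedding = h.mean(dim=1)             # pooled user representation
        return user_embedding, self.decode(h)      # embedding + reconstruction

model = AppUsageEncoder()
apps = torch.randint(0, 10000, (4, 50))   # batch of app-ID sequences
types = torch.randint(0, 3, (4, 50))      # behavior type per event
emb, logits = model(apps, types)
print(emb.shape, logits.shape)            # (4, 128) (4, 50, 10000)
```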