AITopics | Li, Xingjian


OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization

Kan, Kelvin, Li, Xingjian, Osher, Stanley

arXiv.org Artificial Intelligence, Jan-30-2025

Transformers have achieved state-of-the-art performance in numerous tasks. In this paper, we propose a continuous-time formulation of transformers. Specifically, we consider a dynamical system whose governing equation is parametrized by transformer blocks. We leverage optimal transport theory to regularize the training problem, which enhances stability in training and improves generalization of the resulting model. Moreover, we demonstrate in theory that this regularization is necessary as it promotes uniqueness and regularity of solutions. Our model is flexible in that almost any existing transformer architecture can be adopted to construct the dynamical system with only slight modifications to the existing code. We perform extensive numerical experiments on tasks motivated by natural language processing, image classification, and point cloud classification. Our experimental results show that the proposed method improves the performance of its discrete counterpart and outperforms the relevant models used for comparison.
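
As a rough illustration of the continuous-time idea, the sketch below integrates dx/dt = f(x, t) with forward Euler and accumulates the squared velocity along the trajectory as a transport-cost-style regularizer. The abstract does not state the solver or the exact form of the regularizer, so both are assumptions here, and `velocity` is a hypothetical toy stand-in for a transformer block.

```python
import math

def velocity(x, t):
    # Hypothetical stand-in for a transformer block f(x, t); a real model
    # would apply attention and MLP layers here.
    return [math.tanh(0.5 * xi) - 0.1 * t for xi in x]

def integrate_with_transport_cost(x0, n_steps=10, t_end=1.0):
    """Forward-Euler solve of dx/dt = f(x, t), accumulating the
    squared-velocity integral as an assumed transport-cost regularizer."""
    dt = t_end / n_steps
    x = list(x0)
    cost = 0.0
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        cost += dt * sum(vi * vi for vi in v)       # approximates the integral of |f|^2
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # Euler update
    return x, cost

x_final, reg = integrate_with_transport_cost([1.0, -2.0, 0.5])
# A training objective could then be task_loss(x_final) + lam * reg,
# where task_loss and lam are hypothetical names for the task loss and weight.
```

Under these assumptions, the penalty discourages trajectories that move the hidden state further than the task requires.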

machine learning, natural language, transformer block, (16 more...)

arXiv.org Artificial Intelligence

2501.18793

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Texas (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Trustworthy Federated Learning: Privacy, Security, and Beyond

Chen, Chunlu, Liu, Ji, Tan, Haowen, Li, Xingjian, Wang, Kevin I-Kai, Li, Peng, Sakurai, Kouichi, Dou, Dejing

arXiv.org Artificial Intelligence, Nov-3-2024

While recent years have witnessed advances in big data and Artificial Intelligence (AI), it is of great importance to safeguard data privacy and security. As an innovative approach, Federated Learning (FL) addresses these concerns by facilitating collaborative model training across distributed data sources without transferring raw data. However, the challenges of ensuring robust security and privacy across decentralized networks have attracted significant attention in the handling of distributed data in FL. In this paper, we conduct an extensive survey of the security and privacy issues prevalent in FL, underscoring the vulnerability of communication links and the potential for cyber threats. We delve into various defensive strategies to mitigate these risks, explore the applications of FL across different sectors, and propose research directions. We identify the intricate security challenges that arise within FL frameworks, aiming to contribute to the development of secure and efficient FL systems.
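
To make "collaborative training without transferring raw data" concrete, here is a minimal sketch of one common scheme, federated averaging. The abstract does not name a specific algorithm, so the choice of averaging rule, the toy one-parameter model, and all function names are assumptions for illustration.

```python
def local_update(w, data, lr=0.1, epochs=5):
    # One client fits y = w * x on its private data by gradient descent.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    # Clients send back only updated weights; raw (x, y) pairs stay local.
    updates = [(local_update(w_global, d), len(d)) for d in clients]
    total = sum(n for _, n in updates)
    # The server averages the weights, weighted by each client's sample count.
    return sum(w * n for w, n in updates) / total

clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # toy data with y = 2x
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
```

Even in this toy setting the exchanged weights are the only thing crossing the network, which is why the communication links become the attack surface the survey discusses.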

data mining, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.01583

Country:

North America > United States (1.00)
Asia > China (0.68)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(2 more...)


Photorealistic Robotic Simulation using Unreal Engine 5 for Agricultural Applications

Li, Xingjian, Xiang, Lirong

arXiv.org Artificial Intelligence, May-28-2024

This work presents a new robotics simulation environment built upon Unreal Engine 5 (UE5) for agricultural image data generation. The simulation uses this state-of-the-art real-time rendering engine to produce realistic plant images of the kind often used in agricultural applications. This study compares the rendering accuracy of UE5 with that of existing tools and assesses its positional accuracy when integrated with the Robot Operating System (ROS). The results indicate that UE5 achieves an average distance error of 0.021 mm relative to predetermined setpoints in a multi-robot setup involving two UR10 arms.
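
An average distance error of this kind could be computed as the mean Euclidean distance between commanded setpoints and the positions actually reached. The abstract does not spell out the exact metric, so this is an assumed reading, and the function name is hypothetical.

```python
import math

def mean_distance_error(setpoints, measured):
    # Average Euclidean distance between each commanded and reached position.
    dists = [math.dist(p, q) for p, q in zip(setpoints, measured)]
    return sum(dists) / len(dists)

# Toy positions in millimetres (illustrative values, not data from the paper).
err = mean_distance_error(
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.02), (1.0, 1.0, 1.04)],
)
```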

artificial intelligence, simulation, unreal engine 5, (11 more...)

arXiv.org Artificial Intelligence

2405.18551

Country: North America > United States (0.30)

Genre: Research Report (0.41)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)


Deep Active Learning with Noise Stability

Li, Xingjian, Yang, Pengkun, Gu, Yangcheng, Zhan, Xueying, Wang, Tianyang, Xu, Min, Xu, Chengzhong

arXiv.org Artificial Intelligence, Feb-13-2024

Uncertainty estimation for unlabeled data is crucial to active learning. With a deep neural network employed as the backbone model, the data selection process is highly challenging due to the potential over-confidence of the model's inference. Existing methods resort to special training schemes (e.g., adversarial training) or auxiliary models to address this challenge. This tends to result in complex and inefficient pipelines, which render the methods impractical. In this work, we propose a novel algorithm that leverages noise stability to estimate data uncertainty. The key idea is to measure the deviation of the output from the original observation when the model parameters are randomly perturbed by noise. We provide theoretical analyses by leveraging small Gaussian noise theory and demonstrate that our method favors a subset with large and diverse gradients. Our method is generally applicable to various tasks, including computer vision, natural language processing, and structural data analysis. It achieves competitive performance compared with state-of-the-art active learning baselines.
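
The key idea can be sketched as follows: perturb the model parameters with Gaussian noise, measure how far the output moves, and query the least stable points. This is a simplified illustration that ranks by a scalar deviation only; the toy logistic model, the noise scale, and the top-k selection rule are assumptions rather than the authors' procedure.

```python
import math
import random

def predict(weights, x):
    # Toy model: logistic output of a linear score (stand-in for a deep net).
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def noise_stability_score(weights, x, sigma=0.1, n_samples=20, rng=None):
    # Mean absolute output change under random Gaussian weight perturbations.
    rng = rng or random.Random(0)
    base = predict(weights, x)
    total = 0.0
    for _ in range(n_samples):
        noisy = [w + rng.gauss(0.0, sigma) for w in weights]
        total += abs(predict(noisy, x) - base)
    return total / n_samples

def select_for_labeling(weights, pool, k):
    # Query the k unlabeled points whose outputs are least stable.
    rng = random.Random(0)
    ranked = sorted(
        pool,
        key=lambda x: noise_stability_score(weights, x, rng=rng),
        reverse=True,
    )
    return ranked[:k]

picked = select_for_labeling([1.0, -1.0], [[0.0, 0.0], [1.0, 1.0]], k=1)
```

The all-zero input is unaffected by any weight perturbation in this toy model, so it is never selected ahead of a point whose output actually shifts.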

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2205.1334

Country: North America > United States > Iowa (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)


Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

Hua, Hang, Li, Xingjian, Dou, Dejing, Xu, Cheng-Zhong, Luo, Jiebo

arXiv.org Artificial Intelligence, Nov-8-2023

The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on downstream tasks. Despite its recent success and wide adoption, fine-tuning a pre-trained language model often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR). Specifically, we propose to inject standard Gaussian noise or in-manifold noise and regularize the hidden representations of the fine-tuned model. We first provide theoretical analyses to support the efficacy of our method. We then demonstrate the advantages of the proposed method over other state-of-the-art algorithms, including L2-SP, Mixout, and SMART. While these previous works only verify the effectiveness of their methods on relatively simple text classification tasks, we also verify the effectiveness of our method on question answering tasks, where the target problem is much more difficult and more training examples are available. Furthermore, extensive experimental results indicate that the proposed algorithm can not only enhance the in-domain performance of the language models but also improve the domain generalization performance on out-of-domain data.
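
One way to picture the regularizer: inject Gaussian noise into a hidden representation, propagate both the clean and the noisy version through the layers above, and penalize the squared difference at the top. The sketch below uses toy dense layers and standard Gaussian noise only; the layer type, the squared-error form, and all names are assumptions, and the in-manifold noise variant is not shown.

```python
import math
import random

def layer(h, weights):
    # Toy dense layer with tanh activation; weights is a list of rows.
    return [math.tanh(sum(w * hi for w, hi in zip(row, h))) for row in weights]

def noise_stability_penalty(h, upper_layers, sigma=0.1, rng=None):
    # Squared change in the top representation when Gaussian noise is
    # injected into the hidden representation h.
    rng = rng or random.Random(0)
    noisy = [hi + rng.gauss(0.0, sigma) for hi in h]
    clean_out, noisy_out = h, noisy
    for w in upper_layers:
        clean_out = layer(clean_out, w)
        noisy_out = layer(noisy_out, w)
    return sum((a - b) ** 2 for a, b in zip(clean_out, noisy_out))

upper = [[[0.5, -0.2], [0.1, 0.3]]]  # a single hypothetical upper layer
penalty = noise_stability_penalty([1.0, -1.0], upper)
# A fine-tuning objective could add a weighted sum of such penalties,
# one per injection layer, to the task loss.
```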

artificial intelligence, natural language, ps 2206, (9 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TNNLS.2023.3330926

2206.05658

Genre: Research Report (0.76)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.53)


Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources

Liu, Ji, Dong, Daxiang, Wang, Xi, Qin, An, Li, Xingjian, Valduriez, Patrick, Dou, Dejing, Yu, Dianhai

arXiv.org Artificial Intelligence, Jul-14-2022

Although more layers and more parameters generally improve the accuracy of models, such big models generally have high computational complexity and large memory requirements, which exceed the capacity of small devices for inference and incur long training times. In addition, the long training and inference times of big models are difficult to afford even on high-performance servers. As an efficient way to compress a large deep model (a teacher model) into a compact model (a student model), knowledge distillation has emerged as a promising approach to dealing with big models. Existing knowledge distillation methods cannot exploit elastically available computing resources and therefore suffer from low efficiency. In this paper, we propose an Elastic Deep Learning framework for knowledge Distillation, i.e., EDL-Dist. The advantages of EDL-Dist are three-fold. First, the inference and training processes are separated. Second, elastically available computing resources can be utilized to improve efficiency. Third, fault tolerance of the training and inference processes is supported. We conduct extensive experiments to show that EDL-Dist is up to 3.125 times faster than the baseline method (online knowledge distillation) in terms of throughput, while its accuracy is similar or higher.
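
For readers unfamiliar with the teacher-student setup, the sketch below shows a standard temperature-softened distillation loss. It is a generic formulation, not necessarily the loss used in EDL-Dist; the temperature value and function names are assumptions. Because the teacher's logits are the only input the student needs, they can be produced by a separate inference process, which is the kind of decoupling the abstract describes.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax on temperature-scaled logits.
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

loss = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```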

artificial intelligence, knowledge distillation, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2207.06667

Genre: Research Report (0.84)

Industry:

Information Technology > Security & Privacy (0.46)
Education > Educational Technology > Educational Software (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement

Li, Xingjian, Hu, Di, Li, Xuhong, Xiong, Haoyi, Ye, Zhi, Wang, Zhipeng, Xu, Chengzhong, Dou, Dejing

arXiv.org Artificial Intelligence, Oct-16-2020

Fine-tuning deep neural networks pre-trained on large-scale datasets is one of the most practical transfer learning paradigms given a limited quantity of training samples. To obtain better generalization, using the starting point as a reference, either through weights or features, has been successfully applied to transfer learning as a regularizer. However, due to the domain discrepancy between the source and target tasks, there is an obvious risk of negative transfer. In this paper, we propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED), where the knowledge relevant to the target task is disentangled from the original source model and used as a regularizer while fine-tuning the target model. Experiments on various real-world datasets show that our method consistently improves on standard fine-tuning by more than 2% on average. TRED also outperforms other state-of-the-art transfer learning regularizers such as L2-SP, AT, DELTA, and BSS.
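
The "starting point as reference" idea, in its weight-based form, can be sketched as a penalty on the distance between the current weights and the pre-trained ones, in the spirit of the L2-SP baseline named above. This is not TRED itself, and the coefficient convention is an assumption.

```python
def starting_point_penalty(weights, pretrained, alpha=0.01):
    # Penalize squared distance from the pre-trained starting point.
    return 0.5 * alpha * sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))

# Weights that have not moved incur no penalty; drift is penalized quadratically.
unchanged = starting_point_penalty([1.0, 2.0], [1.0, 2.0])
drifted = starting_point_penalty([2.0, 0.0], [0.0, 0.0], alpha=1.0)
```

A penalty like this anchors every weight equally, whether or not it is relevant to the target task, which is the negative-transfer risk the abstract points to.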

deep learning, neural network, representation, (18 more...)

arXiv.org Artificial Intelligence

2010.08532

Country:

North America > United States > Colorado (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)


Measuring Information Transfer in Neural Networks

Zhang, Xiao, Li, Xingjian, Dou, Dejing, Wu, Ji

arXiv.org Machine Learning, Sep-16-2020

Estimating the information content of a neural network model can be prohibitive because of the difficulty of finding an optimal codelength for the model. We propose to use a surrogate measure to bypass directly estimating model information. The proposed Information Transfer ($L_{IT}$) is a measure of model information based on prequential coding. $L_{IT}$ is theoretically connected to model information, and is consistently correlated with model information in experiments. We show that $L_{IT}$ can be used as a measure of generalizable knowledge in a model or a dataset. Therefore, $L_{IT}$ can serve as an analytical tool in deep learning. We apply $L_{IT}$ to compare and dissect information in datasets, evaluate representation models in transfer learning, and analyze catastrophic forgetting and continual learning algorithms. $L_{IT}$ provides an informational perspective that helps us discover new insights into neural network learning.
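
Prequential coding, on which $L_{IT}$ is based, encodes each symbol using a predictor fit only on the symbols seen so far and sums the resulting code lengths. The sketch below shows this for a binary sequence with an add-one (Laplace) estimator; it illustrates prequential codelength only, not the definition of $L_{IT}$, which the abstract does not give. The estimator choice is an assumption.

```python
import math

def prequential_codelength(bits):
    # Encode each symbol with a predictor fit only on earlier symbols
    # (add-one estimator), summing -log2 of the predicted probabilities.
    ones = 0
    total_bits = 0.0
    for t, b in enumerate(bits):
        p_one = (ones + 1) / (t + 2)
        p = p_one if b == 1 else 1.0 - p_one
        total_bits += -math.log2(p)
        ones += b
    return total_bits

regular = prequential_codelength([1] * 20)
mixed = prequential_codelength([1, 0] * 10)
```

A sequence the predictor learns quickly (all ones) costs far fewer bits than a balanced one, which is the sense in which codelength reflects how much is learnable from the data.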

deep learning, information, neural network, (18 more...)

arXiv.org Machine Learning

2009.07624

Country: Europe (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)


XMixup: Efficient Transfer Learning with Auxiliary Samples by Cross-domain Mixup

Li, Xingjian, Xiong, Haoyi, An, Haozhe, Xu, Chengzhong, Dou, Dejing

arXiv.org Machine Learning, Jul-20-2020

Transferring knowledge from large source datasets is an effective way to fine-tune the deep neural networks of a target task with a small sample size. A great number of algorithms have been proposed to facilitate deep transfer learning, and these techniques can generally be categorized into two groups: Regularized Learning of the target task using models that have been pre-trained on source datasets, and Multitask Learning with both source and target datasets to train a shared backbone neural network. In this work, we aim to improve the multitask paradigm for deep transfer learning via Cross-domain Mixup (XMixup). While existing multitask learning algorithms need to run backpropagation over both the source and target datasets and usually incur a higher gradient complexity, XMixup transfers knowledge from source to target tasks more efficiently: for every class of the target task, XMixup selects auxiliary samples from the source dataset and augments training samples via the simple mixup strategy. We evaluate XMixup on six real-world transfer learning datasets. Experimental results show that XMixup improves accuracy by 1.9% on average. Compared with other state-of-the-art transfer learning approaches, XMixup requires much less training time while still obtaining higher accuracy.
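
The mixup step itself is a convex combination of a target sample with an auxiliary source sample, with the two task losses weighted by the same coefficient. The sketch below shows only that step; how auxiliary samples are chosen per target class is not shown, and the Beta(1, 1) mixing distribution and function names are assumptions.

```python
import random

def mixup(x_target, x_source, lam):
    # Convex combination of a target sample and an auxiliary source sample.
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_target, x_source)]

def mixed_loss(loss_target, loss_source, lam):
    # Weight the two task losses by the same mixing coefficient.
    return lam * loss_target + (1.0 - lam) * loss_source

rng = random.Random(0)
lam = rng.betavariate(1.0, 1.0)  # hypothetical choice of mixing distribution
x_mix = mixup([1.0, 0.0], [0.0, 1.0], lam)
```

Each training step then needs a single forward and backward pass over the mixed sample instead of separate passes over source and target batches.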

dataset, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

2007.10252

Country: North America > United States > Colorado (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr

Li, Xingjian, Xiong, Haoyi, An, Haozhe, Xu, Chengzhong, Dou, Dejing

arXiv.org Machine Learning, Jul-7-2020

Fine-tuning a deep convolutional neural network (CNN) from a pre-trained model helps transfer knowledge learned from larger datasets to the target task. While accuracy can be greatly improved even when the training dataset is small, the transfer learning outcome is usually constrained to stay close to the pre-trained CNN weights (Liu et al., 2019), as backpropagation here brings smaller updates to deeper CNN layers. In this work, we propose RIFLE, a simple yet effective strategy that deepens backpropagation in transfer learning settings by periodically Re-Initializing the Fully-connected LayEr from scratch with random weights during the fine-tuning procedure. RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning, while the effects of the randomization easily converge over the course of the overall learning procedure. The experiments show that RIFLE significantly improves deep transfer learning accuracy on a wide range of datasets, outperforming known tricks with a similar purpose, such as Dropout, DropConnect, StochasticDepth, Disturb Label, and Cyclic Learning Rate, with 0.5%-2% higher testing accuracy under the same settings. Empirical cases and ablation studies further indicate that RIFLE brings meaningful updates to deep CNN layers and improves accuracy.
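
The strategy can be pictured as an ordinary fine-tuning loop with one extra step: every few epochs the fully-connected layer is replaced with fresh random weights. The sketch below assumes a fixed re-initialization period and a Gaussian initializer, neither of which is stated in the abstract; `train_one_epoch` is a hypothetical callback standing in for the real training step.

```python
import random

def reinit_fc(n_out, n_in, rng, scale=0.01):
    # Fresh random weights for the final fully-connected layer.
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]

def fine_tune(backbone, fc, n_epochs, reinit_every, train_one_epoch, rng):
    # Re-initialize the FC layer every `reinit_every` epochs, skipping epoch 0.
    reinit_epochs = []
    for epoch in range(n_epochs):
        if epoch > 0 and epoch % reinit_every == 0:
            fc = reinit_fc(len(fc), len(fc[0]), rng)
            reinit_epochs.append(epoch)
        backbone, fc = train_one_epoch(backbone, fc)
    return backbone, fc, reinit_epochs

# Usage with a no-op training step, just to show the schedule.
_, fc_final, schedule = fine_tune(
    None, [[0.0, 0.0]], 10, 3, lambda b, f: (b, f), random.Random(0)
)
```

After each reset the classifier's errors are large again, so the gradients reaching the deeper layers are larger than they would be with a nearly converged classifier.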

deep learning, neural network, transfer learning, (19 more...)

arXiv.org Machine Learning

2007.03349

Country:

Asia (0.47)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
