AITopics | Xing, Yue

Xing, Yue

Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study

He, Pengfei, Cui, Yingqian, Xu, Han, Liu, Hui, Yamada, Makoto, Tang, Jiliang, Xing, Yue

arXiv.org Machine Learning, Oct-12-2024

In-context learning (ICL) has emerged as a powerful capability for large language models (LLMs) to adapt to downstream tasks by leveraging a few (demonstration) examples. Despite its effectiveness, the mechanism behind ICL remains underexplored. To better understand how ICL integrates the examples with the knowledge learned by the LLM during pre-training (i.e., pre-training knowledge) and how the examples impact ICL, this paper conducts a theoretical study in binary classification tasks. In particular, we introduce a probabilistic model extending from the Gaussian mixture model to exactly quantify the impact of pre-training knowledge, label frequency, and label noise on the prediction accuracy. Based on our analysis, when the pre-training knowledge contradicts the knowledge in the examples, whether ICL prediction relies more on the pre-training knowledge or the examples depends on the number of examples. In addition, the label frequency and label noise of the examples both affect the accuracy of the ICL prediction, where the minor class has a lower accuracy, and how the label noise impacts the accuracy is determined by the specific noise level of the two classes. Extensive simulations are conducted to verify the correctness of the theoretical results, and real-data experiments also align with the theoretical insights. Our work reveals the role of pre-training knowledge and examples in ICL, offering a deeper understanding of LLMs' behaviors in classification tasks.

in-context learning, large language model, machine learning, (18 more...)

arXiv.org Machine Learning

2410.09411

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility

Haldar, Rajdeep, Xing, Yue, Song, Qifan, Lin, Guang

arXiv.org Machine Learning, Oct-9-2024

Recent works have shown theoretically and empirically that redundant data dimensions are a source of adversarial vulnerability. However, the inverse doesn't seem to hold in practice; employing dimension-reduction techniques doesn't exhibit robustness as expected. In this work, we consider classification tasks and characterize the data distribution as a low-dimensional manifold, with high/low variance features defining the on/off manifold direction. We argue that clean training experiences poor convergence in the off-manifold direction caused by the ill-conditioning in widely used first-order optimizers like gradient descent. The poor convergence then acts as a source of adversarial vulnerability when the dataset is inseparable in the on-manifold direction. We provide theoretical results for logistic regression and a 2-layer linear network on the considered data distribution. Furthermore, we advocate using second-order methods that are immune to ill-conditioning and lead to better robustness. We perform experiments and exhibit tremendous robustness improvements in clean training through long training and the employment of second-order methods, corroborating our framework. Additionally, we find the inclusion of batch-norm layers hinders such robustness gains. We attribute this to differing implicit biases between traditional and batch-normalized neural networks.
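The ill-conditioning argument in this abstract can be illustrated with a toy quadratic problem. The sketch below is not the paper's setup (which covers logistic regression and a 2-layer linear network); it assumes a two-feature quadratic loss whose curvature equals the feature variance, with a hypothetical optimum `w_star`, a hypothetical off-manifold variance of `1e-4`, and illustrative step-size and step-count values. It compares plain gradient descent against a single Newton step.

```python
import numpy as np

def gd_vs_newton(steps=200, lr=0.5, off_var=1e-4):
    """Return per-coordinate distance to the optimum for GD and for Newton."""
    # Quadratic loss 0.5 * (w - w_star)^T H (w - w_star).
    # H plays the role of the feature covariance: variance 1 in the
    # on-manifold direction, off_var in the off-manifold direction.
    H = np.diag([1.0, off_var])
    w_star = np.array([1.0, 1.0])  # hypothetical optimum, for illustration only

    # First-order method: gradient descent starting from zero.
    w_gd = np.zeros(2)
    for _ in range(steps):
        grad = H @ (w_gd - w_star)
        w_gd = w_gd - lr * grad

    # Second-order method: one Newton step starting from zero.
    w0 = np.zeros(2)
    w_newton = w0 - np.linalg.solve(H, H @ (w0 - w_star))

    return np.abs(w_gd - w_star), np.abs(w_newton - w_star)

if __name__ == "__main__":
    gd_err, newton_err = gd_vs_newton()
    print("GD error (on-manifold, off-manifold):", gd_err)
    print("Newton error (on-manifold, off-manifold):", newton_err)
```

In this toy setting gradient descent closes the on-manifold gap quickly while the off-manifold coordinate barely moves, because its contraction factor per step is `1 - lr * off_var`. The Newton step rescales by the inverse curvature and is unaffected by the conditioning.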

artificial intelligence, convergence, machine learning, (18 more...)

arXiv.org Machine Learning

2410.06921

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability

Haldar, Rajdeep, Xing, Yue, Song, Qifan

arXiv.org Machine Learning, Mar-6-2024

The existence of adversarial attacks on machine learning models imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of the off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the ...

Figure 1: Mental image: The oracle decision boundary (green dashed line) determines the label (blue or red) of any point in the Euclidean space. The observed data space consists of 1-dimensional line segments immersed in the 2-dimensional space. The model learns the estimated decision boundary (black dotted line) based on the observed data.

artificial intelligence, dimension, machine learning, (17 more...)

arXiv.org Machine Learning

2403.03967

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Zeng, Shenglai, Zhang, Jiankun, He, Pengfei, Xing, Yue, Liu, Yiding, Xu, Han, Ren, Jie, Wang, Shuaiqiang, Yin, Dawei, Chang, Yi, Tang, Jiliang

arXiv.org Artificial Intelligence, Feb-23-2024

... 2023; Shi et al., 2023) is an advanced natural language processing technique that enhances text generation by integrating information retrieved from a large corpus of documents. These techniques enable RAG to produce accurate and contextually relevant outputs with augmented external knowledge and have been widely used in various scenarios such as domain-specific chatbots (Siriwardhana et al., 2023) and email/code completion (Parvez et al., 2021). RAG systems typically work in two phases, as shown in Fig 1 - retrieval and generation.

On the other hand, the retrieval process in RAG could also influence the behavior of the LLMs for text-generation, and this could possibly cause the LLMs to output private information from its training/fine-tuning dataset. Notably, there are existing works (Carlini et al., 2021; Kandpal et al., 2022; Lee et al., 2021; Carlini et al., 2022; Zeng et al., 2023) observing that LLMs can remember and leak private information from their pre-training and fine-tuning data. However, how the integration of external retrieval ...

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2402.16893

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data

Xing, Yue, Lin, Xiaofeng, Suh, Namjoon, Song, Qifan, Cheng, Guang

arXiv.org Artificial Intelligence, Feb-1-2024

In practice, it is observed that transformer-based models can learn concepts in context in the inference stage. While existing literature, e.g., \citet{zhang2023trained,huang2023context}, provides theoretical explanations of this in-context learning ability, it assumes the input $x_i$ and the output $y_i$ for each sample are embedded in the same token (i.e., structured data). However, in reality, they are presented in two tokens (i.e., unstructured data \cite{wibisono2023role}). In this case, this paper conducts experiments in linear regression tasks to study the benefits of the architecture of transformers and provides some corresponding theoretical intuitions to explain why the transformer can learn from unstructured data. We study the exact components in a transformer that facilitate in-context learning. In particular, we observe that (1) a transformer with two layers of softmax (self-)attention with a look-ahead attention mask can learn from the prompt if $y_i$ is in the token next to $x_i$ for each example; (2) positional encoding can further improve the performance; and (3) multi-head attention with a high input embedding dimension has a better prediction performance than single-head attention.
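The difference between the structured and unstructured prompt formats described in this abstract can be sketched as two token matrices. This is a minimal illustration under stated assumptions, not the paper's embedding: the zero padding, the column-per-token layout, and the function names `build_prompts` and `causal_mask` are hypothetical, and the "look-ahead attention mask" is assumed here to mean the usual causal mask that blocks attention to later tokens.

```python
import numpy as np

def build_prompts(xs, ys, x_query):
    """Build structured and unstructured prompts, one token per column.

    xs: (n, d) example inputs, ys: (n,) example outputs, x_query: (d,) query.
    """
    n, d = xs.shape

    # Structured: token i holds [x_i; y_i]; the query token has its y slot zeroed.
    structured = np.zeros((d + 1, n + 1))
    structured[:d, :n] = xs.T
    structured[d, :n] = ys
    structured[:d, n] = x_query

    # Unstructured: x_i and y_i occupy two adjacent tokens,
    # with the unused slots zero-padded (an assumed convention).
    unstructured = np.zeros((d + 1, 2 * n + 1))
    unstructured[:d, 0:2 * n:2] = xs.T   # even positions carry x_i
    unstructured[d, 1:2 * n:2] = ys      # odd positions carry y_i
    unstructured[:d, 2 * n] = x_query    # final token is the query input
    return structured, unstructured

def causal_mask(num_tokens):
    # mask[j, i] is True when token j may attend to token i (i <= j).
    return np.tril(np.ones((num_tokens, num_tokens), dtype=bool))

if __name__ == "__main__":
    xs = np.arange(6.0).reshape(3, 2)
    ys = np.array([10.0, 20.0, 30.0])
    s, u = build_prompts(xs, ys, np.array([7.0, 8.0]))
    print(s.shape, u.shape)
    print(causal_mask(u.shape[1]).astype(int))
```

In the unstructured layout no single token contains both $x_i$ and $y_i$, so a model has to combine information across adjacent tokens before it can use an example, which is the setting the abstract studies.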

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.00743

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Superiority of Multi-Head Attention in In-Context Linear Regression

Cui, Yingqian, Ren, Jie, He, Pengfei, Tang, Jiliang, Xing, Yue

arXiv.org Artificial Intelligence, Jan-30-2024

We present a theoretical analysis of the performance of a transformer with softmax attention in in-context learning with linear regression tasks. While the existing literature predominantly focuses on the convergence of transformers with single-/multi-head attention, our research centers on comparing their performance. We conduct an exact theoretical analysis to demonstrate that multi-head attention with a substantial embedding dimension performs better than single-head attention. When the number of in-context examples $D$ increases, the prediction loss using single-/multi-head attention is in $O(1/D)$, and the one for multi-head attention has a smaller multiplicative constant. In addition to the simplest data distribution setting, we consider more scenarios, e.g., noisy labels, local examples, correlated features, and prior knowledge. We observe that, in general, multi-head attention is preferred over single-head attention. Our results verify the effectiveness of the design of multi-head attention in the transformer architecture.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2401.17426

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

Xing, Yue, Lin, Xiaofeng, Song, Qifan, Xu, Yi, Zeng, Belinda, Cheng, Guang

arXiv.org Artificial Intelligence, Jan-26-2024

Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observes that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoretical results reveal that feature purification plays an important role in connecting the adversarial robustness of the pre-trained model and the downstream tasks in two-layer neural networks. Specifically, we show that (i) with adversarial training, each hidden node tends to pick only one (or a few) features; (ii) without adversarial training, the hidden nodes can be vulnerable to attacks. This observation is valid for both supervised pre-training and contrastive learning. With purified nodes, it turns out that clean training is enough to achieve adversarial robustness in downstream tasks.

adversarial training, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2401.15248

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Exploring Memorization in Fine-tuned Language Models

Zeng, Shenglai, Li, Yaxin, Ren, Jie, Liu, Yiding, Xu, Han, He, Pengfei, Xing, Yue, Wang, Shuaiqiang, Tang, Jiliang, Yin, Dawei

arXiv.org Artificial Intelligence, Oct-10-2023

LLMs have shown great capabilities in various tasks but have also exhibited memorization of training data, raising tremendous privacy and copyright concerns. While prior work has studied memorization during pre-training, the exploration of memorization during fine-tuning is rather limited. Compared with pre-training, fine-tuning typically involves sensitive data and diverse objectives, and thus may bring unique memorization behaviors and distinct privacy risks. In this work, we conduct the first comprehensive analysis to explore LMs' memorization during fine-tuning across tasks. Our studies with open-sourced and our own fine-tuned LMs across various tasks indicate that fine-tuned memorization presents a strong disparity among tasks. We provide an understanding of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution. Our investigation of this memorization behavior points to multi-task fine-tuning as a potential strategy to mitigate fine-tuned memorization.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2310.06714

Country:

North America > United States (1.00)
Africa > Middle East > Libya (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Law > Criminal Law (0.93)
Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models

Cui, Yingqian, Ren, Jie, Xu, Han, He, Pengfei, Liu, Hui, Sun, Lichao, Xing, Yue, Tang, Jiliang

arXiv.org Artificial Intelligence, Oct-9-2023

Recently, Generative Diffusion Models (GDMs) have showcased their remarkable capabilities in learning and generating images. A large community of GDMs has naturally emerged, further promoting the diversified applications of GDMs in various fields. However, this unrestricted proliferation has raised serious concerns about copyright protection. For example, artists including painters and photographers are becoming increasingly concerned that GDMs could effortlessly replicate their unique creative works without authorization. In response to these challenges, we introduce a novel watermarking scheme, DiffusionShield, tailored for GDMs. DiffusionShield protects images from copyright infringement by GDMs through encoding the ownership information into an imperceptible watermark and injecting it into the images. Its watermark can be easily learned by GDMs and will be reproduced in their generated images. By detecting the watermark from generated images, copyright infringement can be exposed with evidence. Benefiting from the uniformity of the watermarks and the joint optimization method, DiffusionShield ensures low distortion of the original image, high watermark detection performance, and the ability to embed lengthy messages. We conduct rigorous and comprehensive experiments to show the effectiveness of DiffusionShield in defending against infringement by GDMs and its superiority over traditional watermarking methods.

artificial intelligence, machine learning, watermark, (18 more...)

arXiv.org Artificial Intelligence

2306.04642

Genre: Research Report (1.00)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Adversarial Training with Generated Data in High-Dimensional Regression: An Asymptotic Study

Xing, Yue

arXiv.org Artificial Intelligence, Jun-21-2023

In recent years, studies such as \cite{carmon2019unlabeled,gowal2021improving,xing2022artificial} have demonstrated that incorporating additional real or generated data with pseudo-labels can enhance adversarial training through a two-stage training approach. In this paper, we perform a theoretical analysis of the asymptotic behavior of this method in high-dimensional linear regression. While a double-descent phenomenon can be observed in ridgeless training, with an appropriate $\mathcal{L}_2$ regularization, the two-stage adversarial training achieves a better performance. Finally, we derive a shortcut cross-validation formula specifically tailored for the two-stage training method.

adversarial training, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2306.12582

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
