Wen, Zixin
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
Yang, Tong, Mei, Jincheng, Dai, Hanjun, Wen, Zixin, Cen, Shicong, Schuurmans, Dale, Chi, Yuejie, Dai, Bo
Fine-tuning large language models (LLMs) to align with human preferences has become a critical challenge in artificial intelligence to ensure the safety of their deployment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach, significantly improving LLM performance as demonstrated by InstructGPT [Ouyang et al., 2022] and subsequent works. RLHF combines reward modeling to quantify human preferences and RL fine-tuning to adjust the LLM's output distribution, enhancing desired responses while suppressing unfavorable ones. While RLHF has shown promising results, it comes with a significant extra post-training cost, and the aligned LLM may exhibit performance degeneration due to the alignment tax [Askell et al., 2021, OpenAI, 2023]. Alternatively, best-of-$N$ (BoN) sampling has emerged as a simple and surprisingly effective technique to obtain high-quality outputs from an LLM [Stiennon et al., 2020]. In BoN sampling, multiple samples are drawn from an LLM, ranked according to a specific attribute, and the best one is selected.
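Below is a minimal Python sketch of the vanilla BoN procedure described above (sample, rank by a scoring attribute, keep the best). The generate and reward_model callables are hypothetical stand-ins for an LLM sampler and a reward model, not an API from the paper.

    # Minimal sketch of best-of-N (BoN) sampling as described above.
    # `generate` and `reward_model` are hypothetical callables standing in for
    # an LLM sampler and a learned reward model; they are not from the paper.

    def best_of_n(prompt, generate, reward_model, n=8):
        """Draw n candidate responses and return the one the reward model ranks highest."""
        candidates = [generate(prompt) for _ in range(n)]
        scores = [reward_model(prompt, c) for c in candidates]
        best_index = max(range(n), key=lambda i: scores[i])
        return candidates[best_index]

    # Example usage with toy stand-ins:
    if __name__ == "__main__":
        import random
        toy_generate = lambda p: p + " answer#" + str(random.randint(0, 999))
        toy_reward = lambda p, c: -abs(len(c) - 30)   # toy reward: prefer ~30-character responses
        print(best_of_n("Explain RLHF briefly:", toy_generate, toy_reward, n=4))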
Transformers Provably Learn Feature-Position Correlations in Masked Image Modeling
Huang, Yu, Wen, Zixin, Chi, Yuejie, Liang, Yingbin
Masked image modeling (MIM), which predicts randomly masked patches from unmasked ones, has emerged as a promising approach in self-supervised vision pretraining. However, the theoretical understanding of MIM is rather limited, especially with the foundational architecture of transformers. In this paper, to the best of our knowledge, we provide the first end-to-end theory of learning one-layer transformers with softmax attention in MIM self-supervised pretraining. On the conceptual side, we posit a theoretical mechanism for how transformers, pretrained with MIM, produce the empirically observed local and diverse attention patterns on data distributions with spatial structure that highlights feature-position correlations. On the technical side, our end-to-end analysis of the training dynamics of softmax-based transformers accommodates both input and position embeddings simultaneously, and is developed based on a novel approach that tracks the interplay between feature-position attention correlations and position-wise attention correlations.
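For concreteness, the following is a hedged PyTorch sketch of the kind of MIM setup the abstract refers to: a one-layer softmax-attention transformer acting on patch embeddings plus position embeddings, trained to reconstruct masked patches. All dimensions, module names, and the reconstruction loss are illustrative assumptions, not the paper's exact theoretical construction.

    # Minimal sketch (not the paper's exact construction) of masked image modeling
    # with a one-layer softmax-attention transformer over patch + position embeddings.
    import torch
    import torch.nn as nn

    class OneLayerMIM(nn.Module):
        def __init__(self, num_patches=16, patch_dim=48, embed_dim=64):
            super().__init__()
            self.embed = nn.Linear(patch_dim, embed_dim)          # input (patch) embedding
            self.pos = nn.Parameter(torch.randn(num_patches, embed_dim) * 0.02)  # position embedding
            self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
            self.decode = nn.Linear(embed_dim, patch_dim)         # reconstruct raw patches
            self.mask_token = nn.Parameter(torch.zeros(embed_dim))

        def forward(self, patches, mask):
            # patches: (B, P, patch_dim); mask: (B, P) boolean, True where the patch is hidden.
            h = self.embed(patches)
            h = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(h), h)
            h = h + self.pos                                      # add position embeddings
            h, _ = self.attn(h, h, h)                             # one layer of softmax attention
            return self.decode(h)

    def mim_loss(model, patches, mask):
        # Reconstruction loss on the masked positions only, as in MIM pretraining.
        pred = model(patches, mask)
        return ((pred - patches) ** 2)[mask].mean()

    # Toy usage on random "patches".
    model = OneLayerMIM()
    patches = torch.randn(8, 16, 48)
    mask = torch.rand(8, 16) < 0.5
    loss = mim_loss(model, patches, mask)
    loss.backward()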
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
Nai, Ruiqian, Wen, Zixin, Li, Ji, Li, Yuanzhi, Gao, Yang
In representation learning, a disentangled representation is highly desirable as it encodes generative factors of data in a separable and compact pattern. Researchers have advocated leveraging disentangled representations to complete downstream tasks with encouraging empirical evidence. This paper further investigates the necessity of disentangled representations in downstream applications. Specifically, we show that dimension-wise disentangled representations are unnecessary for a fundamental downstream task, abstract visual reasoning. We provide extensive empirical evidence against the necessity of disentanglement, covering multiple datasets, representation learning methods, and downstream network architectures. Furthermore, our findings suggest that the informativeness of representations is a better indicator of downstream performance than disentanglement. Finally, the positive correlation between informativeness and disentanglement explains the claimed usefulness of disentangled representations in previous works. The source code is available at https://github.com/Richard-coder-Nai/disentanglement-lib-necessity.git.
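As a hedged illustration of what "informativeness" can mean operationally, the sketch below scores a frozen representation by the accuracy of a simple linear probe on a generative factor. This is a generic proxy (assuming scikit-learn is available), not necessarily the paper's exact evaluation protocol.

    # Generic informativeness proxy: accuracy of a linear probe trained on a frozen
    # representation to predict a ground-truth factor. Illustrative, not the paper's code.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def informativeness(representations, factor_labels, train_frac=0.8):
        """Fit a linear probe from representations to one generative factor; return test accuracy."""
        split = int(train_frac * len(representations))
        probe = LogisticRegression(max_iter=1000)
        probe.fit(representations[:split], factor_labels[:split])
        return probe.score(representations[split:], factor_labels[split:])

    # Toy usage with random data standing in for encoder outputs and factor labels.
    reps = np.random.randn(1000, 32)
    labels = np.random.randint(0, 5, size=1000)
    print("probe accuracy:", informativeness(reps, labels))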
What Matters In The Structured Pruning of Generative Language Models?
Santacroce, Michael, Wen, Zixin, Shen, Yelong, Li, Yuanzhi
Auto-regressive large language models such as GPT-3 require enormous computational resources to use. Traditionally, structured pruning methods are employed to reduce resource usage. However, their application to, and efficacy for, generative language models remains heavily under-explored. In this paper, we conduct a comprehensive evaluation of common structured pruning methods, including magnitude, random, and movement pruning, on the feed-forward layers in GPT-type models. Unexpectedly, random pruning results in performance comparable to the best established methods across multiple natural language generation tasks. To understand these results, we provide a framework for measuring the neuron-level redundancy of models pruned by different methods, and discover that established structured pruning methods do not take into account the distinctiveness of neurons, leaving behind excess redundancy. In view of this, we introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models. We then discuss the effects of our techniques on different redundancy metrics to explain the improved performance.
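The sketch below illustrates, under simplifying assumptions, what neuron-level structured pruning of a feed-forward block looks like for the magnitude and random criteria discussed above. The scoring rules are the standard ones, the function and variable names are illustrative, and movement pruning and GUM (which require training-time statistics) are omitted.

    # Hedged sketch of neuron-level (structured) pruning of a feed-forward block,
    # comparing magnitude and random scoring; not the paper's exact implementation.
    import torch
    import torch.nn as nn

    def prune_ffn_neurons(ffn_in: nn.Linear, ffn_out: nn.Linear, keep_ratio=0.5, method="magnitude"):
        """Drop hidden neurons of an FFN (Linear -> activation -> Linear).

        ffn_in: Linear(d_model, d_hidden); ffn_out: Linear(d_hidden, d_model).
        Returns new, smaller Linear layers containing only the kept neurons.
        """
        d_hidden = ffn_in.out_features
        if method == "magnitude":
            # Score each hidden neuron by the norms of its incoming and outgoing weights.
            scores = ffn_in.weight.norm(dim=1) + ffn_out.weight.norm(dim=0)
        elif method == "random":
            scores = torch.rand(d_hidden)
        else:
            raise ValueError(method)
        keep = torch.topk(scores, int(keep_ratio * d_hidden)).indices.sort().values

        new_in = nn.Linear(ffn_in.in_features, len(keep))
        new_out = nn.Linear(len(keep), ffn_out.out_features)
        with torch.no_grad():
            new_in.weight.copy_(ffn_in.weight[keep])
            new_in.bias.copy_(ffn_in.bias[keep])
            new_out.weight.copy_(ffn_out.weight[:, keep])
            new_out.bias.copy_(ffn_out.bias)
        return new_in, new_out

    # Toy usage: prune half of a 3072-wide FFN from a GPT-style block.
    fin, fout = nn.Linear(768, 3072), nn.Linear(3072, 768)
    small_in, small_out = prune_ffn_neurons(fin, fout, keep_ratio=0.5, method="magnitude")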
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Wen, Zixin, Li, Yuanzhi
Recently, the surprising discovery of the Bootstrap Your Own Latent (BYOL) method by Grill et al. shows that the negative term in the contrastive loss can be removed if we add the so-called prediction head to the network. This initiated research on non-contrastive self-supervised learning. It is mysterious why, even when trivial collapsed global optimal solutions exist, neural networks trained by (stochastic) gradient descent can still learn competitive representations. This phenomenon is a typical example of implicit bias in deep learning and remains little understood. In this work, we present our empirical and theoretical discoveries on non-contrastive self-supervised learning. Empirically, we find that when the prediction head is initialized as an identity matrix with only its off-diagonal entries being trainable, the network can learn competitive representations even though the trivial optima still exist in the training objective. Theoretically, we present a framework to understand the behavior of the trainable, but identity-initialized, prediction head. Under a simple setting, we characterize the substitution effect and the acceleration effect of the prediction head. The substitution effect happens when learning the stronger features in some neurons can substitute for learning these features in other neurons through updating the prediction head. The acceleration effect happens when the substituted features can accelerate the learning of other, weaker features, preventing them from being ignored. These two effects enable the neural networks to learn all the features rather than focus only on the stronger ones, the latter behavior being the likely cause of the dimensional collapse phenomenon. To the best of our knowledge, this is also the first end-to-end optimization guarantee for non-contrastive methods using nonlinear neural networks with a trainable prediction head and normalization.
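The following is a minimal PyTorch sketch of the empirical construction described above: a linear prediction head whose weight is fixed to the identity on the diagonal while only the off-diagonal entries are trained, plugged into a BYOL-style loss. Module and variable names are illustrative, not taken from the paper's code.

    # Minimal sketch of the empirical setup described above: a linear prediction head
    # initialized to the identity, in which only the off-diagonal entries are trained.
    import torch
    import torch.nn as nn

    class OffDiagonalPredictionHead(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.off_diag = nn.Parameter(torch.zeros(dim, dim))   # trainable part, starts at 0
            self.register_buffer("mask", 1.0 - torch.eye(dim))    # keeps the diagonal untrainable
            self.register_buffer("identity", torch.eye(dim))      # fixed identity component

        def forward(self, z):
            # Effective weight = I + (off-diagonal trainable entries); the diagonal stays fixed at 1.
            weight = self.identity + self.off_diag * self.mask
            return z @ weight.t()

    # Toy usage inside a BYOL-style pipeline (online output -> prediction head -> match target output).
    head = OffDiagonalPredictionHead(dim=256)
    z_online = torch.randn(32, 256, requires_grad=True)   # stand-in for online-encoder outputs
    z_target = torch.randn(32, 256)                       # stand-in for target-encoder outputs
    p = head(z_online)
    loss = 2 - 2 * nn.functional.cosine_similarity(p, z_target, dim=-1).mean()  # BYOL-style loss
    loss.backward()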
Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
Wen, Zixin, Li, Yuanzhi
How can neural networks trained by contrastive learning extract features from unlabeled data? Why does contrastive learning usually need much stronger data augmentations than supervised learning to ensure good representations? These questions involve both the optimization and statistical aspects of deep learning, but can hardly be answered by analyzing supervised learning, where the target functions are the ultimate pursuit. Indeed, in self-supervised learning, it is inevitable to relate the optimization/generalization of neural networks to how they encode the latent structures in the data, which we refer to as the feature learning process. In this work, we formally study how contrastive learning learns feature representations for neural networks by analyzing its feature learning process. We consider the case where our data are composed of two types of features: the more semantically aligned sparse features, which we want to learn from, and the dense features, which we want to avoid. Theoretically, we prove that contrastive learning using $\mathbf{ReLU}$ networks provably learns the desired sparse features if proper augmentations are adopted. We present an underlying principle called $\textbf{feature decoupling}$ to explain the effects of augmentations: we theoretically characterize how augmentations reduce the correlations of dense features between positive samples while keeping the correlations of sparse features intact, thereby forcing the neural networks to learn from the self-supervision of the sparse features. Empirically, we verify that the feature decoupling principle matches the underlying mechanism of contrastive learning in practice.
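To make the setting concrete, here is a hedged PyTorch sketch of a contrastive pipeline of the kind analyzed above: a small ReLU network trained with an InfoNCE-style loss on two augmented views of each sample, where the augmentation crudely mimics the role of weakening dense-feature correlations across views. The architecture, augmentation, and loss details are illustrative assumptions, not the paper's exact construction.

    # Hedged sketch of a contrastive setup: a small ReLU network trained with an
    # InfoNCE-style loss on two augmented views of each sample. Illustrative only.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))

    def augment(x, drop_prob=0.3):
        # Crude stand-in for data augmentation: randomly zero out coordinates, which
        # (in the feature-decoupling view) weakens dense-feature correlations across views.
        return x * (torch.rand_like(x) > drop_prob).float()

    def info_nce(x, temperature=0.5):
        z1 = F.normalize(encoder(augment(x)), dim=-1)   # view 1
        z2 = F.normalize(encoder(augment(x)), dim=-1)   # view 2 (positive pair)
        logits = z1 @ z2.t() / temperature              # off-diagonal entries act as negatives
        labels = torch.arange(x.size(0))                # each sample's positive is its own index
        return F.cross_entropy(logits, labels)

    # Toy usage on random inputs standing in for real data.
    loss = info_nce(torch.randn(64, 128))
    loss.backward()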