Unveiling the Mystery of Weight in Large Foundation Models: Gaussian Distribution Never Fades

Chongjie Si, Jingjing Jiang, Wei Shen

Jan-18-2025–arXiv.org Artificial Intelligence

This paper presents a pioneering exploration of the mechanisms underlying the weights of large foundation models (LFMs), aiming to simplify AI research. Through extensive observation and analysis of prevailing LFMs, we find that, regardless of initialization strategy, their weights predominantly follow a Gaussian distribution, with occasional sharp, inverted T-shaped, or linear patterns. We further discover that the weights share the i.i.d. properties of Gaussian noise, and we explore the direct relationship between the two. We find that transformation weights can be derived from Gaussian noise and that they primarily serve to increase the standard deviation of pre-trained weights, with their own standard deviation growing with layer depth. In other words, transformation weights broaden the acceptable deviation from the optimal weights, facilitating adaptation to downstream tasks. Building on these conclusions, we thoroughly discuss the nature of optimal weights, ultimately concluding that they should exhibit zero mean, symmetry, and sparsity, with the sparse values following a truncated Gaussian distribution plus a few outliers. Our experiments in LFM adaptation and editing demonstrate the effectiveness of these insights. We hope these findings provide a foundational understanding that paves the way for future advances in the LFM community.
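
The standard-deviation claim in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code: it assumes pre-trained weights and transformation weights are independent zero-mean Gaussian samples, in which case their variances add. The function name `simulate_std_growth` and the values 0.02 and 0.005 are hypothetical and chosen purely for illustration.

```python
import math
import random
import statistics


def simulate_std_growth(base_std, delta_std, n=20000, seed=0):
    """Return (std of W, std of W + dW) for independent zero-mean Gaussian samples.

    Hypothetical helper: W stands in for pre-trained weights and dW for
    transformation weights; both are drawn i.i.d., which is an assumption.
    """
    rng = random.Random(seed)
    base = [rng.gauss(0.0, base_std) for _ in range(n)]
    delta = [rng.gauss(0.0, delta_std) for _ in range(n)]
    merged = [w + d for w, d in zip(base, delta)]
    return statistics.pstdev(base), statistics.pstdev(merged)


# Illustrative values only; they are not taken from the paper.
base_std, merged_std = simulate_std_growth(0.02, 0.005)

# For independent terms, Var(W + dW) = Var(W) + Var(dW).
predicted = math.hypot(0.02, 0.005)

print(f"std(W)      = {base_std:.5f}")
print(f"std(W + dW) = {merged_std:.5f}")
print(f"predicted   = {predicted:.5f}")
```

Under the independence assumption, the merged standard deviation should land close to the square root of the summed variances, which is slightly larger than the base value. Real transformation weights need not be independent of the pre-trained weights, so this shows the mechanism only.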

large language model, machine learning, natural language

Country:
- Europe (0.27)
- North America > United States
  - Minnesota (0.27)

Genre:
- Research Report > New Finding (0.92)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Modeling & Simulation (0.90)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Cognitive Science (0.92)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (0.67)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
