AITopics | distribution characteristic

Collaborating Authors

distribution characteristic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploring and Reshaping the Weight Distribution in LLM

Ye, Chunming, Li, Songzhou, Xu, Xu

arXiv.org Artificial IntelligenceSep-3-2025

The performance of Large Language Models is influenced by their characteristics such as architecture, model sizes, decoding methods and so on. Due to differences in structure or function, the weights in different layers of large models have varying distributions. This paper explores the correlations between different types of layers in terms of weights distribution and studies the potential impact of these correlations on LoRA training effectiveness. Firstly, the study reveals that in the model the cosine distances between weights of different layers manifest power-law distribution. We extract Query-projection, down-projection and other weight matrices from the self-attention layers and MLP layers, calculate the singular values of the matrices using singular value decomposition, and organize a certain number of singular values into matrices according to projection's type. By analyzing the probability distribution of the cosine distances between these matrices, it is found that the cosine distances values between them have distinct power-law distribution characteristics. Secondly, based on the results of distance calculations and analysis across different layers of model, a qualitative method is proposed to describe the distribution characteristics of different models. Next, to construct weights that align with the distribution characteristics, a data generator is designed using a combination of Gaussian process and Pareto distribution functions. The generator is used to simulate the generation of data that aligns with specific distribution characteristics. Finally, based on the aforementioned distribution characteristics and data generation method, the weights in LoRA initialization are reshaped for training. Experimental results indicate that, without altering the model structure or training process, this method achieves a certain improvement in the performance of LoRA training.

distribution characteristic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.00046

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Add feedback

What Are the Odds? Language Models Are Capable of Probabilistic Reasoning

Paruchuri, Akshay, Garrison, Jake, Liao, Shun, Hernandez, John, Sunshine, Jacob, Althoff, Tim, Liu, Xin, McDuff, Daniel

arXiv.org Artificial IntelligenceJun-18-2024

Language models (LM) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability distributions. In this paper, we focus on evaluating the probabilistic reasoning capabilities of LMs using idealized and real-world statistical distributions. We perform a systematic evaluation of state-of-the-art LMs on three tasks: estimating percentiles, drawing samples, and calculating probabilities. We evaluate three ways to provide context to LMs 1) anchoring examples from within a distribution or family of distributions, 2) real-world context, 3) summary statistics on which to base a Normal approximation. Models can make inferences about distributions, and can be further aided by the incorporation of real-world context, example shots and simplified assumptions, even if these assumptions are incorrect or misspecified. To conduct this work, we developed a comprehensive benchmark distribution dataset with associated question-answer pairs that we will release publicly.

percentile, probability, reasoning, (12 more...)

arXiv.org Artificial Intelligence

2406.1283

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
Europe > Albania > Durrës County > Durrës (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Distribution-restrained Softmax Loss for the Model Robustness

Wang, Hao, Li, Chen, Jiang, Jinzhe, Zhang, Xin, Zhao, Yaqian, Gong, Weifeng

arXiv.org Artificial IntelligenceMar-22-2023

Recently, the robustness of deep learning models has received widespread attention, and various methods for improving model robustness have been proposed, including adversarial training, model architecture modification, design of loss functions, certified defenses, and so on. However, the principle of the robustness to attacks is still not fully understood, also the related research is still not sufficient. Here, we have identified a significant factor that affects the robustness of models: the distribution characteristics of softmax values for non-real label samples. We found that the results after an attack are highly correlated with the distribution characteristics, and thus we proposed a loss function to suppress the distribution diversity of softmax. A large number of experiments have shown that our method can improve robustness without significant time consumption.

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2303.12363

Country:

Asia > China > Henan Province > Zhengzhou (0.05)
Asia > China > Shandong Province > Jinan (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Improved Crowding Distance for NSGA-II

Chu, Xiangxiang, Yu, Xinjie

arXiv.org Artificial IntelligenceNov-30-2018

Non-dominated sorting genetic algorithm II (NSGA-II) does well in dealing with multi-objective problems. When evaluating validity of an algorithm for multi-objective problems, two kinds of indices are often considered simultaneously, i.e. the convergence to Pareto Front and the distribution characteristic. The crowding distance in the standard NSGA-II has the property that solutions within a cubic have the same crowding distance, which has no contribution to the convergence of the algorithm. Actually the closer to the Pareto Front a solution is, the higher priority it should have. In the paper, the crowding distance is redefined while keeping almost all the advantages of the original one. Moreover, the speed of converging to the Pareto Front is faster. Finally, the improvement is proved to be effective by applying it to solve nine Benchmark problems.

evolutionary algorithm, machine learning, nsga-ii, (14 more...)

arXiv.org Artificial Intelligence

1811.12667

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > Switzerland (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback