 multiplicity



Why are there many equally good models? An Anatomy of the Rashomon Effect

Parikh, Harsh

arXiv.org Machine Learning

The Rashomon effect -- the existence of multiple, distinct models that achieve nearly equivalent predictive performance -- has emerged as a fundamental phenomenon in modern machine learning and statistics. In this paper, we explore the causes underlying the Rashomon effect, organizing them into three categories: statistical sources arising from finite samples and noise in the data-generating process; structural sources arising from non-convexity of optimization objectives and unobserved variables that create fundamental non-identifiability; and procedural sources arising from limitations of optimization algorithms and deliberate restrictions to suboptimal model classes. We synthesize insights from machine learning, statistics, and optimization literature to provide a unified framework for understanding why the multiplicity of good models arises. A key distinction emerges: statistical multiplicity diminishes with more data, structural multiplicity persists asymptotically and cannot be resolved without different data or additional assumptions, and procedural multiplicity reflects choices made by practitioners. Beyond characterizing causes, we discuss both the challenges and opportunities presented by the Rashomon effect, including implications for inference, interpretability, fairness, and decision-making under uncertainty.
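
As a small illustration of the phenomenon the abstract describes (the dataset, model classes, and tolerance below are arbitrary choices of ours, not from the paper), the following sketch fits a few distinct models to the same data and collects those whose test accuracy falls within a tolerance epsilon of the best, i.e. an empirical "Rashomon set".

```python
# Minimal sketch (not from the paper): several distinct models reach
# nearly the same test accuracy on the same data -- a small-scale
# illustration of the Rashomon effect.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
best = max(scores.values())
epsilon = 0.02  # Rashomon tolerance: keep models within 2% of the best accuracy

rashomon_set = [name for name, s in scores.items() if s >= best - epsilon]
print(scores)
print("Rashomon set (within epsilon of the best):", rashomon_set)
```

With such a tolerance, structurally very different models (linear, tree-based, ensemble) typically all qualify, which is the multiplicity the paper anatomizes.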


Towards Understanding the Condensation of Neural Networks at Initial Training

Neural Information Processing Systems

Empirical work shows that for ReLU neural networks (NNs) with small initialization, the input weights of hidden neurons (the input weight of a hidden neuron consists of the weights from the input layer to that neuron together with its bias term) condense onto isolated orientations. This condensation dynamics implies that training implicitly regularizes the NN towards one of much smaller effective size. In this work, we illustrate how condensation forms in multi-layer fully connected NNs and show that the maximal number of condensed orientations in the initial training stage is twice the multiplicity of the activation function, where ``multiplicity'' refers to the multiplicity of the root of the activation function at the origin. Our theoretical analysis confirms the experiments in two cases: activation functions of multiplicity one with arbitrary-dimensional input, a class that contains many common activation functions, and layers with one-dimensional input and arbitrary multiplicity. This work takes a step towards understanding how small initialization leads NNs to condense at the initial training stage.
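
As a rough, hedged illustration (a toy setup of ours, not the paper's experiments), the sketch below trains a two-layer tanh network (an activation of multiplicity one) with very small initialization and then checks whether the hidden-neuron input weights, i.e. weight plus bias, normalized, cluster onto a few orientations via pairwise cosine similarity.

```python
# Minimal sketch (illustrative setup, not the paper's experiments):
# a two-layer tanh network with very small initialization is trained
# briefly, and the orientations of hidden-neuron input weights
# (weight + bias, normalized) are inspected for condensation.
import torch

torch.manual_seed(0)
n, d, m = 256, 2, 100                      # samples, input dim, hidden width
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))  # arbitrary smooth target

W = torch.randn(m, d) * 1e-3               # small initialization
b = torch.zeros(m)
a = torch.randn(m, 1) * 1e-3
for p in (W, b, a):
    p.requires_grad_(True)

opt = torch.optim.SGD([W, b, a], lr=0.05)
for step in range(2000):
    opt.zero_grad()
    pred = torch.tanh(X @ W.t() + b) @ a
    loss = ((pred - y) ** 2).mean()
    loss.backward()
    opt.step()

# Orientation of each hidden neuron: the normalized (weight, bias) vector.
v = torch.cat([W, b.unsqueeze(1)], dim=1).detach()
v = v / v.norm(dim=1, keepdim=True)
cos = v @ v.t()                            # pairwise cosine similarities
off_diag = ~torch.eye(m, dtype=torch.bool)
print("fraction of neuron pairs with |cos| > 0.99:",
      cos[off_diag].abs().gt(0.99).float().mean().item())
```

If condensation occurs in this regime, most off-diagonal cosine similarities sit near +1 or -1, i.e. the neurons occupy only a couple of distinct orientations.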



Contents of the Appendix

Neural Information Processing Systems

The structure of this section is as follows: Appendix A.1 describes the notation used in the proofs, and Appendix A.2 introduces the properties of the mixing matrix. We use upper-case bold letters for matrices and lower-case bold letters for vectors.

Theorem II (Perron-Frobenius theorem for W). The mixing matrix W of RelaySGD satisfies: 1. (Positivity) ρ(W) = 1 is an eigenvalue of W. 2. (Simplicity) The algebraic multiplicity of the eigenvalue 1 is 1. 3. (Dominance) ρ(W) = 1 > |λ| for every other eigenvalue λ of W. Statements 1 and 4 follow from Lemma 4, Statement 2 follows from Lemma 6, and Statement 3 follows from Lemma 5 and Lemma 6.

Gelfand's formula is stated as Lemma 7. We characterize the convergence rate of the consensus distance in a key lemma (Lemma 1), obtained by applying Gelfand's formula (Lemma 7) together with Lemma 8, which gives an estimate for the quantity I in Definition G. This assumption is used in the proof of Proposition III. The complete proofs for each case are then given in Appendix A.4. The next lemma explains their relations.
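
The spectral claims above can be checked numerically on a simple gossip matrix. The sketch below uses an illustrative symmetric mixing matrix on a ring of 6 workers (not the actual RelaySGD W) to verify that 1 is a simple, dominant eigenvalue, and then approximates the consensus contraction rate via Gelfand's formula ρ(A) = lim_k ||A^k||^(1/k) applied to the consensus operator A = W - (1/n) 11^T.

```python
# Minimal numerical check (illustrative mixing matrix, not the RelaySGD W):
# verify that 1 is a simple, dominant eigenvalue of a gossip matrix and
# approximate the consensus contraction rate via Gelfand's formula
#   rho(A) = lim_k ||A^k||^(1/k).
import numpy as np

n = 6
W = np.zeros((n, n))
for i in range(n):                 # symmetric gossip averaging on a ring
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

eigvals = np.linalg.eigvals(W)
print("eigenvalues:", np.round(np.sort(eigvals.real)[::-1], 4))
print("multiplicity of eigenvalue 1:", np.sum(np.isclose(eigvals, 1.0)))
print("second-largest modulus:", np.sort(np.abs(eigvals))[-2])

# Consensus operator: project out the all-ones direction.
A = W - np.ones((n, n)) / n
for k in (5, 20, 80):
    gelfand = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)
    print(f"k={k:3d}  ||A^k||^(1/k) = {gelfand:.4f}")
print("spectral radius of A:", max(np.abs(np.linalg.eigvals(A))))
```

The dominance property is what makes the consensus distance contract geometrically: the rate is governed by the second-largest eigenvalue modulus, which Gelfand's formula recovers from powers of A.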




Appendix: A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

Li, Hao

Neural Information Processing Systems

In this supplementary material, we discuss the following topics: first, why we adopt Eq. 1 as the formulation of ASP (Appendix A); then, the differences between two ...; furthermore, the effect of different semantic metrics on DAA. A. How is the ASP formulation designed? The formulation of ASP in the paper is given as Eq. 1. (Figure: the vector above each image is the class label of that image.)