SM.1 Omitted proofs

SM.1.1 Proof of Proposition 1

Proposition 1. The function $m_C(\cdot) = 2^{C(\mathcal{M}_\epsilon(\cdot))}: \mathcal{X} \to [1, c]$ satisfies all properties of a predictive multiplicity metric in Definition 1.
Neural Information Processing Systems
For clarity, we assume $|\mathcal{M}_\epsilon(x_i)| = m$. By the information inequality [1, Theorem 2.6.3], the mutual information $I(M; Y)$ between the random variables $M$ and $Y$ (defined in Section 3) is non-negative, i.e., $I(M; Y) \geq 0$. On the other hand, denote the $c$ models in $\mathcal{R}(\mathcal{H}, \epsilon)$ whose output scores are the ``vertices'' by $m_1, \ldots, m_c$; then $H(Y \mid M = m_k) = 0$ for all $k \in [c]$. Hence $H(Y \mid M)$ is minimized to $0$ by setting the weights $P_M$ on those $c$ models to $\frac{1}{c}$ and on the rest to $0$. Since this holds for the capacity-achieving $P_M$, which in turn is the maximum across input distributions, the converse result follows.

The consequence of predictive multiplicity is that the same individual can be treated differently due to arbitrary and unjustified choices made during the training process (e.g., parameter initialization, random seed, dropout probability, etc.).
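The converse argument above can be checked numerically. The sketch below (our own illustration, not code from the paper) builds the deterministic channel in which each of $c$ models $m_k$ outputs a distinct class, so $H(Y \mid M = m_k) = 0$; uniform weights $P_M$ then achieve $I(M;Y) = \log_2 c$ bits, and the metric $2^C$ recovers the number of distinct models $c$.

```python
import numpy as np

def mutual_information(p_m, channel):
    """I(M; Y) in bits, given input distribution p_m and conditional P(Y|M)."""
    p_my = p_m[:, None] * channel            # joint P(M, Y)
    p_y = p_my.sum(axis=0)                   # marginal P(Y)
    indep = p_m[:, None] * p_y[None, :]      # product of marginals
    mask = p_my > 0
    return float((p_my[mask] * np.log2(p_my[mask] / indep[mask])).sum())

c = 4
# Deterministic channel: model m_k always predicts class k, so H(Y|M) = 0.
channel = np.eye(c)
p_uniform = np.full(c, 1.0 / c)              # capacity-achieving weights
C = mutual_information(p_uniform, channel)   # equals log2(c) bits
print(2 ** C)                                # 2^C = c, the number of distinct models
```

With the identity channel, $2^C$ evaluates to exactly $c$; any non-uniform $P_M$ over the same models yields a strictly smaller $I(M;Y)$, consistent with uniform weights being capacity-achieving here.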