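A Missing Proofs

Theorem 1. The excessive loss of a group $a \in \mathcal{A}$ is upper bounded by³:

$$R(a) \leq \left\| g_a^{\ell} \right\| \left\| \bar{\theta} - \theta^{\star} \right\| + \frac{1}{2} \lambda\!\left( H_a^{\ell} \right) \left\| \bar{\theta} - \theta^{\star} \right\|^{2} + O\!\left( \left\| \bar{\theta} - \theta^{\star} \right\|^{3} \right).$$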

May-1-2026, 02:47:32 GMT–Neural Information Processing Systems

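Here, $g_a^{\ell} = \nabla_{\theta} J(\theta^{\star}; D_a)$ is the gradient of the loss function $\ell$ at the optimal parameter vector $\theta^{\star}$, computed using the group data $D_a$; $H_a^{\ell} = \nabla^{2}_{\theta} J(\theta^{\star}; D_a)$ is the Hessian matrix of the loss function $\ell$ at $\theta^{\star}$, computed using the group data $D_a$ (henceforth simply referred to as the group Hessian); and $\lambda(\Sigma)$ is the maximum eigenvalue of a matrix $\Sigma$.

Proof. Using a second-order Taylor expansion around $\theta^{\star}$, the excessive loss $R(a)$ for a group $a \in \mathcal{A}$ can be stated as:

$$R(a) = J(\bar{\theta}; D_a) - J(\theta^{\star}; D_a) = \left[ J(\theta^{\star}; D_a) + (\bar{\theta} - \theta^{\star})^{\top} g_a^{\ell} + \frac{1}{2} (\bar{\theta} - \theta^{\star})^{\top} H_a^{\ell} (\bar{\theta} - \theta^{\star}) + O\!\left( \left\| \bar{\theta} - \theta^{\star} \right\|^{3} \right) \right] - J(\theta^{\star}; D_a).$$

The above follows from the loss $\ell(\cdot)$ being at least twice differentiable, by assumption.

Consider two groups $a$ and $b$ in $\mathcal{A}$ with dataset sizes $|D_a|$ and $|D_b|$.

Proposition 2. For a given group $a \in \mathcal{A}$, gradient norms can be upper bounded as: $\left\| g_a^{\ell} \right\| \in O(\ldots)$

The above proposition is presented in the context of cross-entropy loss or mean squared error loss functions. These two cases are reviewed as follows.

³ With a slight abuse of notation, the results refer to $\bar{\theta}$ as the homonymous vector which is extended with zeros.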
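The bound in Theorem 1 can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions, not the authors' code: it assumes a two-parameter quadratic group loss (so the third-order remainder vanishes), and the matrix `A`, the vector `m`, and both parameter vectors are hypothetical values chosen for the example. It compares the excessive loss with the first-order term (bounded via the Cauchy-Schwarz inequality) plus the second-order term (bounded via the maximum eigenvalue of the group Hessian).

```python
import math

def group_loss(theta, A, m):
    """Toy quadratic group loss J(theta) = 0.5 * (theta - m)^T A (theta - m)."""
    d = [theta[0] - m[0], theta[1] - m[1]]
    Ad = [A[0][0] * d[0] + A[0][1] * d[1], A[1][0] * d[0] + A[1][1] * d[1]]
    return 0.5 * (d[0] * Ad[0] + d[1] * Ad[1])

def group_gradient(theta, A, m):
    """Gradient of the quadratic loss: A (theta - m)."""
    d = [theta[0] - m[0], theta[1] - m[1]]
    return [A[0][0] * d[0] + A[0][1] * d[1], A[1][0] * d[0] + A[1][1] * d[1]]

def max_eigenvalue_sym2(A):
    """Largest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    return 0.5 * (a + d) + math.sqrt((0.5 * (a - d)) ** 2 + b * b)

def norm(v):
    """Euclidean norm of a 2-vector."""
    return math.hypot(v[0], v[1])

def excessive_loss_and_bound(theta_star, theta_bar, A, m):
    """Return (R(a), upper bound) for the toy quadratic group loss."""
    excess = group_loss(theta_bar, A, m) - group_loss(theta_star, A, m)
    g = group_gradient(theta_star, A, m)   # group gradient at theta_star
    delta = [theta_bar[0] - theta_star[0], theta_bar[1] - theta_star[1]]
    # The Hessian of the quadratic loss is A itself (the group Hessian here).
    bound = norm(g) * norm(delta) + 0.5 * max_eigenvalue_sym2(A) * norm(delta) ** 2
    return excess, bound

# Hypothetical values, chosen only for illustration.
A = [[2.0, 0.5], [0.5, 1.0]]   # symmetric positive definite group Hessian
m = [1.0, -1.0]                # minimizer of the group loss (not of the overall loss)
theta_star = [0.8, 0.3]        # stands in for the optimal parameter vector
theta_bar = [0.0, 0.3]         # stands in for the vector with one coordinate zeroed

excess, bound = excessive_loss_and_bound(theta_star, theta_bar, A, m)
print(f"excessive loss R(a) = {excess:.4f}, upper bound = {bound:.4f}")
```

Because the toy loss is exactly quadratic, the inequality holds without any remainder term; for a general twice-differentiable loss the cubic term in the theorem would also have to be accounted for.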

