A Proofs of Linear Case Throughout the appendix, for ease of notation, we overload the definition of the function d
–Neural Information Processing Systems
The proof of this lemma requires Lemma A.1, which characterizes the distribution of the residual By Pinsker's inequality, this implies d By Lemma A.1, we have E[ X ( null w w The proof is inspired by Theorem 11.2 in [20], with modifications to our setting. First, we construct a "ghost" dataset The most challenging aspect of the ReLU setting is that we do not have an expression for the TV suffered by the MLE, such as Lemma 4.2 in the linear case. The proof of this Lemma, as well as other Lemmas in this section, can be found in Appendix B.1. Using Lemma B.2 and Lemma B.3, we can form a uniform bound, such that all A straight forward combination of Lemma 4.3 and Lemma B.4 gives the following Theorem. Now we can apply Bernstein's inequality (Theorem 2.10 of [8]).
Neural Information Processing Systems
Oct-9-2025, 00:25:42 GMT
- Technology: