 Performance Analysis


A Proof of Theorem 3.1

Neural Information Processing Systems

As proved by Feng et al. (2021), the binary cross-entropy loss We include more results on teacher model and teacher model + {DRO (Hashimoto et al., 2018) / ARL (Lahoti et al., 2020) / FairRF (Zhao et al., 2022) / our knowledge distillation} in Tab. The effect of our label smoothing can be observed by comparing "Teacher (with hard label)" and "Teacher (with softmax/linear label)" in the 6 tables. Here the capacity is the same; the only difference is the label smoothing. Here the training method is the same; the only difference is capacity. Table 8: Results on COMPAS dataset with sensitive attribute race.



Cross-validation Confidence Intervals for Test Error Pierre Bayle

Neural Information Processing Systems

This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.
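To make the idea of a central-limit-theorem-based interval concrete, the following is a minimal sketch, not the paper's implementation. It assumes the per-example held-out losses from each fold are already available, pools them, and builds a normal-approximation interval around their mean. The function name `cv_confidence_interval`, its arguments, and the choice of a pooled variance estimate are illustrative assumptions; the abstract above does not specify the exact variance estimator.

```python
import math
from statistics import NormalDist, fmean


def cv_confidence_interval(fold_losses, alpha=0.05):
    """Normal-approximation interval for k-fold test error.

    fold_losses: list of lists; fold_losses[j] holds the losses of the
    model trained without fold j, evaluated on the examples of fold j.
    This pooled-variance construction is an illustrative assumption.
    """
    # Pool the held-out losses from all folds.
    losses = [loss for fold in fold_losses for loss in fold]
    n = len(losses)
    if n < 2:
        raise ValueError("need at least two held-out losses")

    # Point estimate: the usual k-fold cross-validation error.
    mean = fmean(losses)

    # Plug-in variance of the pooled losses (divides by n).
    variance = sum((x - mean) ** 2 for x in losses) / n

    # Two-sided standard normal quantile for level 1 - alpha.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


# Example with 0/1 losses from two folds of four examples each.
lower, upper = cv_confidence_interval([[0, 1, 0, 1], [1, 0, 0, 1]])
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")
```

The interval is centered on the cross-validation error and shrinks at rate 1/sqrt(n); whether its coverage is asymptotically valid depends on the stability conditions the abstract refers to.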


A Brain regions

Neural Information Processing Systems

A system of regions (also referred to as a network) can comprise multiple disjoint regions that exhibit shared activity patterns across a range of tasks. The auditory system is located in the superior temporal region of the brain. The voxels were then filtered using gray-matter masking and (for MD and the Language systems) network localization. See Fedorenko et al. [2010] for a discussion of the functional localization approach as it pertains to the language network. For each brain system and each code property or code model, we run a separate MVPA analysis.



Appendices A Proofs A.1 Proof of Proposition

Neural Information Processing Systems

Here we proved that (1) and (2) are equivalent; (1) and (3) are equivalent. Proposition 3. Lemma 2. Given With the lemma above, we now present the proof of Proposition 3. B.1 Example Implementation We provide an example implementation of Algorithm 2 in Listing 1. Footnote 1: Based on exponentiation by squaring. Best results are in bold. Based on the observation, Wei et al. Our method identified a different source of gradient vanishing caused by the small coefficients for higher-order terms in DAG constraints.
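The excerpt mentions exponentiation by squaring but does not reproduce Listing 1. As a generic illustration of that technique only, and not the authors' Algorithm 2, the sketch below computes an integer power of a square matrix with O(log k) matrix multiplications. The helper names `mat_mul` and `mat_pow` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def mat_pow(matrix, exponent):
    """Raise a square matrix to a non-negative integer power.

    Uses exponentiation by squaring: the exponent is read bit by bit,
    squaring the base at each step and multiplying it into the result
    whenever the current bit is set.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    n = len(matrix)
    # Start from the identity matrix.
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in matrix]
    while exponent > 0:
        if exponent & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        exponent >>= 1
    return result


# The adjacency matrix of a DAG is nilpotent: with 3 nodes and edges
# 0 -> 1 -> 2, the third power is the zero matrix.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(mat_pow(chain, 3))
```

The nilpotency check in the usage lines relies on a standard fact: a directed graph on d nodes is acyclic exactly when the d-th power of its adjacency matrix is zero, which is why matrix powers appear in DAG constraints.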





Watermarking Makes Language Models Radioactive Tom Sander

Neural Information Processing Systems

Current methods like membership inference or active IP protection either work only in settings where the suspected text is known or do not provide reliable statistical guarantees. We discover that, on the contrary, it is possible to reliably determine if a language model was trained on synthetic data if that data is output by a watermarked LLM.