

 Markov Models




Neural Information Processing Systems

Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different.







Appendix A Proofs


This result is well known [29], but we include a proof here for completeness. First, let us define the conditional risk of h at x, denoted by R …

Next, suppose q(x) ≠ c and h(x) ≠ h(x). From the proof of Lemma 4.1, we know that …

The proof of Lemma 4.5 depends on another lemma, which will also be useful in the unknown-hypothesis-class setting.

… This is the first claim of the lemma. Thus, ( ) must be false.