ffe10334251de1dc98339d99ae4743ba-AuthorFeedback.pdf

Neural Information Processing Systems 

We thank the reviewers for their thoughtful comments. But consider the case of training BERT on a TPU pod, which takes around 4 days. We provide a formalization of the problem with rigorous guarantees. We now address a few of the specific reviewer concerns. However, in the revised version of this paper we will include a more thorough discussion of this. That post draws on Courcelle's theorem (namely, every graph property definable in the monadic second-order We feel that it's more accurate to avoid

Similar Docs  Excel Report  more

TitleSimilaritySource
None found