
c20bb2d9a50d5ac1f713f8b34d9aac5a-AuthorFeedback.pdf

Neural Information Processing Systems

Both initialization methods can achieve similar performance on downstream tasks, but initializing from BERT-base reduces the number of learning steps. In order to shorten the training time of our large-size model, we initialize it from BERT-large. We will also release a model trained from scratch. We further trained BERT-large using the same hyper-parameters, but the resulting model did not significantly improve performance on downstream tasks compared to the original BERT-large.




2f3926f0a9613f3c3cc21d52a3cdb4d9-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their valuable and positive feedback. Thus we believe it is significant to study in this field, which is also supported by the other reviewers. After double-checking, we do have the appendix attached in the supplementary material (in "N_Gram_Graph_Paper.pdf").




13e36f06c66134ad65f532e90d898545-AuthorFeedback.pdf

Neural Information Processing Systems

Numerosity cognition may well be a holistic mechanism. Like this reviewer, we had the same urge to challenge the methodology and results of numerosity studies in neuroscience [21][22], but refrained from directly debunking the works published in top journals.