Neural Information Processing Systems – Author Feedback
We thank the reviewers for their insightful and useful feedback!

A primary concern raised is the gap between the performance of our BERT + prism model and SOTA.

Other points:

R1: "The authors do not fully describe the various hypotheses they imply..."

R2: The prism layer transforms are fixed; how does this compare to a learned transform? The MLM task may lead to all the frequency bands becoming local.

R2: "In fig. 5, why is BERT+prism worse for indices outside [200, 300]?"

R3: "precise choice on where to use the prism layer raises some questions..."

R3: "The way of dividing the embeddings into 5 sectors seems a bit naive" – We will note this in the paper, and that there is opportunity for future work!

R4: "It would be nice to see ablations where you use high filters on POS tagging and low filters on para-..."

R4: "As a sanity check, you could try to see what happens if you don't finetune the initial BERT model on..." – The original BERT model achieves an accuracy of 94.6% for POS tagging, 41.8% for dialog acts, and 28.9% for topic classification, slightly worse than our model that was trained longer on...

R4: "Since Figure 5 demonstrates good performance on long range masked language modeling, LAMBADA..."
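To make the discussion of fixed (non-learned) prism transforms and the division of embeddings into 5 sectors concrete, here is a minimal sketch of one way such a layer could work: split the hidden dimension into equal sectors and band-pass filter each sector at a different frequency band along the sequence axis via the discrete Fourier transform. The function name `prism_layer`, the use of `rfft`/`irfft`, and the contiguous band assignment are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def prism_layer(embeddings, n_sectors=5):
    """Illustrative prism layer: fixed band-pass filters over embedding sectors.

    embeddings: array of shape (seq_len, hidden_dim).
    Each of the n_sectors equal slices of the hidden dimension is filtered
    to a different frequency band along the sequence axis.
    """
    seq_len, hidden_dim = embeddings.shape
    assert hidden_dim % n_sectors == 0, "hidden_dim must divide evenly"
    sector = hidden_dim // n_sectors

    # Real FFT along the sequence axis: one spectrum per hidden unit.
    spectrum = np.fft.rfft(embeddings, axis=0)
    n_freqs = spectrum.shape[0]

    # Assign each sector a contiguous band, from low to high frequencies.
    band_edges = np.linspace(0, n_freqs, n_sectors + 1).astype(int)
    out = np.empty_like(embeddings)
    for i in range(n_sectors):
        mask = np.zeros((n_freqs, 1))          # broadcasts over the sector
        mask[band_edges[i]:band_edges[i + 1]] = 1.0
        out[:, i * sector:(i + 1) * sector] = np.fft.irfft(
            spectrum[:, i * sector:(i + 1) * sector] * mask,
            n=seq_len, axis=0)
    return out
```

Because the filters are fixed masks rather than learned parameters, the layer adds no trainable weights; swapping the hard frequency masks for learned ones is the comparison R2 asks about.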