Q1: Both reviewer # 4 and reviewer # 5 think it is essential to compare the proposed method with Pre-LayerNorm

Aug-15-2025, 12:38:21 GMT–Neural Information Processing Systems

Q1: Both reviewer #4 and reviewer #5 think it is essential to compare the proposed method with Pre-LayerNorm. We added additional experiments to investigate the question on how PLD compares with PreLN? GLUE score (80.2) compared with Post-LN (82.1) on downstream tasks. When trained with the large learning rate as PLD, PreLN's Q2: Reviewer #3, #4, #5 ask about a comparison to simpler and alternative schedules. The current schedule is actually simple.

hyperparameter, pre-layernorm, reviewer, (16 more...)

Neural Information Processing Systems

Aug-15-2025, 12:38:21 GMT

Conferences PDF

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.71)

Duplicate Docs Excel Report

Title
Q1: Both reviewer # 4 and reviewer # 5 think it is essential to compare the proposed method with Pre-LayerNorm

Similar Docs Excel Report more

Title	Similarity	Source
None found