AITopics | preln

Collaborating Authors

preln

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

a1140a3d0df1c81e24ae954d935e8926-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 15:05:07 GMT

baseline, international conference, preln, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Q1: Both reviewer # 4 and reviewer # 5 think it is essential to compare the proposed method with Pre-LayerNorm

Neural Information Processing SystemsFeb-9-2026, 15:04:55 GMT

Q1: Both reviewer #4 and reviewer #5 think it is essential to compare the proposed method with Pre-LayerNorm. We added additional experiments to investigate the question on how PLD compares with PreLN? GLUE score (80.2) compared with Post-LN (82.1) on downstream tasks. When trained with the large learning rate as PLD, PreLN's Q2: Reviewer #3, #4, #5 ask about a comparison to simpler and alternative schedules. The current schedule is actually simple.

artificial intelligence, machine learning, reviewer, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Add feedback

A training

Neural Information Processing SystemsAug-15-2025, 12:38:40 GMT

Table 4 describes the hyperparameters for pre-training the baseline and PLD. Eqn. 5 indicates that the gradient Figure 1 shows the full comparison of the baseline and PLD, fine-tuned at different checkpoints. Specifically, the fine-tuning results are often much worse with a large learning rate. Figure 11: The fine-tuning results at different checkpoints.Figure 12: Convergence curves varying the keep ratio θ .

different checkpoint, downstream task, representation, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

a1140a3d0df1c81e24ae954d935e8926-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 12:38:33 GMT

baseline, international conference, preln, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Q1: Both reviewer # 4 and reviewer # 5 think it is essential to compare the proposed method with Pre-LayerNorm

Neural Information Processing SystemsAug-15-2025, 12:38:21 GMT

hyperparameter, pre-layernorm, reviewer, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Add feedback

Review for NeurIPS paper: Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

Neural Information Processing SystemsJan-27-2025, 04:06:59 GMT

Summary and Contributions: This paper proposes to accelerate training of Transformer networks by progressively reducing Transformer layers from the network during training. First, it compares two different architectures of BERT, PostLN and PreLN. PostLN applies layer normalization after the element-wise addition in Transformer blocks. The PreLN changes the placement of the location of layer normalization by placing it only on the input stream of the sublayers. It finds that PostLN is more sensitive to the choice of hyperparameters, and training often diverges with more aggressive learning rates whereas PreLN avoids vanishing gradients and leads to more stable optimization.

accelerating training, progressive layer dropping, transformer-based language model, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback