Apr-25-2026, 13:46:24 GMT–Neural Information Processing Systems

A.1 For LEHD model

In Table 5, we explore the effects of eliminating normalization from the attention layer in our LEHD model. We train three LEHD models with the same training scheme and training budget, differing solely in the attention layer: one with batch normalization (BN), one with instance normalization (IN), and one without normalization (w/o).

We train all three POMO models with the same reinforcement learning method, using the POMO strategy, and the same training budget (1000 epochs). The results show that the type of normalization has little effect on the POMO model. The results in Table 6 show that removing normalization from the attention layer has little impact on the model with a heavy encoder and a light decoder.
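To make the three variants concrete, the following is a minimal sketch of how BN, IN, and w/o differ when applied to node embeddings. It is not the authors' implementation: the layout `x[batch][node][channel]`, the helper names `_standardize` and `normalize`, the `eps` value, and the omission of learnable affine parameters and running statistics are all assumptions made for illustration.

```python
import math


def _standardize(values, eps=1e-5):
    """Shift values to zero mean and scale them to unit (biased) variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(var + eps)
    return [(v - mean) / scale for v in values]


def normalize(x, mode):
    """Normalize embeddings x, indexed as x[batch][node][channel] (assumed layout).

    mode: "bn"  -> per-channel statistics over all instances and all nodes
          "in"  -> per-channel statistics over the nodes of each instance
          "w/o" -> no normalization (values are copied unchanged)
    """
    if mode not in ("bn", "in", "w/o"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = [[list(node) for node in inst] for inst in x]
    if mode == "w/o":
        return out
    n_batch, n_nodes, n_dim = len(x), len(x[0]), len(x[0][0])
    for c in range(n_dim):
        if mode == "bn":
            # One mean/variance per channel, shared by the whole batch.
            flat = _standardize(
                [x[b][n][c] for b in range(n_batch) for n in range(n_nodes)]
            )
            for b in range(n_batch):
                for n in range(n_nodes):
                    out[b][n][c] = flat[b * n_nodes + n]
        else:
            # One mean/variance per channel and per instance.
            for b in range(n_batch):
                col = _standardize([x[b][n][c] for n in range(n_nodes)])
                for n in range(n_nodes):
                    out[b][n][c] = col[n]
    return out


# Two instances, two nodes each, one channel; the instances differ in scale.
x = [[[1.0], [3.0]], [[10.0], [30.0]]]
bn_out = normalize(x, "bn")
in_out = normalize(x, "in")
raw_out = normalize(x, "w/o")
```

In this toy input the two instances differ in scale by a factor of ten. Instance normalization maps both to roughly the same values, batch normalization keeps their relative scale because the statistics are shared across the batch, and the w/o variant passes the embeddings through unchanged.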

artificial intelligence, machine learning, node

Genre:
- Research Report > New Finding (0.35)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.35)

Title
A Ablation study of normalization: A.1 For LEHD model