Just One Layer Norm Guarantees Stable Extrapolation
