recursively defined as z
Neural Information Processing Systems
We are grateful to all the reviewers for their valuable suggestions and questions. The results are displayed in Figure 1. We can see that mZAS initialization always outperforms Xavier initialization. ICLR 2019), but with the top layer set to zero. We will clarify this in the revised version.
May-31-2025, 07:06:20 GMT