Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training
