Reducing Transformer Depth on Demand with Structured Dropout
