Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
