Breaking MLPerf Training: A Case Study on Optimizing BERT