Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
