Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change