Batch size-invariance for policy optimization