An Investigation of Batch Normalization in Off-Policy Actor-Critic Algorithms