Small batch deep reinforcement learning

Neural Information Processing Systems 

In contrast, others have observed that larger batch sizes tend to converge to "sharper" optimization