Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration