Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?