Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals