Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
