Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance