Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation