Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation