Alternatives to the Scaled Dot-Product for Attention in the Transformer Neural Network Architecture