From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification