[D] seq2seq why use cross entropy loss? • r/MachineLearning
If we use word embeddings in our seq2seq model, why don't we just use the distance between the predicted vector and the target word's embedding as the loss function, instead of softmax cross-entropy?
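To make the two options in the question concrete, here is a minimal sketch in plain Python. The three-word vocabulary, the 2-d embedding values, and the function names are all made up for illustration; nothing here comes from a real model or library.

```python
import math

# Hypothetical toy vocabulary with made-up 2-d embeddings (assumption).
EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "car": [-1.0, 0.0],
}
VOCAB = list(EMBEDDINGS)  # fixed word order: index i <-> VOCAB[i]


def softmax(logits):
    """Turn one score per vocabulary word into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(logits, target_word):
    """Option A: the decoder emits one logit per vocabulary word.

    The loss is the negative log-probability assigned to the target word.
    """
    probs = softmax(logits)
    return -math.log(probs[VOCAB.index(target_word)])


def embedding_distance_loss(predicted_vector, target_word):
    """Option B: the decoder emits a vector in embedding space.

    The loss is the squared Euclidean distance to the target's embedding.
    """
    target = EMBEDDINGS[target_word]
    return sum((p - t) ** 2 for p, t in zip(predicted_vector, target))


if __name__ == "__main__":
    # Made-up decoder outputs for a single time step whose target is "cat".
    print("cross-entropy:", cross_entropy_loss([2.0, 0.5, -1.0], "cat"))
    print("embedding distance:", embedding_distance_loss([0.8, 0.1], "cat"))
```

Note the difference in what the decoder has to output: the cross-entropy version needs a score for every word in the vocabulary, while the distance version needs a single vector in embedding space, which would still have to be mapped back to a word (for example by nearest-neighbour lookup) at decoding time.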
Dec-30-2017, 09:25:13 GMT