Introduction to Attention Mechanism
Let us go through one whole step to explain what is happening. At t 1 we're going to use decoder state s_t 1 to computer alignment scores. To compute the alignment score for every encoder state we're using a function that is called alignment function but it's just an MLP (MultiLayer Perceptron). Each alignment score can be treated as "how much h1 is useful in predicting the output in the state s0". The alignment function outputs a scalar value which is a real number and we cannot use it just like that, we have to normalize those values using the softmax function. Output from the softmax function is normalized so all the numbers sum up to 1.
May-15-2021, 17:41:42 GMT
- Technology: