[R] What is the current state of the art architectures for RNNs? • r/MachineLearning
I think it is more ambiguous what makes a SotA RNN architecture, and it is very task specific. For NLP, I think the general strategy is to replace each token with a pre-trained word embedding (GloVe or Word2Vec), and then to "encode" the sentence using something like a bidirectional LSTM/GRU (I will call this the RNN encoder). For sequence tagging tasks (such as part of speech tagging or named entity recognition), you take each of hidden state of the RNN encoder and classify it with something like a ReLU network. As there is some "structural dependencies" for these type of tasks, it usually can boost performance to use something like a CRF on top of the RNN encoder. For sentence classification tasks, can simply classify the final state of the RNN encoder.
May-7-2017, 16:05:45 GMT
- Technology: